meerschaum

Meerschaum Python API
Welcome to the Meerschaum Python API technical documentation! Here you can find information about the classes and functions provided by the meerschaum package. Visit meerschaum.io for general usage documentation.
Root Module
For your convenience, the following classes and functions may be imported from the root meerschaum namespace:
Classes
Examples
Build a Connector
Get existing connectors or build a new one in-memory with the meerschaum.get_connector() factory function:
import meerschaum as mrsm

sql_conn = mrsm.get_connector(
    'sql:temp',
    flavor='sqlite',
    database='/tmp/tmp.db',
)

df = sql_conn.read("SELECT 1 AS foo")
print(df)
#    foo
# 0    1

sql_conn.to_sql(df, 'foo')
print(sql_conn.read('foo'))
#    foo
# 0    1
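Connector keys follow a 'type:label' format, and get_connector() splits on the first colon only (a bare type falls back to the default label 'main'). A minimal standalone sketch of that parsing (the helper name here is illustrative, not part of the Meerschaum API):

```python
def split_connector_keys(keys: str) -> tuple:
    """Split 'sql:temp' into ('sql', 'temp'); a bare type maps to the default label 'main'."""
    # get_connector() splits on the first colon only,
    # so labels themselves may contain colons.
    if ':' in keys:
        type_, label = keys.split(':', maxsplit=1)
    else:
        type_, label = keys, 'main'
    return type_, label

print(split_connector_keys('sql:temp'))
# ('sql', 'temp')
```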
Create a Custom Connector Class
Decorate your connector classes with meerschaum.make_connector() to designate them as custom connectors:
from datetime import datetime, timezone
from random import randint

import meerschaum as mrsm
from meerschaum.utils.dtypes import round_time

@mrsm.make_connector
class FooConnector(mrsm.Connector):
    REQUIRED_ATTRIBUTES = ['username', 'password']

    def fetch(
        self,
        begin: datetime | None = None,
        end: datetime | None = None,
    ):
        now = begin or round_time(datetime.now(timezone.utc))
        return [
            {'ts': now, 'id': 1, 'vl': randint(1, 100)},
            {'ts': now, 'id': 2, 'vl': randint(1, 100)},
            {'ts': now, 'id': 3, 'vl': randint(1, 100)},
        ]

foo_conn = mrsm.get_connector(
    'foo:bar',
    username='foo',
    password='bar',
)

docs = foo_conn.fetch()
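get_connector() refuses to build a custom connector when any attribute listed in REQUIRED_ATTRIBUTES is missing. A simplified, hypothetical sketch of that check (the exception and helper names here are stand-ins, not the real internals):

```python
class MissingAttributesError(Exception):
    """Illustrative stand-in for Meerschaum's invalid-attributes error."""

def validate_required_attributes(required: list, attributes: dict):
    # Mimics the spirit of the check get_connector() performs
    # against a custom connector's REQUIRED_ATTRIBUTES.
    missing = [attr for attr in required if attr not in attributes]
    if missing:
        raise MissingAttributesError(f"Missing required attributes: {missing}")

# Passes silently because both required attributes are present.
validate_required_attributes(
    ['username', 'password'],
    {'username': 'foo', 'password': 'bar'},
)
```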
Build a Pipe
Build a meerschaum.Pipe in-memory:
from datetime import datetime
import meerschaum as mrsm

pipe = mrsm.Pipe(
    foo_conn, 'demo',
    instance=sql_conn,
    columns={'datetime': 'ts', 'id': 'id'},
    tags=['production'],
)

pipe.sync(begin=datetime(2024, 1, 1))
df = pipe.get_data()
print(df)
#           ts  id  vl
# 0 2024-01-01   1  97
# 1 2024-01-01   2  18
# 2 2024-01-01   3  96
Add temporary=True to skip registering the pipe in the pipes table.
Get Registered Pipes
The meerschaum.get_pipes() function returns a dictionary hierarchy of pipes by connector, metric, and location:
import meerschaum as mrsm
pipes = mrsm.get_pipes(instance='sql:temp')
pipe = pipes['foo:bar']['demo'][None]
Add as_list=True to flatten the hierarchy:
import meerschaum as mrsm
pipes = mrsm.get_pipes(
    tags=['production'],
    instance=sql_conn,
    as_list=True,
)
print(pipes)
# [Pipe('foo:bar', 'demo', instance='sql:temp')]
Import Plugins
You can import a plugin's module through meerschaum.Plugin.module:
import meerschaum as mrsm
plugin = mrsm.Plugin('noaa')
with mrsm.Venv(plugin):
    noaa = plugin.module
If your plugin has submodules, use meerschaum.plugins.from_plugin_import:
from meerschaum.plugins import from_plugin_import
get_defined_pipes = from_plugin_import('compose.utils.pipes', 'get_defined_pipes')
Import multiple plugins with meerschaum.plugins.import_plugins:
from meerschaum.plugins import import_plugins
noaa, compose = import_plugins('noaa', 'compose')
Create a Job
Create a meerschaum.Job with name and sysargs:
import meerschaum as mrsm
job = mrsm.Job('syncing-engine', 'sync pipes --loop')
success, msg = job.start()
Pass executor_keys as the connector keys of an API instance to create a remote job:
import meerschaum as mrsm
job = mrsm.Job(
    'foo',
    'sync pipes -s daily',
    executor_keys='api:main',
)
Import from a Virtual Environment
Use the meerschaum.Venv context manager to activate a virtual environment:
import meerschaum as mrsm
with mrsm.Venv('noaa'):
    import requests

print(requests.__file__)
# /home/bmeares/.config/meerschaum/venvs/noaa/lib/python3.12/site-packages/requests/__init__.py
To import packages which may not be installed, use meerschaum.attempt_import():
import meerschaum as mrsm
requests = mrsm.attempt_import('requests', venv='noaa')
print(requests.__file__)
# /home/bmeares/.config/meerschaum/venvs/noaa/lib/python3.12/site-packages/requests/__init__.py
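Conceptually, attempt_import() behaves like a guarded import: return the module if it can be imported, else handle the failure gracefully. The real function can additionally pip-install the package into a dedicated virtual environment first. A rough, standalone stand-in (the function name is illustrative):

```python
import importlib

def attempt_import_sketch(name: str):
    # Return the module if importable, else None.
    # The real meerschaum.attempt_import() can also install
    # the package into a venv before importing it.
    try:
        return importlib.import_module(name)
    except ImportError:
        return None

json_mod = attempt_import_sketch('json')       # a module object
missing = attempt_import_sketch('no_such_pkg')  # None
```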
Run Actions
Run sysargs with meerschaum.entry():
import meerschaum as mrsm
success, msg = mrsm.entry('show pipes + show version : x2')
Use meerschaum.actions.get_action() to access an action function directly:
from meerschaum.actions import get_action
show_pipes = get_action(['show', 'pipes'])
success, msg = show_pipes(connector_keys=['plugin:noaa'])
Get a dictionary of available subactions with meerschaum.actions.get_subactions():
from meerschaum.actions import get_subactions
subactions = get_subactions('show')
success, msg = subactions['pipes']()
Create a Plugin
Run bootstrap plugin to create a new plugin:
mrsm bootstrap plugin example
This will create example.py in your plugins directory (default ~/.config/meerschaum/plugins/, Windows: %APPDATA%\Meerschaum\plugins). You may paste the example code from the "Create a Custom Action" example below.
Open your plugin with edit plugin:
mrsm edit plugin example
Run edit plugin and paste the example code below to try out the features.
See the writing plugins guide for more in-depth documentation.
Create a Custom Action
Decorate a function with meerschaum.actions.make_action to designate it as an action. Functions whose names begin with an action's name (e.g. sing_tune, sing_song below) are automatically detected as subactions, even without the decorator:
from meerschaum.actions import make_action

@make_action
def sing():
    print('What would you like me to sing?')
    return True, "Success"

def sing_tune():
    return False, "I don't know that song!"

def sing_song():
    print('Hello, World!')
    return True, "Success"
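Subaction detection is name-based: a function named '<action>_<subaction>' becomes a subaction of that action, which is how `mrsm sing tune` dispatches to sing_tune(). A hypothetical sketch of that resolution (the helper name is illustrative, not Meerschaum's internal API):

```python
def resolve_subactions(action: str, namespace: dict) -> dict:
    # Collect callables named '<action>_<subaction>' from a module
    # namespace, keyed by the subaction word.
    prefix = action + '_'
    return {
        name[len(prefix):]: obj
        for name, obj in namespace.items()
        if name.startswith(prefix) and callable(obj)
    }

def sing_tune():
    return False, "I don't know that song!"

subactions = resolve_subactions('sing', {'sing_tune': sing_tune})
# {'tune': <function sing_tune>}
```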
Use meerschaum.plugins.add_plugin_argument() to create new parameters for your action:
from meerschaum.actions import make_action
from meerschaum.plugins import add_plugin_argument

add_plugin_argument(
    '--song', type=str, help='What song to sing.',
)

@make_action
def sing_melody(action=None, song=None):
    to_sing = action[0] if action else song
    if not to_sing:
        return False, "Please tell me what to sing!"
    return True, f'~I am singing {to_sing}~'
mrsm sing melody lalala
mrsm sing melody --song do-re-mi
Add a Page to the Web Dashboard
Use the decorators meerschaum.plugins.dash_plugin() and meerschaum.plugins.web_page() to add new pages to the web dashboard:
from meerschaum.plugins import dash_plugin, web_page

@dash_plugin
def init_dash(dash_app):

    import dash.html as html
    import dash_bootstrap_components as dbc
    from dash import Input, Output, no_update

    ### Routes to '/dash/my-page'
    @web_page('/my-page', login_required=False)
    def my_page():
        return dbc.Container([
            html.H1("Hello, World!"),
            dbc.Button("Click me", id='my-button'),
            html.Div(id="my-output-div"),
        ])

    @dash_app.callback(
        Output('my-output-div', 'children'),
        Input('my-button', 'n_clicks'),
    )
    def my_button_click(n_clicks):
        if not n_clicks:
            return no_update
        return html.P(f'You clicked {n_clicks} times!')
Submodules
meerschaum.actions
Access functions for actions and subactions.
meerschaum.actions.actions, meerschaum.actions.get_action(), meerschaum.actions.get_completer(), meerschaum.actions.get_main_action_name(), meerschaum.actions.get_subactions()
meerschaum.config
Read and write the Meerschaum configuration registry.
meerschaum.config.get_config(), meerschaum.config.get_plugin_config(), meerschaum.config.write_config(), meerschaum.config.write_plugin_config()
meerschaum.connectors
Build connectors to interact with databases and fetch data.
meerschaum.connectors.get_connector(), meerschaum.connectors.make_connector(), meerschaum.connectors.is_connected(), meerschaum.connectors.poll.retry_connect(), meerschaum.connectors.Connector, meerschaum.connectors.sql.SQLConnector, meerschaum.connectors.api.APIConnector, meerschaum.connectors.valkey.ValkeyConnector
meerschaum.jobs
Start background jobs.
meerschaum.jobs.Job, meerschaum.jobs.Executor, meerschaum.jobs.systemd.SystemdExecutor, meerschaum.jobs.get_jobs(), meerschaum.jobs.get_filtered_jobs(), meerschaum.jobs.get_running_jobs(), meerschaum.jobs.get_stopped_jobs(), meerschaum.jobs.get_paused_jobs(), meerschaum.jobs.get_restart_jobs(), meerschaum.jobs.make_executor(), meerschaum.jobs.check_restart_jobs(), meerschaum.jobs.start_check_jobs_thread(), meerschaum.jobs.stop_check_jobs_thread()
meerschaum.plugins
Access plugin modules and other API utilities.
meerschaum.plugins.Plugin, meerschaum.plugins.api_plugin(), meerschaum.plugins.dash_plugin(), meerschaum.plugins.import_plugins(), meerschaum.plugins.reload_plugins(), meerschaum.plugins.get_plugins(), meerschaum.plugins.get_data_plugins(), meerschaum.plugins.add_plugin_argument(), meerschaum.plugins.pre_sync_hook(), meerschaum.plugins.post_sync_hook()
meerschaum.utils
Utility functions are available in several submodules:
meerschaum.utils.daemon
Manage background jobs.
meerschaum.utils.daemon.daemon_entry(), meerschaum.utils.daemon.daemon_action(), meerschaum.utils.daemon.get_daemons(), meerschaum.utils.daemon.get_daemon_ids(), meerschaum.utils.daemon.get_running_daemons(), meerschaum.utils.daemon.get_paused_daemons(), meerschaum.utils.daemon.get_stopped_daemons(), meerschaum.utils.daemon.get_filtered_daemons(), meerschaum.utils.daemon.run_daemon(), meerschaum.utils.daemon.Daemon, meerschaum.utils.daemon.FileDescriptorInterceptor, meerschaum.utils.daemon.RotatingFile
meerschaum.utils.dataframe
Manipulate dataframes.
meerschaum.utils.dataframe.add_missing_cols_to_df(), meerschaum.utils.dataframe.chunksize_to_npartitions(), meerschaum.utils.dataframe.df_from_literal(), meerschaum.utils.dataframe.df_is_chunk_generator(), meerschaum.utils.dataframe.enforce_dtypes(), meerschaum.utils.dataframe.filter_unseen_df(), meerschaum.utils.dataframe.get_bool_cols(), meerschaum.utils.dataframe.get_bytes_cols(), meerschaum.utils.dataframe.get_datetime_bound_from_df(), meerschaum.utils.dataframe.get_datetime_cols(), meerschaum.utils.dataframe.get_datetime_cols_types(), meerschaum.utils.dataframe.get_first_valid_dask_partition(), meerschaum.utils.dataframe.get_geometry_cols(), meerschaum.utils.dataframe.get_geometry_cols_types(), meerschaum.utils.dataframe.get_json_cols(), meerschaum.utils.dataframe.get_numeric_cols(), meerschaum.utils.dataframe.get_special_cols(), meerschaum.utils.dataframe.get_unhashable_cols(), meerschaum.utils.dataframe.get_unique_index_values(), meerschaum.utils.dataframe.get_uuid_cols(), meerschaum.utils.dataframe.parse_df_datetimes(), meerschaum.utils.dataframe.query_df(), meerschaum.utils.dataframe.to_json()
meerschaum.utils.dtypes
Work with data types.
meerschaum.utils.dtypes.are_dtypes_equal(), meerschaum.utils.dtypes.attempt_cast_to_bytes(), meerschaum.utils.dtypes.attempt_cast_to_geometry(), meerschaum.utils.dtypes.attempt_cast_to_numeric(), meerschaum.utils.dtypes.attempt_cast_to_uuid(), meerschaum.utils.dtypes.coerce_timezone(), meerschaum.utils.dtypes.deserialize_base64(), meerschaum.utils.dtypes.deserialize_bytes_string(), meerschaum.utils.dtypes.deserialize_geometry(), meerschaum.utils.dtypes.encode_bytes_for_bytea(), meerschaum.utils.dtypes.geometry_is_wkt(), meerschaum.utils.dtypes.get_current_timestamp(), meerschaum.utils.dtypes.get_geometry_type_srid(), meerschaum.utils.dtypes.is_dtype_numeric(), meerschaum.utils.dtypes.is_dtype_special(), meerschaum.utils.dtypes.json_serialize_value(), meerschaum.utils.dtypes.none_if_null(), meerschaum.utils.dtypes.project_geometry(), meerschaum.utils.dtypes.quantize_decimal(), meerschaum.utils.dtypes.serialize_bytes(), meerschaum.utils.dtypes.serialize_datetime(), meerschaum.utils.dtypes.serialize_date(), meerschaum.utils.dtypes.serialize_decimal(), meerschaum.utils.dtypes.serialize_geometry(), meerschaum.utils.dtypes.to_datetime(), meerschaum.utils.dtypes.to_pandas_dtype(), meerschaum.utils.dtypes.value_is_null(), meerschaum.utils.dtypes.get_next_precision_unit(), meerschaum.utils.dtypes.round_time()
meerschaum.utils.formatting
Format output text.
meerschaum.utils.formatting.colored(), meerschaum.utils.formatting.extract_stats_from_message(), meerschaum.utils.formatting.fill_ansi(), meerschaum.utils.formatting.get_console(), meerschaum.utils.formatting.highlight_pipes(), meerschaum.utils.formatting.make_header(), meerschaum.utils.formatting.pipe_repr(), meerschaum.utils.formatting.pprint(), meerschaum.utils.formatting.pprint_pipes(), meerschaum.utils.formatting.print_options(), meerschaum.utils.formatting.print_pipes_results(), meerschaum.utils.formatting.print_tuple(), meerschaum.utils.formatting.translate_rich_to_termcolor()
meerschaum.utils.misc
Miscellaneous utility functions.
meerschaum.utils.misc.items_str(), meerschaum.utils.misc.is_int(), meerschaum.utils.misc.interval_str(), meerschaum.utils.misc.filter_keywords(), meerschaum.utils.misc.generate_password(), meerschaum.utils.misc.string_to_dict(), meerschaum.utils.misc.iterate_chunks(), meerschaum.utils.misc.timed_input(), meerschaum.utils.misc.replace_pipes_in_dict(), meerschaum.utils.misc.is_valid_email(), meerschaum.utils.misc.string_width(), meerschaum.utils.misc.replace_password(), meerschaum.utils.misc.parse_config_substitution(), meerschaum.utils.misc.edit_file(), meerschaum.utils.misc.get_in_ex_params(), meerschaum.utils.misc.separate_negation_values(), meerschaum.utils.misc.flatten_list(), meerschaum.utils.misc.make_symlink(), meerschaum.utils.misc.is_symlink(), meerschaum.utils.misc.wget(), meerschaum.utils.misc.add_method_to_class(), meerschaum.utils.misc.is_pipe_registered(), meerschaum.utils.misc.get_cols_lines(), meerschaum.utils.misc.sorted_dict(), meerschaum.utils.misc.flatten_pipes_dict(), meerschaum.utils.misc.dict_from_od(), meerschaum.utils.misc.remove_ansi(), meerschaum.utils.misc.get_connector_labels(), meerschaum.utils.misc.json_serialize_datetime(), meerschaum.utils.misc.async_wrap(), meerschaum.utils.misc.is_docker_available(), meerschaum.utils.misc.is_android(), meerschaum.utils.misc.is_bcp_available(), meerschaum.utils.misc.truncate_string_sections(), meerschaum.utils.misc.safely_extract_tar()
meerschaum.utils.packages
Manage Python packages.
meerschaum.utils.packages.attempt_import(), meerschaum.utils.packages.get_module_path(), meerschaum.utils.packages.manually_import_module(), meerschaum.utils.packages.get_install_no_version(), meerschaum.utils.packages.determine_version(), meerschaum.utils.packages.need_update(), meerschaum.utils.packages.get_pip(), meerschaum.utils.packages.pip_install(), meerschaum.utils.packages.pip_uninstall(), meerschaum.utils.packages.completely_uninstall_package(), meerschaum.utils.packages.run_python_package(), meerschaum.utils.packages.lazy_import(), meerschaum.utils.packages.pandas_name(), meerschaum.utils.packages.import_pandas(), meerschaum.utils.packages.import_rich(), meerschaum.utils.packages.import_dcc(), meerschaum.utils.packages.import_html(), meerschaum.utils.packages.get_modules_from_package(), meerschaum.utils.packages.import_children(), meerschaum.utils.packages.reload_package(), meerschaum.utils.packages.reload_meerschaum(), meerschaum.utils.packages.is_installed(), meerschaum.utils.packages.venv_contains_package(), meerschaum.utils.packages.package_venv(), meerschaum.utils.packages.ensure_readline(), meerschaum.utils.packages.get_prerelease_dependencies()
meerschaum.utils.sql
Build SQL queries.
meerschaum.utils.sql.build_where(), meerschaum.utils.sql.clean(), meerschaum.utils.sql.dateadd_str(), meerschaum.utils.sql.test_connection(), meerschaum.utils.sql.get_distinct_col_count(), meerschaum.utils.sql.sql_item_name(), meerschaum.utils.sql.pg_capital(), meerschaum.utils.sql.oracle_capital(), meerschaum.utils.sql.truncate_item_name(), meerschaum.utils.sql.table_exists(), meerschaum.utils.sql.get_table_cols_types(), meerschaum.utils.sql.get_update_queries(), meerschaum.utils.sql.get_null_replacement(), meerschaum.utils.sql.get_db_version(), meerschaum.utils.sql.get_rename_table_queries(), meerschaum.utils.sql.get_create_table_queries(), meerschaum.utils.sql.wrap_query_with_cte(), meerschaum.utils.sql.format_cte_subquery(), meerschaum.utils.sql.session_execute(), meerschaum.utils.sql.get_reset_autoincrement_queries()
meerschaum.utils.venv
Manage virtual environments.
meerschaum.utils.venv.Venv, meerschaum.utils.venv.activate_venv(), meerschaum.utils.venv.deactivate_venv(), meerschaum.utils.venv.get_module_venv(), meerschaum.utils.venv.get_venvs(), meerschaum.utils.venv.init_venv(), meerschaum.utils.venv.inside_venv(), meerschaum.utils.venv.is_venv_active(), meerschaum.utils.venv.venv_exec(), meerschaum.utils.venv.venv_executable(), meerschaum.utils.venv.venv_exists(), meerschaum.utils.venv.venv_target_path(), meerschaum.utils.venv.verify_venv()
meerschaum.utils.warnings
Print warnings, errors, info, and debug messages.
#! /usr/bin/env python
# -*- coding: utf-8 -*-
# vim:fenc=utf-8

"""
Copyright 2025 Bennett Meares

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""

import atexit

from meerschaum.utils.typing import SuccessTuple
from meerschaum.utils.packages import attempt_import
from meerschaum.core.Pipe import Pipe
from meerschaum.plugins import Plugin
from meerschaum.utils.venv import Venv
from meerschaum.jobs import Job, make_executor
from meerschaum.connectors import get_connector, Connector, InstanceConnector, make_connector
from meerschaum.utils import get_pipes
from meerschaum.utils.formatting import pprint
from meerschaum._internal.docs import index as __doc__
from meerschaum.config import __version__, get_config
from meerschaum._internal.entry import entry
from meerschaum.__main__ import _close_pools

atexit.register(_close_pools)

__pdoc__ = {'gui': False, 'api': False, 'core': False, '_internal': False}
__all__ = (
    "get_pipes",
    "get_connector",
    "get_config",
    "Pipe",
    "Plugin",
    "SuccessTuple",
    "Venv",
    "Plugin",
    "Job",
    "pprint",
    "attempt_import",
    "actions",
    "config",
    "connectors",
    "jobs",
    "plugins",
    "utils",
    "SuccessTuple",
    "Connector",
    "InstanceConnector",
    "make_connector",
    "entry",
)
def get_pipes(
    connector_keys: Union[str, List[str], None] = None,
    metric_keys: Union[str, List[str], None] = None,
    location_keys: Union[str, List[str], None] = None,
    tags: Optional[List[str]] = None,
    params: Optional[Dict[str, Any]] = None,
    mrsm_instance: Union[str, InstanceConnector, None] = None,
    instance: Union[str, InstanceConnector, None] = None,
    as_list: bool = False,
    as_tags_dict: bool = False,
    method: str = 'registered',
    workers: Optional[int] = None,
    debug: bool = False,
    _cache_parameters: bool = True,
    **kw: Any
) -> Union[PipesDict, List[mrsm.Pipe], Dict[str, mrsm.Pipe]]:
    """
    Return a dictionary or list of `meerschaum.Pipe` objects.

    Parameters
    ----------
    connector_keys: Union[str, List[str], None], default None
        String or list of connector keys.
        If omitted or is `'*'`, fetch all possible keys.
        If a string begins with `'_'`, select keys that do NOT match the string.

    metric_keys: Union[str, List[str], None], default None
        String or list of metric keys. See `connector_keys` for formatting.

    location_keys: Union[str, List[str], None], default None
        String or list of location keys. See `connector_keys` for formatting.

    tags: Optional[List[str]], default None
        If provided, only include pipes with these tags.

    params: Optional[Dict[str, Any]], default None
        Dictionary of additional parameters to search by.
        Params are parsed into a SQL WHERE clause.
        E.g. `{'a': 1, 'b': 2}` equates to `'WHERE a = 1 AND b = 2'`

    mrsm_instance: Union[str, InstanceConnector, None], default None
        Connector keys for the Meerschaum instance of the pipes.
        Must be a `meerschaum.connectors.sql.SQLConnector.SQLConnector` or
        `meerschaum.connectors.api.APIConnector.APIConnector`.

    as_list: bool, default False
        If `True`, return pipes in a list instead of a hierarchical dictionary.
        `False`: `{connector_keys: {metric_key: {location_key: Pipe}}}`
        `True`: `[Pipe]`

    as_tags_dict: bool, default False
        If `True`, return a dictionary mapping tags to pipes.
        Pipes with multiple tags will be repeated.

    method: str, default 'registered'
        Available options: `['registered', 'explicit', 'all']`
        If `'registered'` (default), create pipes based on registered keys in the connector's pipes table
        (API or SQL connector, depending on mrsm_instance).
        If `'explicit'`, create pipes from provided connector_keys, metric_keys, and location_keys
        instead of consulting the pipes table. Useful for creating non-existent pipes.
        If `'all'`, create pipes from predefined metrics and locations. Requires `connector_keys`.
        **NOTE:** Method `'all'` is not implemented!

    workers: Optional[int], default None
        If provided (and `as_tags_dict` is `True`), set the number of workers for the pool
        to fetch tags.
        Only takes effect if the instance connector supports multi-threading.

    **kw: Any
        Keyword arguments to pass to the `meerschaum.Pipe` constructor.

    Returns
    -------
    A dictionary of dictionaries and `meerschaum.Pipe` objects
    in the connector, metric, location hierarchy.
    If `as_list` is `True`, return a list of `meerschaum.Pipe` objects.
    If `as_tags_dict` is `True`, return a dictionary mapping tags to pipes.

    Examples
    --------
    ```
    >>> ### Manual definition:
    >>> pipes = {
    ...     <connector_keys>: {
    ...         <metric_key>: {
    ...             <location_key>: Pipe(
    ...                 <connector_keys>,
    ...                 <metric_key>,
    ...                 <location_key>,
    ...             ),
    ...         },
    ...     },
    ... }
    >>> ### Accessing a single pipe:
    >>> pipes['sql:main']['weather'][None]
    >>> ### Return a list instead:
    >>> get_pipes(as_list=True)
    [Pipe('sql:main', 'weather')]
    >>> get_pipes(as_tags_dict=True)
    {'gvl': Pipe('sql:main', 'weather')}
    ```
    """
    import json
    from collections import defaultdict
    from meerschaum.config import get_config
    from meerschaum.utils.warnings import error
    from meerschaum.utils.misc import filter_keywords
    from meerschaum.utils.pool import get_pool

    if connector_keys is None:
        connector_keys = []
    if metric_keys is None:
        metric_keys = []
    if location_keys is None:
        location_keys = []
    if params is None:
        params = {}
    if tags is None:
        tags = []

    if isinstance(connector_keys, str):
        connector_keys = [connector_keys]
    if isinstance(metric_keys, str):
        metric_keys = [metric_keys]
    if isinstance(location_keys, str):
        location_keys = [location_keys]

    ### Get SQL or API connector (keys come from `connector.fetch_pipes_keys()`).
    if mrsm_instance is None:
        mrsm_instance = instance
    if mrsm_instance is None:
        mrsm_instance = get_config('meerschaum', 'instance', patch=True)
    if isinstance(mrsm_instance, str):
        from meerschaum.connectors.parse import parse_instance_keys
        connector = parse_instance_keys(keys=mrsm_instance, debug=debug)
    else:
        from meerschaum.connectors import instance_types
        valid_connector = False
        if hasattr(mrsm_instance, 'type'):
            if mrsm_instance.type in instance_types:
                valid_connector = True
        if not valid_connector:
            error(f"Invalid instance connector: {mrsm_instance}")
        connector = mrsm_instance
    if debug:
        from meerschaum.utils.debug import dprint
        dprint(f"Using instance connector: {connector}")
    if not connector:
        error(f"Could not create connector from keys: '{mrsm_instance}'")

    ### Get a list of tuples for the keys needed to build pipes.
    result = fetch_pipes_keys(
        method,
        connector,
        connector_keys=connector_keys,
        metric_keys=metric_keys,
        location_keys=location_keys,
        tags=tags,
        params=params,
        workers=workers,
        debug=debug
    )
    if result is None:
        error("Unable to build pipes!")

    ### Populate the `pipes` dictionary with Pipes based on the keys
    ### obtained from the chosen `method`.
    from meerschaum import Pipe
    pipes = {}
    for keys_tuple in result:
        ck, mk, lk = keys_tuple[0], keys_tuple[1], keys_tuple[2]
        pipe_tags_or_parameters = keys_tuple[3] if len(keys_tuple) == 4 else None
        pipe_parameters = (
            pipe_tags_or_parameters
            if isinstance(pipe_tags_or_parameters, (dict, str))
            else None
        )
        if isinstance(pipe_parameters, str):
            pipe_parameters = json.loads(pipe_parameters)
        pipe_tags = (
            pipe_tags_or_parameters
            if isinstance(pipe_tags_or_parameters, list)
            else (
                pipe_tags_or_parameters.get('tags', [])
                if isinstance(pipe_tags_or_parameters, dict)
                else None
            )
        )

        if ck not in pipes:
            pipes[ck] = {}

        if mk not in pipes[ck]:
            pipes[ck][mk] = {}

        pipe = Pipe(
            ck, mk, lk,
            mrsm_instance=connector,
            parameters=pipe_parameters,
            tags=pipe_tags,
            debug=debug,
            **filter_keywords(Pipe, **kw)
        )
        pipe.__dict__['_tags'] = pipe_tags
        pipes[ck][mk][lk] = pipe

    if not as_list and not as_tags_dict:
        return pipes

    from meerschaum.utils.misc import flatten_pipes_dict
    pipes_list = flatten_pipes_dict(pipes)
    if as_list:
        return pipes_list

    pool = get_pool(workers=(workers if connector.IS_THREAD_SAFE else 1))

    def gather_pipe_tags(pipe: mrsm.Pipe) -> Tuple[mrsm.Pipe, List[str]]:
        _tags = pipe.__dict__.get('_tags', None)
        gathered_tags = _tags if _tags is not None else pipe.tags
        return pipe, (gathered_tags or [])

    tags_pipes = defaultdict(lambda: [])
    pipes_tags = dict(pool.map(gather_pipe_tags, pipes_list))
    for pipe, tags in pipes_tags.items():
        for tag in (tags or []):
            tags_pipes[tag].append(pipe)

    return dict(tags_pipes)
Return a dictionary or list of meerschaum.Pipe objects.
Parameters
- connector_keys (Union[str, List[str], None], default None):
  String or list of connector keys.
  If omitted or '*', fetch all possible keys.
  If a string begins with '_', select keys that do NOT match the string.
- metric_keys (Union[str, List[str], None], default None):
  String or list of metric keys. See connector_keys for formatting.
- location_keys (Union[str, List[str], None], default None):
  String or list of location keys. See connector_keys for formatting.
- tags (Optional[List[str]], default None):
  If provided, only include pipes with these tags.
- params (Optional[Dict[str, Any]], default None):
  Dictionary of additional parameters to search by.
  Params are parsed into a SQL WHERE clause.
  E.g. {'a': 1, 'b': 2} equates to 'WHERE a = 1 AND b = 2'.
- mrsm_instance (Union[str, InstanceConnector, None], default None):
  Connector keys for the Meerschaum instance of the pipes.
  Must be a meerschaum.connectors.sql.SQLConnector.SQLConnector or
  meerschaum.connectors.api.APIConnector.APIConnector.
- as_list (bool, default False):
  If True, return pipes in a list instead of a hierarchical dictionary.
  False: {connector_keys: {metric_key: {location_key: Pipe}}}
  True: [Pipe]
- as_tags_dict (bool, default False):
  If True, return a dictionary mapping tags to pipes.
  Pipes with multiple tags will be repeated.
- method (str, default 'registered'):
  Available options: ['registered', 'explicit', 'all'].
  If 'registered' (default), create pipes based on registered keys in the connector's pipes table
  (API or SQL connector, depending on mrsm_instance).
  If 'explicit', create pipes from the provided connector_keys, metric_keys, and location_keys
  instead of consulting the pipes table. Useful for creating non-existent pipes.
  If 'all', create pipes from predefined metrics and locations. Requires connector_keys.
  NOTE: Method 'all' is not implemented!
- workers (Optional[int], default None):
  If provided (and as_tags_dict is True), set the number of workers for the pool to fetch tags.
  Only takes effect if the instance connector supports multi-threading.
- **kw (Any):
  Keyword arguments to pass to the meerschaum.Pipe constructor.

Returns
- A dictionary of dictionaries and meerschaum.Pipe objects
  in the connector, metric, location hierarchy.
- If as_list is True, return a list of meerschaum.Pipe objects.
- If as_tags_dict is True, return a dictionary mapping tags to pipes.
Examples
>>> ### Manual definition:
>>> pipes = {
... <connector_keys>: {
... <metric_key>: {
... <location_key>: Pipe(
... <connector_keys>,
... <metric_key>,
... <location_key>,
... ),
... },
... },
... }
>>> ### Accessing a single pipe:
>>> pipes['sql:main']['weather'][None]
>>> ### Return a list instead:
>>> get_pipes(as_list=True)
[Pipe('sql:main', 'weather')]
>>> get_pipes(as_tags_dict=True)
{'gvl': Pipe('sql:main', 'weather')}
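As noted above, params is parsed into a SQL WHERE clause. A toy, self-contained illustration of that mapping (the real query building lives in meerschaum.utils.sql.build_where(), which additionally quotes identifiers and supports list values and negation; the function name below is hypothetical):

```python
def params_to_where_sketch(params: dict) -> str:
    # Toy version of the documented mapping:
    # {'a': 1, 'b': 2} -> "WHERE a = 1 AND b = 2".
    if not params:
        return ''
    return 'WHERE ' + ' AND '.join(f"{col} = {val}" for col, val in params.items())

print(params_to_where_sketch({'a': 1, 'b': 2}))
# WHERE a = 1 AND b = 2
```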
def get_connector(
    type: str = None,
    label: str = None,
    refresh: bool = False,
    debug: bool = False,
    _load_plugins: bool = True,
    **kw: Any
) -> Connector:
    """
    Return existing connector or create new connection and store for reuse.

    You can create new connectors if enough parameters are provided for the given type and flavor.

    Parameters
    ----------
    type: Optional[str], default None
        Connector type (sql, api, etc.).
        Defaults to the type of the configured `instance_connector`.

    label: Optional[str], default None
        Connector label (e.g. main). Defaults to `'main'`.

    refresh: bool, default False
        Refresh the Connector instance / construct new object. Defaults to `False`.

    kw: Any
        Other arguments to pass to the Connector constructor.
        If the Connector has already been constructed and new arguments are provided,
        `refresh` is set to `True` and the old Connector is replaced.

    Returns
    -------
    A new Meerschaum connector (e.g. `meerschaum.connectors.api.APIConnector`,
    `meerschaum.connectors.sql.SQLConnector`).

    Examples
    --------
    The following parameters would create a new
    `meerschaum.connectors.sql.SQLConnector` that isn't in the configuration file.

    ```
    >>> conn = get_connector(
    ...     type='sql',
    ...     label='newlabel',
    ...     flavor='sqlite',
    ...     database='/file/path/to/database.db',
    ... )
    >>>
    ```
    """
    from meerschaum.connectors.parse import parse_instance_keys
    from meerschaum.config import get_config
    from meerschaum._internal.static import STATIC_CONFIG
    from meerschaum.utils.warnings import warn
    global _loaded_plugin_connectors
    if isinstance(type, str) and not label and ':' in type:
        type, label = type.split(':', maxsplit=1)

    if _load_plugins:
        with _locks['_loaded_plugin_connectors']:
            if not _loaded_plugin_connectors:
                load_plugin_connectors()
                _load_builtin_custom_connectors()
                _loaded_plugin_connectors = True

    if type is None and label is None:
        default_instance_keys = get_config('meerschaum', 'instance', patch=True)
        ### recursive call to get_connector
        return parse_instance_keys(default_instance_keys)

    ### NOTE: the default instance connector may not be main.
    ### Only fall back to 'main' if the type is provided but the label is omitted.
    label = label if label is not None else STATIC_CONFIG['connectors']['default_label']

    ### type might actually be a label. Check if so and raise a warning.
    if type not in connectors:
        possibilities, poss_msg = [], ""
        for _type in get_config('meerschaum', 'connectors'):
            if type in get_config('meerschaum', 'connectors', _type):
                possibilities.append(f"{_type}:{type}")
        if len(possibilities) > 0:
            poss_msg = " Did you mean"
            for poss in possibilities[:-1]:
                poss_msg += f" '{poss}',"
            if poss_msg.endswith(','):
                poss_msg = poss_msg[:-1]
            if len(possibilities) > 1:
                poss_msg += " or"
            poss_msg += f" '{possibilities[-1]}'?"

        warn(f"Cannot create Connector of type '{type}'." + poss_msg, stack=False)
        return None

    if 'sql' not in types:
        from meerschaum.connectors.plugin import PluginConnector
        from meerschaum.connectors.valkey import ValkeyConnector
        with _locks['types']:
            types.update({
                'api': APIConnector,
                'sql': SQLConnector,
                'plugin': PluginConnector,
                'valkey': ValkeyConnector,
            })

    ### determine if we need to call the constructor
    if not refresh:
        ### see if any user-supplied arguments differ from the existing instance
        if label in connectors[type]:
            warning_message = None
            for attribute, value in kw.items():
                if attribute not in connectors[type][label].meta:
                    import inspect
                    cls = connectors[type][label].__class__
                    cls_init_signature = inspect.signature(cls)
                    cls_init_params = cls_init_signature.parameters
                    if attribute not in cls_init_params:
                        warning_message = (
                            f"Received new attribute '{attribute}' not present in connector "
                            + f"{connectors[type][label]}.\n"
                        )
                elif connectors[type][label].__dict__[attribute] != value:
                    warning_message = (
                        f"Mismatched values for attribute '{attribute}' in connector "
                        + f"'{connectors[type][label]}'.\n"
                        + f"  - Keyword value: '{value}'\n"
                        + f"  - Existing value: '{connectors[type][label].__dict__[attribute]}'\n"
                    )
            if warning_message is not None:
                warning_message += (
                    "\nSetting `refresh` to True and recreating connector with type:"
                    + f" '{type}' and label '{label}'."
                )
                refresh = True
                warn(warning_message)
        else:  ### connector doesn't yet exist
            refresh = True

    ### only create an object if refresh is True
    ### (can be manually specified, otherwise determined above)
    if refresh:
        with _locks['connectors']:
            try:
                ### will raise an error if configuration is incorrect / missing
                conn = types[type](label=label, **kw)
                connectors[type][label] = conn
            except InvalidAttributesError as ie:
                warn(
                    f"Incorrect attributes for connector '{type}:{label}'.\n"
                    + str(ie),
                    stack=False,
                )
                conn = None
            except Exception as e:
                from meerschaum.utils.formatting import get_console
                console = get_console()
                if console:
                    console.print_exception()
                warn(
                    f"Exception when creating connector '{type}:{label}'.\n" + str(e),
                    stack=False,
                )
                conn = None
    if conn is None:
        return None

    return connectors[type][label]
Return an existing connector, or create a new one and store it for reuse.
You can create new connectors if enough parameters are provided for the given type and flavor.
Parameters
- type (Optional[str], default None):
Connector type (sql, api, etc.).
Defaults to the type of the configured instance connector.
- label (Optional[str], default None):
Connector label (e.g. main). Defaults to 'main'.
- refresh (bool, default False):
If True, refresh the Connector instance by constructing a new object.
- kw (Any):
Other arguments to pass to the Connector constructor.
If the Connector has already been constructed and new arguments are provided,
refresh is set to True and the old Connector is replaced.
Returns
- A new Meerschaum connector (e.g.
meerschaum.connectors.api.APIConnector, meerschaum.connectors.sql.SQLConnector).
Examples
The following parameters would create a new
meerschaum.connectors.sql.SQLConnector that isn't in the configuration file.
>>> conn = get_connector(
... type = 'sql',
... label = 'newlabel',
... flavor = 'sqlite',
... database = '/file/path/to/database.db'
... )
>>>
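As the source above shows, a single `'type:label'` string is split on the first colon, and the label falls back to `'main'` when omitted. The following is a minimal standalone sketch of that parsing; `split_connector_keys` is a hypothetical helper for illustration, not part of the meerschaum API:

```python
def split_connector_keys(keys: str, default_label: str = 'main') -> tuple:
    """
    Sketch of how get_connector() parses connector keys:
    'sql:temp' -> ('sql', 'temp'); a bare type falls back to 'main'.
    """
    if ':' in keys:
        # Split only on the first colon, mirroring type.split(':', maxsplit=1).
        type_, label = keys.split(':', maxsplit=1)
    else:
        type_, label = keys, default_label
    return type_, label

print(split_connector_keys('sql:temp'))  # ('sql', 'temp')
print(split_connector_keys('api'))       # ('api', 'main')
```

Note that only the first colon delimits the type, so labels themselves may contain colons.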
```python
def get_config(
    *keys: str,
    patch: bool = True,
    substitute: bool = True,
    sync_files: bool = True,
    write_missing: bool = True,
    as_tuple: bool = False,
    warn: bool = True,
    debug: bool = False
) -> Any:
    """
    Return the Meerschaum configuration dictionary.
    If positional arguments are provided, index by the keys.
    Raises a warning if invalid keys are provided.

    Parameters
    ----------
    keys: str
        Strings to index.

    patch: bool, default True
        If `True`, patch missing default keys into the config directory.

    sync_files: bool, default True
        If `True`, sync files if needed.

    write_missing: bool, default True
        If `True`, write default values when the main config files are missing.

    substitute: bool, default True
        If `True`, substitute 'MRSM{}' values.

    as_tuple: bool, default False
        If `True`, return a tuple of the form (success, value).

    Returns
    -------
    The value in the configuration directory, indexed by the provided keys.

    Examples
    --------
    >>> get_config('meerschaum', 'instance')
    'sql:main'
    >>> get_config('does', 'not', 'exist')
    UserWarning: Invalid keys in config: ('does', 'not', 'exist')
    """
    import json

    symlinks_key = STATIC_CONFIG['config']['symlinks_key']
    if debug:
        from meerschaum.utils.debug import dprint
        dprint(f"Indexing keys: {keys}", color=False)

    if len(keys) == 0:
        _rc = _config(
            substitute=substitute,
            sync_files=sync_files,
            write_missing=(write_missing and _allow_write_missing),
        )
        if as_tuple:
            return True, _rc
        return _rc

    ### Weird threading issues, only import if substitute is True.
    if substitute:
        from meerschaum.config._read_config import search_and_substitute_config
        ### Invalidate the cache if it was read before with substitute=False
        ### but there still exist substitutions.
        if (
            config is not None and substitute and keys[0] != symlinks_key
            and 'MRSM{' in json.dumps(config.get(keys[0]))
        ):
            try:
                _subbed = search_and_substitute_config({keys[0]: config[keys[0]]})
            except Exception:
                import traceback
                traceback.print_exc()
                _subbed = {keys[0]: config[keys[0]]}

            config[keys[0]] = _subbed[keys[0]]
            if symlinks_key in _subbed:
                if symlinks_key not in config:
                    config[symlinks_key] = {}
                config[symlinks_key] = apply_patch_to_config(
                    _subbed.get(symlinks_key, {}),
                    config.get(symlinks_key, {}),
                )

    from meerschaum.config._sync import sync_files as _sync_files
    if config is None:
        _config(*keys, sync_files=sync_files)

    invalid_keys = False
    if keys[0] not in config and keys[0] != symlinks_key:
        single_key_config = read_config(
            keys=[keys[0]], substitute=substitute, write_missing=write_missing
        )
        if keys[0] not in single_key_config:
            invalid_keys = True
        else:
            config[keys[0]] = single_key_config.get(keys[0], None)
            if symlinks_key in single_key_config and keys[0] in single_key_config[symlinks_key]:
                if symlinks_key not in config:
                    config[symlinks_key] = {}
                config[symlinks_key][keys[0]] = single_key_config[symlinks_key][keys[0]]

        if sync_files:
            _sync_files(keys=[keys[0]])

    c = config
    if len(keys) > 0:
        for k in keys:
            try:
                c = c[k]
            except Exception:
                invalid_keys = True
                break

    if invalid_keys:
        ### Check if the keys are in the default configuration.
        from meerschaum.config._default import default_config
        in_default = True
        patched_default_config = (
            search_and_substitute_config(default_config)
            if substitute else copy.deepcopy(default_config)
        )
        _c = patched_default_config
        for k in keys:
            try:
                _c = _c[k]
            except Exception:
                in_default = False
        if in_default:
            c = _c
            invalid_keys = False
        warning_msg = f"Invalid keys in config: {keys}"
        if not in_default:
            try:
                if warn:
                    from meerschaum.utils.warnings import warn as _warn
                    _warn(warning_msg, stacklevel=3, color=False)
            except Exception:
                if warn:
                    print(warning_msg)
            if as_tuple:
                return False, None
            return None

        ### Don't write keys that we haven't yet loaded into memory.
        not_loaded_keys = [k for k in patched_default_config if k not in config]
        for k in not_loaded_keys:
            patched_default_config.pop(k, None)

        set_config(
            apply_patch_to_config(
                patched_default_config,
                config,
            )
        )
        if patch and keys[0] != symlinks_key:
            if write_missing:
                write_config(config, debug=debug)

    if as_tuple:
        return (not invalid_keys), c
    return c
```
Return the Meerschaum configuration dictionary. If positional arguments are provided, index by the keys. Raises a warning if invalid keys are provided.
Parameters
- keys (str): Strings to index.
- patch (bool, default True):
If True, patch missing default keys into the config directory.
- sync_files (bool, default True):
If True, sync files if needed.
- write_missing (bool, default True):
If True, write default values when the main config files are missing.
- substitute (bool, default True):
If True, substitute 'MRSM{}' values.
- as_tuple (bool, default False):
If True, return a tuple of the form (success, value).
Returns
- The value in the configuration directory, indexed by the provided keys.
Examples
>>> get_config('meerschaum', 'instance')
'sql:main'
>>> get_config('does', 'not', 'exist')
UserWarning: Invalid keys in config: ('does', 'not', 'exist')
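The key-walking and `as_tuple` behavior can be sketched in a few lines. This is a standalone illustration of the `(success, value)` contract seen above, not the real implementation (`index_config` and the toy `config` dictionary are hypothetical stand-ins):

```python
from typing import Any, Tuple

def index_config(config: dict, *keys: str) -> Tuple[bool, Any]:
    """
    Walk nested dictionary keys, returning an as_tuple-style
    (success, value) pair instead of raising on a miss.
    """
    c: Any = config
    for k in keys:
        try:
            c = c[k]
        except (KeyError, TypeError):
            # Invalid keys: mirror get_config's (False, None) result.
            return False, None
    return True, c

config = {'meerschaum': {'instance': 'sql:main'}}
print(index_config(config, 'meerschaum', 'instance'))  # (True, 'sql:main')
print(index_config(config, 'does', 'not', 'exist'))    # (False, None)
```

The real `get_config()` additionally patches in defaults and emits a `UserWarning` on invalid keys; this sketch only shows the indexing contract.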
````python
class Pipe:
    """
    Access Meerschaum pipes via Pipe objects.

    Pipes are identified by the following:

    1. Connector keys (e.g. `'sql:main'`)
    2. Metric key (e.g. `'weather'`)
    3. Location (optional; e.g. `None`)

    A pipe's connector keys correspond to a data source, and when the pipe is synced,
    its `fetch` definition is evaluated and executed to produce new data.

    Alternatively, new data may be directly synced via `pipe.sync()`:

    ```
    >>> from meerschaum import Pipe
    >>> pipe = Pipe('csv', 'weather')
    >>>
    >>> import pandas as pd
    >>> df = pd.read_csv('weather.csv')
    >>> pipe.sync(df)
    ```
    """

    from ._fetch import (
        fetch,
        get_backtrack_interval,
    )
    from ._data import (
        get_data,
        get_backtrack_data,
        get_rowcount,
        get_doc,
        get_value,
        _get_data_as_iterator,
        get_chunk_interval,
        get_chunk_bounds,
        get_chunk_bounds_batches,
        parse_date_bounds,
    )
    from ._register import register
    from ._attributes import (
        attributes,
        parameters,
        columns,
        indices,
        indexes,
        dtypes,
        autoincrement,
        autotime,
        upsert,
        static,
        tzinfo,
        enforce,
        null_indices,
        mixed_numerics,
        get_columns,
        get_columns_types,
        get_columns_indices,
        get_indices,
        get_parameters,
        get_dtypes,
        update_parameters,
        tags,
        get_id,
        id,
        get_val_column,
        parents,
        parent,
        children,
        child,
        reference,
        references,
        target,
        _target_legacy,
        guess_datetime,
        precision,
        get_precision,
    )
    from ._cache import (
        _get_cache_connector,
        _cache_value,
        _get_cached_value,
        _invalidate_cache,
        _get_cache_dir_path,
        _write_cache_key,
        _write_cache_file,
        _write_cache_conn_key,
        _read_cache_key,
        _read_cache_file,
        _read_cache_conn_key,
        _load_cache_keys,
        _load_cache_files,
        _load_cache_conn_keys,
        _get_cache_keys,
        _get_cache_file_keys,
        _get_cache_conn_keys,
        _clear_cache_key,
        _clear_cache_file,
        _clear_cache_conn_key,
    )
    from ._show import show
    from ._edit import edit, edit_definition, update
    from ._sync import (
        sync,
        get_sync_time,
        exists,
        filter_existing,
        _get_chunk_label,
        get_num_workers,
        _persist_new_special_columns,
    )
    from ._verify import (
        verify,
        get_bound_interval,
        get_bound_time,
    )
    from ._delete import delete
    from ._drop import drop, drop_indices
    from ._index import create_indices
    from ._clear import clear
    from ._deduplicate import deduplicate
    from ._bootstrap import bootstrap
    from ._dtypes import enforce_dtypes, infer_dtypes
    from ._copy import copy_to

    def __init__(
        self,
        connector: str = '',
        metric: str = '',
        location: Optional[str] = None,
        parameters: Optional[Dict[str, Any]] = None,
        columns: Union[Dict[str, str], List[str], None] = None,
        indices: Optional[Dict[str, Union[str, List[str]]]] = None,
        tags: Optional[List[str]] = None,
        target: Optional[str] = None,
        dtypes: Optional[Dict[str, str]] = None,
        instance: Optional[Union[str, InstanceConnector]] = None,
        upsert: Optional[bool] = None,
        autoincrement: Optional[bool] = None,
        autotime: Optional[bool] = None,
        precision: Union[str, Dict[str, Union[str, int]], None] = None,
        static: Optional[bool] = None,
        enforce: Optional[bool] = None,
        null_indices: Optional[bool] = None,
        mixed_numerics: Optional[bool] = None,
        temporary: bool = False,
        cache: Optional[bool] = None,
        cache_connector_keys: Optional[str] = None,
        references: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]] = None,
        reference: Union[str, Dict[str, Any], mrsm.Pipe, None] = None,
        parents: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]] = None,
        parent: Union[str, Dict[str, Any], mrsm.Pipe, None] = None,
        children: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]] = None,
        child: Union[str, Dict[str, Any], mrsm.Pipe, None] = None,
        mrsm_instance: Optional[Union[str, InstanceConnector]] = None,
        connector_keys: Optional[str] = None,
        metric_key: Optional[str] = None,
        location_key: Optional[str] = None,
        instance_keys: Optional[str] = None,
        indexes: Union[Dict[str, str], List[str], None] = None,
        debug: bool = False,
    ):
        """
        Parameters
        ----------
        connector: str
            Keys for the pipe's source connector, e.g. `'sql:main'`.

        metric: str
            Label for the pipe's contents, e.g. `'weather'`.

        location: str, default None
            Label for the pipe's location. Defaults to `None`.

        parameters: Optional[Dict[str, Any]], default None
            Optionally set a pipe's parameters from the constructor,
            e.g. columns and other attributes.
            You can edit these parameters with `edit pipes`.

        columns: Union[Dict[str, str], List[str], None], default None
            Set the `columns` dictionary of `parameters`.
            If `parameters` is also provided, this dictionary is added under the `'columns'` key.

        indices: Optional[Dict[str, Union[str, List[str]]]], default None
            Set the `indices` dictionary of `parameters`.
            If `parameters` is also provided, this dictionary is added under the `'indices'` key.

        tags: Optional[List[str]], default None
            A list of strings to be added under the `'tags'` key of `parameters`.
            You can select pipes with certain tags using `--tags`.

        dtypes: Optional[Dict[str, str]], default None
            Set the `dtypes` dictionary of `parameters`.
            If `parameters` is also provided, this dictionary is added under the `'dtypes'` key.

        mrsm_instance: Optional[Union[str, InstanceConnector]], default None
            Connector for the Meerschaum instance where the pipe resides.
            Defaults to the preconfigured default instance (`'sql:main'`).

        instance: Optional[Union[str, InstanceConnector]], default None
            Alias for `mrsm_instance`. If `mrsm_instance` is supplied, this value is ignored.

        upsert: Optional[bool], default None
            If `True`, set `upsert` to `True` in the parameters.

        autoincrement: Optional[bool], default None
            If `True`, set `autoincrement` in the parameters.

        autotime: Optional[bool], default None
            If `True`, set `autotime` in the parameters.

        precision: Union[str, Dict[str, Union[str, int]], None], default None
            If provided, set `precision` in the parameters.
            This may be either a string (the precision unit) or a dictionary in the form
            `{'unit': <unit>, 'interval': <interval>}`.
            Default is determined by the `datetime` column dtype
            (e.g. `datetime64[us]` is `microsecond` precision).

        static: Optional[bool], default None
            If `True`, set `static` in the parameters.

        enforce: Optional[bool], default None
            If `False`, skip data type enforcement.
            Default behavior is `True`.

        null_indices: Optional[bool], default None
            Set to `False` if there will be no null values in the index columns.
            Defaults to `True`.

        mixed_numerics: bool, default None
            If `True`, integer columns will be converted to `numeric` when floats are synced.
            Set to `False` to disable this behavior.
            Defaults to `True`.

        temporary: bool, default False
            If `True`, prevent instance tables (pipes, users, plugins) from being created.

        cache: Optional[bool], default None
            If `True`, cache the pipe's metadata to disk (in addition to in-memory caching).
            If `cache` is not explicitly `True`, it is set to `False` if `temporary` is `True`.
            Defaults to `True` (from `None`).

        cache_connector_keys: Optional[str], default None
            If provided, use the keys to a Valkey connector (e.g. `valkey:main`).

        references: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None
            If provided, inherit the parameters of the reference Pipe(s).
            May be equal to a string of the Pipe constructor, a dictionary of constructor keys,
            a Pipe itself, or a list of any of these values.

        parents: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None
            Set references for parent pipes. See `references` for values.

        children: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None
            Set references for child pipes. See `references` for values.
        """
        from meerschaum.utils.warnings import error, warn
        if (not connector and not connector_keys) or (not metric and not metric_key):
            error(
                "Please provide strings for the connector and metric\n    "
                + "(first two positional arguments)."
            )

        ### Fall back to legacy `location_key` just in case.
        if not location:
            location = location_key

        if not connector:
            connector = connector_keys

        if not metric:
            metric = metric_key

        if location in ('[None]', 'None'):
            location = None

        from meerschaum._internal.static import STATIC_CONFIG
        negation_prefix = STATIC_CONFIG['system']['fetch_pipes_keys']['negation_prefix']
        for k in (connector, metric, location, *(tags or [])):
            if str(k).startswith(negation_prefix):
                error(f"A pipe's keys and tags cannot start with the prefix '{negation_prefix}'.")

        self._connector_keys = str(connector)
        self._connector_key = self.connector_keys  ### Alias
        self._metric_key = metric
        self._location_key = location
        self.temporary = temporary
        self.cache = (
            cache
            if cache is not None
            else ((not temporary) and get_config('pipes', 'cache', 'enabled', warn=False))
        )
        self.cache_connector_keys = (
            str(cache_connector_keys)
            if cache_connector_keys is not None
            else None
        )
        self.debug = debug

        self._attributes: Dict[str, Any] = {
            'connector_keys': self._connector_keys,
            'metric_key': self._metric_key,
            'location_key': self._location_key,
            'parameters': {},
        }

        ### only set parameters if values are provided
        if isinstance(parameters, dict):
            self._attributes['parameters'] = parameters
        else:
            if parameters is not None:
                warn(f"The provided parameters are of invalid type '{type(parameters)}'.")
            self._attributes['parameters'] = {}

        columns = columns or self._attributes.get('parameters', {}).get('columns', None)
        if isinstance(columns, (list, tuple)):
            columns = {str(col): str(col) for col in columns}
        if isinstance(columns, dict):
            self._attributes['parameters']['columns'] = columns
        elif isinstance(columns, str) and 'Pipe(' in columns:
            pass
        elif columns is not None:
            warn(f"The provided columns are of invalid type '{type(columns)}'.")

        indices = (
            indices
            or indexes
            or self._attributes.get('parameters', {}).get('indices', None)
            or self._attributes.get('parameters', {}).get('indexes', None)
        )
        if isinstance(indices, dict):
            indices_key = (
                'indexes'
                if 'indexes' in self._attributes['parameters']
                else 'indices'
            )
            self._attributes['parameters'][indices_key] = indices

        if isinstance(tags, (list, tuple)):
            self._attributes['parameters']['tags'] = tags
        elif tags is not None:
            warn(f"The provided tags are of invalid type '{type(tags)}'.")

        if isinstance(target, str):
            self._attributes['parameters']['target'] = target
        elif target is not None:
            warn(f"The provided target is of invalid type '{type(target)}'.")

        if isinstance(dtypes, dict):
            self._attributes['parameters']['dtypes'] = dtypes
        elif dtypes is not None:
            warn(f"The provided dtypes are of invalid type '{type(dtypes)}'.")

        if isinstance(upsert, bool):
            self._attributes['parameters']['upsert'] = upsert

        if isinstance(autoincrement, bool):
            self._attributes['parameters']['autoincrement'] = autoincrement

        if isinstance(autotime, bool):
            self._attributes['parameters']['autotime'] = autotime

        if isinstance(precision, dict):
            self._attributes['parameters']['precision'] = precision
        elif isinstance(precision, str):
            self._attributes['parameters']['precision'] = {'unit': precision}

        if isinstance(static, bool):
            self._attributes['parameters']['static'] = static
            self._static = static

        if isinstance(enforce, bool):
            self._attributes['parameters']['enforce'] = enforce

        if isinstance(null_indices, bool):
            self._attributes['parameters']['null_indices'] = null_indices

        if isinstance(mixed_numerics, bool):
            self._attributes['parameters']['mixed_numerics'] = mixed_numerics

        ### NOTE: The parameters dictionary is {} by default.
        ### A Pipe may be registered without parameters, then edited,
        ### or a Pipe may be registered with parameters set in-memory first.
        _mrsm_instance = mrsm_instance if mrsm_instance is not None else (instance or instance_keys)
        if _mrsm_instance is None:
            _mrsm_instance = get_config('meerschaum', 'instance', patch=True)

        if not isinstance(_mrsm_instance, str):
            self._instance_connector = _mrsm_instance
            self._instance_keys = str(_mrsm_instance)
        else:
            self._instance_keys = _mrsm_instance

        if self._instance_keys == 'sql:memory':
            self.cache = False

        self._cache_locks = collections.defaultdict(lambda: threading.RLock())

        if references is not None or reference is not None:
            reference_vals = references if references is not None else reference
            self.references = reference_vals

        if parents is not None or parent is not None:
            parent_vals = parents if parents is not None else parent
            self.parents = parent_vals

        if children is not None or child is not None:
            children_vals = children if children is not None else child
            self.children = children_vals

    @property
    def metric_key(self) -> str:
        """
        Return the pipe's metric key.
        """
        return self._metric_key

    @property
    def metric(self) -> str:
        """
        Return the pipe's metric key.
        """
        return self._metric_key

    @property
    def location_key(self) -> Union[str, None]:
        """
        Return the pipe's location key.
        """
        return self._location_key

    @property
    def location(self) -> Union[str, None]:
        """
        Return the pipe's location key.
        """
        return self._location_key

    @property
    def meta(self):
        """
        Return the four keys needed to reconstruct this pipe.
        """
        return {
            'connector_keys': self.connector_keys,
            'metric_key': self.metric_key,
            'location_key': self.location_key,
            'instance_keys': self.instance_keys,
        }

    def keys(self) -> List[str]:
        """
        Return the ordered keys for this pipe.
        """
        return {
            key: val
            for key, val in self.meta.items()
            if key != 'instance'
        }

    @property
    def instance_keys(self) -> str:
        """
        Return the pipe's instance keys.
        """
        return self._instance_keys

    @property
    def instance(self) -> Union[InstanceConnector, str]:
        """
        Return the pipe's instance connector or keys.
        """
        conn = self.instance_connector
        if conn is None:
            return self.instance_keys
        return conn

    @property
    def instance_connector(self) -> Union[InstanceConnector, None]:
        """
        The instance connector on which this pipe resides.
        """
        if '_instance_connector' not in self.__dict__:
            from meerschaum.connectors.parse import parse_instance_keys
            conn = parse_instance_keys(self.instance_keys)
            if conn:
                self._instance_connector = conn
            else:
                return None
        return self._instance_connector

    @property
    def connector_keys(self) -> str:
        """
        Return the pipe's connector keys.
        """
        return self._connector_keys

    @property
    def connector_key(self) -> str:
        """
        Legacy: use `Pipe.connector_keys` instead.
        """
        return self.connector_keys

    @property
    def connector(self) -> Union['Connector', str]:
        """
        The connector to the data source.
        """
        if '_connector' not in self.__dict__:
            from meerschaum.connectors.parse import parse_instance_keys
            import warnings
            with warnings.catch_warnings():
                warnings.simplefilter('ignore')
                try:
                    conn = parse_instance_keys(self.connector_keys)
                except Exception:
                    conn = None
            if conn:
                self._connector = conn
            else:
                return self._connector_keys
        return self._connector

    def __str__(self, ansi: bool = False):
        return pipe_repr(self, ansi=ansi)

    def __eq__(self, other):
        try:
            return (
                isinstance(self, type(other))
                and self.connector_keys == other.connector_keys
                and self.metric_key == other.metric_key
                and self.location_key == other.location_key
                and self.instance_keys == other.instance_keys
            )
        except Exception:
            return False

    def __hash__(self):
        ### Using an esoteric separator to avoid collisions.
        sep = "[\"']"
        return hash(
            str(self.connector_keys) + sep
            + str(self.metric_key) + sep
            + str(self.location_key) + sep
            + str(self.instance_keys) + sep
        )

    def __repr__(self, ansi: bool = True, **kw) -> str:
        if not hasattr(sys, 'ps1'):
            ansi = False

        return pipe_repr(self, ansi=ansi, **kw)

    def __pt_repr__(self):
        from meerschaum.utils.packages import attempt_import
        prompt_toolkit_formatted_text = attempt_import('prompt_toolkit.formatted_text', lazy=False)
        return prompt_toolkit_formatted_text.ANSI(pipe_repr(self, ansi=True))

    def __getstate__(self) -> Dict[str, Any]:
        """
        Define the state dictionary (pickling).
        """
        return {
            'connector_keys': self.connector_keys,
            'metric_key': self.metric_key,
            'location_key': self.location_key,
            'parameters': self._attributes.get('parameters', None),
            'instance_keys': self.instance_keys,
        }

    def __setstate__(self, _state: Dict[str, Any]):
        """
        Read the state (unpickling).
        """
        self.__init__(**_state)

    def __getitem__(self, key: str) -> Any:
        """
        Index the pipe's attributes.
        If the `key` cannot be found, return `None`.
        """
        if key in self.attributes:
            return self.attributes.get(key, None)

        aliases = {
            'connector': 'connector_keys',
            'connector_key': 'connector_keys',
            'metric': 'metric_key',
            'location': 'location_key',
        }
        aliased_key = aliases.get(key, None)
        if aliased_key is not None:
            return self.attributes.get(aliased_key, None)

        property_aliases = {
            'instance': 'instance_keys',
            'instance_key': 'instance_keys',
        }
        aliased_key = property_aliases.get(key, None)
        if aliased_key is not None:
            key = aliased_key
        return getattr(self, key, None)

    def __copy__(self):
        """
        Return a shallow copy of the current pipe.
        """
        return mrsm.Pipe(
            self.connector_keys, self.metric_key, self.location_key,
            instance=self.instance_keys,
            parameters=self._attributes.get('parameters', None),
        )

    def __deepcopy__(self, memo):
        """
        Return a deep copy of the current pipe.
        """
        return self.__copy__()
````
Access Meerschaum pipes via Pipe objects.
Pipes are identified by the following:
1. Connector keys (e.g. 'sql:main')
2. Metric key (e.g. 'weather')
3. Location (optional; e.g. None)
A pipe's connector keys correspond to a data source, and when the pipe is synced,
its fetch definition is evaluated and executed to produce new data.
Alternatively, new data may be directly synced via pipe.sync():
>>> from meerschaum import Pipe
>>> pipe = Pipe('csv', 'weather')
>>>
>>> import pandas as pd
>>> df = pd.read_csv('weather.csv')
>>> pipe.sync(df)
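Those identifying keys also define pipe identity: as the `__eq__` and `__hash__` implementations show, two pipes compare equal when their connector, metric, location, and instance keys all match. A minimal standalone sketch of that semantics (`PipeKeys` is a hypothetical stand-in, not the real `Pipe` class):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class PipeKeys:
    # The four keys that identify a pipe; equality and hashing
    # derive from all four, mirroring Pipe.__eq__ / Pipe.__hash__.
    connector_keys: str
    metric_key: str
    location_key: Optional[str] = None
    instance_keys: str = 'sql:main'

a = PipeKeys('plugin:noaa', 'weather')
b = PipeKeys('plugin:noaa', 'weather')
c = PipeKeys('plugin:noaa', 'weather', 'atlanta')
print(a == b)          # True: same four keys
print(len({a, b, c}))  # 2: the location distinguishes c
```

Because identity hangs on these keys, pipes can safely be used as dictionary keys or set members, which is how `meerschaum.get_pipes()` deduplicates them.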
isinstance(null_indices, bool): 445 self._attributes['parameters']['null_indices'] = null_indices 446 447 if isinstance(mixed_numerics, bool): 448 self._attributes['parameters']['mixed_numerics'] = mixed_numerics 449 450 ### NOTE: The parameters dictionary is {} by default. 451 ### A Pipe may be registered without parameters, then edited, 452 ### or a Pipe may be registered with parameters set in-memory first. 453 _mrsm_instance = mrsm_instance if mrsm_instance is not None else (instance or instance_keys) 454 if _mrsm_instance is None: 455 _mrsm_instance = get_config('meerschaum', 'instance', patch=True) 456 457 if not isinstance(_mrsm_instance, str): 458 self._instance_connector = _mrsm_instance 459 self._instance_keys = str(_mrsm_instance) 460 else: 461 self._instance_keys = _mrsm_instance 462 463 if self._instance_keys == 'sql:memory': 464 self.cache = False 465 466 self._cache_locks = collections.defaultdict(lambda: threading.RLock()) 467 468 if references is not None or reference is not None: 469 reference_vals = references if references is not None else reference 470 self.references = reference_vals 471 472 if parents is not None or parent is not None: 473 parent_vals = parents if parents is not None else parent 474 self.parents = parent_vals 475 476 if children is not None or child is not None: 477 children_vals = children if children is not None else child 478 self.children = children_vals
Parameters
- connector (str): Keys for the pipe's source connector, e.g. 'sql:main'.
- metric (str): Label for the pipe's contents, e.g. 'weather'.
- location (str, default None): Label for the pipe's location. Defaults to None.
- parameters (Optional[Dict[str, Any]], default None): Optionally set a pipe's parameters from the constructor, e.g. columns and other attributes. You can edit these parameters with `edit pipes`.
- columns (Union[Dict[str, str], List[str], None], default None): Set the `columns` dictionary of `parameters`. If `parameters` is also provided, this dictionary is added under the `'columns'` key.
- indices (Optional[Dict[str, Union[str, List[str]]]], default None): Set the `indices` dictionary of `parameters`. If `parameters` is also provided, this dictionary is added under the `'indices'` key.
- tags (Optional[List[str]], default None): A list of strings to be added under the `'tags'` key of `parameters`. You can select pipes with certain tags using `--tags`.
- dtypes (Optional[Dict[str, str]], default None): Set the `dtypes` dictionary of `parameters`. If `parameters` is also provided, this dictionary is added under the `'dtypes'` key.
- mrsm_instance (Optional[Union[str, InstanceConnector]], default None): Connector for the Meerschaum instance where the pipe resides. Defaults to the preconfigured default instance (`'sql:main'`).
- instance (Optional[Union[str, InstanceConnector]], default None): Alias for `mrsm_instance`. If `mrsm_instance` is supplied, this value is ignored.
- upsert (Optional[bool], default None): If `True`, set `upsert` to `True` in the parameters.
- autoincrement (Optional[bool], default None): If `True`, set `autoincrement` in the parameters.
- autotime (Optional[bool], default None): If `True`, set `autotime` in the parameters.
- precision (Union[str, Dict[str, Union[str, int]], None], default None): If provided, set `precision` in the parameters. This may be either a string (the precision unit) or a dictionary in the form `{'unit': <unit>, 'interval': <interval>}`. The default is determined by the `datetime` column dtype (e.g. `datetime64[us]` is `microsecond` precision).
- static (Optional[bool], default None): If `True`, set `static` in the parameters.
- enforce (Optional[bool], default None): If `False`, skip data type enforcement. Default behavior is `True`.
- null_indices (Optional[bool], default None): Set to `False` if there will be no null values in the index columns. Defaults to `True`.
- mixed_numerics (bool, default None): If `True`, integer columns will be converted to `numeric` when floats are synced. Set to `False` to disable this behavior. Defaults to `True`.
- temporary (bool, default False): If `True`, prevent instance tables (pipes, users, plugins) from being created.
- cache (Optional[bool], default None): If `True`, cache the pipe's metadata to disk (in addition to in-memory caching). If `cache` is not explicitly `True`, it is set to `False` if `temporary` is `True`. Defaults to `True` (from `None`).
- cache_connector_keys (Optional[str], default None): If provided, use the keys to a Valkey connector (e.g. `valkey:main`).
- references (Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None): If provided, inherit the parameters of the reference Pipe(s). May be a string of the Pipe constructor, a dictionary of constructor keys, a Pipe itself, or a list of any of these values.
- parents (Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None): Set references for parent pipes. See `references` for values.
- children (Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None): Set references for child pipes. See `references` for values.
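As a rough illustration of how the constructor folds keyword arguments into `parameters` (a list of `columns` becomes an identity mapping, and `tags` lands under its own key), here is a standalone sketch; `build_parameters` is a hypothetical helper for illustration, not part of the meerschaum API:

```python
from typing import Any, Dict, List, Optional, Union

def build_parameters(
    parameters: Optional[Dict[str, Any]] = None,
    columns: Union[Dict[str, str], List[str], None] = None,
    tags: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Sketch of how `Pipe.__init__` merges keyword arguments into `parameters`."""
    attrs: Dict[str, Any] = dict(parameters or {})
    # A list of columns is normalized to an identity mapping.
    if isinstance(columns, (list, tuple)):
        columns = {str(col): str(col) for col in columns}
    if isinstance(columns, dict):
        attrs['columns'] = columns
    if isinstance(tags, (list, tuple)):
        attrs['tags'] = tags
    return attrs

params = build_parameters(columns=['ts', 'id'], tags=['production'])
print(params)
# {'columns': {'ts': 'ts', 'id': 'id'}, 'tags': ['production']}
```

Passing an explicit `parameters` dictionary alongside `columns` layers the `columns` mapping under its own key without clobbering the rest of the dictionary.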
@property
def metric_key(self) -> str:
    """
    Return the pipe's metric key.
    """
    return self._metric_key
Return the pipe's metric key.
@property
def metric(self) -> str:
    """
    Return the pipe's metric key.
    """
    return self._metric_key
Return the pipe's metric key.
@property
def location_key(self) -> Union[str, None]:
    """
    Return the pipe's location key.
    """
    return self._location_key
Return the pipe's location key.
@property
def location(self) -> Union[str, None]:
    """
    Return the pipe's location key.
    """
    return self._location_key
Return the pipe's location key.
@property
def meta(self):
    """
    Return the four keys needed to reconstruct this pipe.
    """
    return {
        'connector_keys': self.connector_keys,
        'metric_key': self.metric_key,
        'location_key': self.location_key,
        'instance_keys': self.instance_keys,
    }
Return the four keys needed to reconstruct this pipe.
def keys(self) -> List[str]:
    """
    Return the ordered keys for this pipe.
    """
    return {
        key: val
        for key, val in self.meta.items()
        if key != 'instance'
    }
Return the ordered keys for this pipe.
@property
def instance_keys(self) -> str:
    """
    Return the pipe's instance keys.
    """
    return self._instance_keys
Return the pipe's instance keys.
@property
def instance(self) -> Union[InstanceConnector, str]:
    """
    Return the pipe's instance connector or keys.
    """
    conn = self.instance_connector
    if conn is None:
        return self.instance_keys
    return conn
Return the pipe's instance connector or keys.
@property
def instance_connector(self) -> Union[InstanceConnector, None]:
    """
    The instance connector on which this pipe resides.
    """
    if '_instance_connector' not in self.__dict__:
        from meerschaum.connectors.parse import parse_instance_keys
        conn = parse_instance_keys(self.instance_keys)
        if conn:
            self._instance_connector = conn
        else:
            return None
    return self._instance_connector
The instance connector on which this pipe resides.
@property
def connector_keys(self) -> str:
    """
    Return the pipe's connector keys.
    """
    return self._connector_keys
Return the pipe's connector keys.
@property
def connector_key(self) -> str:
    """
    Legacy: use `Pipe.connector_keys` instead.
    """
    return self.connector_keys
Legacy: use Pipe.connector_keys instead.
@property
def connector(self) -> Union['Connector', str]:
    """
    The connector to the data source.
    """
    if '_connector' not in self.__dict__:
        from meerschaum.connectors.parse import parse_instance_keys
        import warnings
        with warnings.catch_warnings():
            warnings.simplefilter('ignore')
            try:
                conn = parse_instance_keys(self.connector_keys)
            except Exception:
                conn = None
        if conn:
            self._connector = conn
        else:
            return self._connector_keys
    return self._connector
The connector to the data source.
def fetch(
    self,
    begin: Union[datetime, int, str, None] = '',
    end: Union[datetime, int, None] = None,
    check_existing: bool = True,
    sync_chunks: bool = False,
    debug: bool = False,
    **kw: Any
) -> Union['pd.DataFrame', Iterator['pd.DataFrame'], None]:
    """
    Fetch a Pipe's latest data from its connector.

    Parameters
    ----------
    begin: Union[datetime, str, None], default ''
        If provided, only fetch data newer than or equal to `begin`.

    end: Optional[datetime], default None
        If provided, only fetch data older than or equal to `end`.

    check_existing: bool, default True
        If `False`, do not apply the backtrack interval.

    sync_chunks: bool, default False
        If `True` and the pipe's connector is of type `'sql'`, begin syncing chunks
        as fetching loads them into memory.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `pd.DataFrame` of the newest unseen data.
    """
    if 'fetch' not in dir(self.connector):
        warn(f"No `fetch()` function defined for connector '{self.connector}'")
        return None

    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.misc import filter_arguments

    _chunk_hook = kw.pop('chunk_hook', None)
    kw['workers'] = self.get_num_workers(kw.get('workers', None))
    if sync_chunks and _chunk_hook is None:

        def _chunk_hook(chunk, **_kw) -> SuccessTuple:
            """
            Wrap `Pipe.sync()` with a custom chunk label prepended to the message.
            """
            from meerschaum.config._patch import apply_patch_to_config
            kwargs = apply_patch_to_config(kw, _kw)
            chunk_success, chunk_message = self.sync(chunk, **kwargs)
            chunk_label = self._get_chunk_label(chunk, self.columns.get('datetime', None))
            if chunk_label:
                chunk_message = '\n' + chunk_label + '\n' + chunk_message
            return chunk_success, chunk_message

    begin, end = self.parse_date_bounds(begin, end)

    with mrsm.Venv(get_connector_plugin(self.connector)):
        _args, _kwargs = filter_arguments(
            self.connector.fetch,
            self,
            begin=_determine_begin(
                self,
                begin,
                end,
                check_existing=check_existing,
                debug=debug,
            ),
            end=end,
            chunk_hook=_chunk_hook,
            debug=debug,
            **kw
        )
        df = self.connector.fetch(*_args, **_kwargs)
    return df
Fetch a Pipe's latest data from its connector.
Parameters
- begin (Union[datetime, str, None], default ''): If provided, only fetch data newer than or equal to `begin`.
- end (Optional[datetime], default None): If provided, only fetch data older than or equal to `end`.
- check_existing (bool, default True): If `False`, do not apply the backtrack interval.
- sync_chunks (bool, default False): If `True` and the pipe's connector is of type `'sql'`, begin syncing chunks as fetching loads them into memory.
- debug (bool, default False): Verbosity toggle.

Returns
- A `pd.DataFrame` of the newest unseen data.
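The `begin`/`end` window that `fetch()` forwards to the connector amounts to an inclusive filter over fetched documents. The sketch below illustrates that window with plain dictionaries; `filter_fetched_docs` is an illustrative stand-in, not a meerschaum function:

```python
from datetime import datetime
from typing import Any, Dict, List, Optional

def filter_fetched_docs(
    docs: List[Dict[str, Any]],
    begin: Optional[datetime] = None,
    end: Optional[datetime] = None,
    dt_col: str = 'ts',
) -> List[Dict[str, Any]]:
    """Keep documents within the inclusive [begin, end] fetch window."""
    return [
        doc for doc in docs
        if (begin is None or doc[dt_col] >= begin)
        and (end is None or doc[dt_col] <= end)
    ]

docs = [
    {'ts': datetime(2024, 1, 1), 'vl': 1},
    {'ts': datetime(2024, 1, 2), 'vl': 2},
    {'ts': datetime(2024, 1, 3), 'vl': 3},
]
recent = filter_fetched_docs(docs, begin=datetime(2024, 1, 2))
print(len(recent))  # 2
```

In practice the real `fetch()` also widens `begin` backwards by the backtrack interval (unless `check_existing=False`) before handing it to the connector.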
def get_backtrack_interval(
    self,
    check_existing: bool = True,
    debug: bool = False,
) -> Union[timedelta, int]:
    """
    Get the backtrack interval to use for this pipe.

    Parameters
    ----------
    check_existing: bool, default True
        If `False`, return a backtrack interval of 0 minutes.

    Returns
    -------
    The backtrack interval (`timedelta` or `int`) to use with this pipe's `datetime` axis.
    """
    default_backtrack_minutes = get_config('pipes', 'parameters', 'fetch', 'backtrack_minutes')
    configured_backtrack_minutes = self.parameters.get('fetch', {}).get('backtrack_minutes', None)
    backtrack_minutes = (
        configured_backtrack_minutes
        if configured_backtrack_minutes is not None
        else default_backtrack_minutes
    ) if check_existing else 0

    backtrack_interval = timedelta(minutes=backtrack_minutes)
    dt_col = self.columns.get('datetime', None)
    if dt_col is None:
        return backtrack_interval

    dt_dtype = self.dtypes.get(dt_col, 'datetime')
    if 'int' in dt_dtype.lower():
        return backtrack_minutes

    return backtrack_interval
Get the backtrack interval to use for this pipe.
Parameters
- check_existing (bool, default True): If `False`, return a backtrack interval of 0 minutes.

Returns
- The backtrack interval (`timedelta` or `int`) to use with this pipe's `datetime` axis.
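The dtype-dependent return value (an `int` of minutes for integer `datetime` axes, otherwise a `timedelta`) can be sketched standalone. This mirrors the branching above under simplified assumptions and is not the meerschaum implementation:

```python
from datetime import timedelta
from typing import Union

def backtrack_interval(
    backtrack_minutes: int = 1440,
    dt_dtype: str = 'datetime64[ns]',
    check_existing: bool = True,
) -> Union[timedelta, int]:
    """Return bare minutes for integer axes, else a timedelta."""
    if not check_existing:
        backtrack_minutes = 0
    # Integer datetime axes measure the interval in raw units.
    if 'int' in dt_dtype.lower():
        return backtrack_minutes
    return timedelta(minutes=backtrack_minutes)

print(backtrack_interval(60))                     # 1:00:00
print(backtrack_interval(60, dt_dtype='int64'))   # 60
```

Callers that subtract the result from a bound therefore work for both timestamp and integer axes without branching themselves.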
def get_data(
    self,
    select_columns: Optional[List[str]] = None,
    omit_columns: Optional[List[str]] = None,
    begin: Union[datetime, int, str, None] = None,
    end: Union[datetime, int, str, None] = None,
    params: Optional[Dict[str, Any]] = None,
    as_iterator: bool = False,
    as_chunks: bool = False,
    as_dask: bool = False,
    add_missing_columns: bool = False,
    chunk_interval: Union[timedelta, int, None] = None,
    order: Optional[str] = 'asc',
    limit: Optional[int] = None,
    fresh: bool = False,
    debug: bool = False,
    **kw: Any
) -> Union['pd.DataFrame', Iterator['pd.DataFrame'], None]:
    """
    Get a pipe's data from the instance connector.

    Parameters
    ----------
    select_columns: Optional[List[str]], default None
        If provided, only select these given columns.
        Otherwise select all available columns (i.e. `SELECT *`).

    omit_columns: Optional[List[str]], default None
        If provided, remove these columns from the selection.

    begin: Union[datetime, int, str, None], default None
        Lower bound datetime to begin searching for data (inclusive).
        Translates to a `WHERE` clause like `WHERE datetime >= begin`.
        Defaults to `None`.

    end: Union[datetime, int, str, None], default None
        Upper bound datetime to stop searching for data (exclusive).
        Translates to a `WHERE` clause like `WHERE datetime < end`.
        Defaults to `None`.

    params: Optional[Dict[str, Any]], default None
        Filter the retrieved data by a dictionary of parameters.
        See `meerschaum.utils.sql.build_where` for more details.

    as_iterator: bool, default False
        If `True`, return a generator of chunks of pipe data.

    as_chunks: bool, default False
        Alias for `as_iterator`.

    as_dask: bool, default False
        If `True`, return a `dask.DataFrame`
        (which may be loaded into a Pandas DataFrame with `df.compute()`).

    add_missing_columns: bool, default False
        If `True`, add any missing columns from `Pipe.dtypes` to the dataframe.

    chunk_interval: Union[timedelta, int, None], default None
        If `as_iterator`, then return chunks with `begin` and `end` separated by this interval.
        This may be set under `pipe.parameters['chunk_minutes']`.
        By default, use a timedelta of 1440 minutes (1 day).
        If `chunk_interval` is an integer and the `datetime` axis a timestamp,
        then use a timedelta with the number of minutes configured to this value.
        If the `datetime` axis is an integer, default to the configured chunksize.
        If `chunk_interval` is a `timedelta` and the `datetime` axis an integer,
        use the number of minutes in the `timedelta`.

    order: Optional[str], default 'asc'
        If `order` is not `None`, sort the resulting dataframe by indices.

    limit: Optional[int], default None
        If provided, cap the dataframe to this many rows.

    fresh: bool, default False
        If `True`, skip local cache and directly query the instance connector.

    debug: bool, default False
        Verbosity toggle.
        Defaults to `False`.

    Returns
    -------
    A `pd.DataFrame` of the pipe's data corresponding to the provided parameters.
    """
    from meerschaum.utils.warnings import warn
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.dtypes import to_pandas_dtype
    from meerschaum.utils.dataframe import add_missing_cols_to_df, df_is_chunk_generator
    from meerschaum.utils.packages import attempt_import
    from meerschaum.utils.warnings import dprint
    dd = attempt_import('dask.dataframe') if as_dask else None
    dask = attempt_import('dask') if as_dask else None
    _ = attempt_import('partd', lazy=False) if as_dask else None

    if select_columns == '*':
        select_columns = None
    elif isinstance(select_columns, str):
        select_columns = [select_columns]

    if isinstance(omit_columns, str):
        omit_columns = [omit_columns]

    begin, end = self.parse_date_bounds(begin, end)
    as_iterator = as_iterator or as_chunks
    dt_col = self.columns.get('datetime', None)

    def _sort_df(_df):
        if df_is_chunk_generator(_df):
            return _df
        indices = [] if dt_col not in _df.columns else [dt_col]
        non_dt_cols = [
            col
            for col_ix, col in self.columns.items()
            if col_ix != 'datetime' and col in _df.columns
        ]
        indices.extend(non_dt_cols)
        if 'dask' not in _df.__module__:
            _df.sort_values(
                by=indices,
                inplace=True,
                ascending=(str(order).lower() == 'asc'),
            )
            _df.reset_index(drop=True, inplace=True)
        else:
            _df = _df.sort_values(
                by=indices,
                ascending=(str(order).lower() == 'asc'),
            )
            _df = _df.reset_index(drop=True)
        if limit is not None and len(_df) > limit:
            return _df.head(limit)
        return _df

    if as_iterator or as_chunks:
        df = self._get_data_as_iterator(
            select_columns=select_columns,
            omit_columns=omit_columns,
            begin=begin,
            end=end,
            params=params,
            chunk_interval=chunk_interval,
            limit=limit,
            order=order,
            fresh=fresh,
            debug=debug,
        )
        return _sort_df(df)

    if as_dask:
        from multiprocessing.pool import ThreadPool
        dask_pool = ThreadPool(self.get_num_workers())
        dask.config.set(pool=dask_pool)
        chunk_interval = self.get_chunk_interval(chunk_interval, debug=debug)
        bounds = self.get_chunk_bounds(
            begin=begin,
            end=end,
            bounded=False,
            chunk_interval=chunk_interval,
            debug=debug,
        )
        dask_chunks = [
            dask.delayed(self.get_data)(
                select_columns=select_columns,
                omit_columns=omit_columns,
                begin=chunk_begin,
                end=chunk_end,
                params=params,
                chunk_interval=chunk_interval,
                order=order,
                limit=limit,
                fresh=fresh,
                add_missing_columns=True,
                debug=debug,
            )
            for (chunk_begin, chunk_end) in bounds
        ]
        dask_meta = {
            col: to_pandas_dtype(typ)
            for col, typ in self.get_dtypes(refresh=True, infer=True, debug=debug).items()
        }
        if debug:
            dprint(f"Dask meta:\n{dask_meta}")
        return _sort_df(dd.from_delayed(dask_chunks, meta=dask_meta))

    if not self.exists(debug=debug):
        return None

    with Venv(get_connector_plugin(self.instance_connector)):
        df = self.instance_connector.get_pipe_data(
            pipe=self,
            select_columns=select_columns,
            omit_columns=omit_columns,
            begin=begin,
            end=end,
            params=params,
            limit=limit,
            order=order,
            debug=debug,
            **kw
        )
    if df is None:
        return df

    if not select_columns:
        select_columns = [col for col in df.columns]

    pipe_dtypes = self.get_dtypes(refresh=False, debug=debug)
    cols_to_omit = [
        col
        for col in df.columns
        if (
            col in (omit_columns or [])
            or
            col not in (select_columns or [])
        )
    ]
    cols_to_add = [
        col
        for col in select_columns
        if col not in df.columns
    ] + ([
        col
        for col in pipe_dtypes
        if col not in df.columns
    ] if add_missing_columns else [])
    if cols_to_omit:
        warn(
            (
                f"Received {len(cols_to_omit)} omitted column"
                + ('s' if len(cols_to_omit) != 1 else '')
                + f" for {self}. "
                + "Consider adding `select_columns` and `omit_columns` support to "
                + f"'{self.instance_connector.type}' connectors to improve performance."
            ),
            stack=False,
        )
        _cols_to_select = [col for col in df.columns if col not in cols_to_omit]
        df = df[_cols_to_select]

    if cols_to_add:
        if not add_missing_columns:
            from meerschaum.utils.misc import items_str
            warn(
                f"Will add columns {items_str(cols_to_add)} as nulls to dataframe.",
                stack=False,
            )

        df = add_missing_cols_to_df(
            df,
            {
                col: pipe_dtypes.get(col, 'string')
                for col in cols_to_add
            },
        )

    enforced_df = self.enforce_dtypes(
        df,
        dtypes=pipe_dtypes,
        debug=debug,
    )

    if order:
        return _sort_df(enforced_df)
    return enforced_df
Get a pipe's data from the instance connector.
Parameters
- select_columns (Optional[List[str]], default None): If provided, only select these given columns. Otherwise select all available columns (i.e. `SELECT *`).
- omit_columns (Optional[List[str]], default None): If provided, remove these columns from the selection.
- begin (Union[datetime, int, str, None], default None): Lower bound datetime to begin searching for data (inclusive). Translates to a `WHERE` clause like `WHERE datetime >= begin`. Defaults to `None`.
- end (Union[datetime, int, str, None], default None): Upper bound datetime to stop searching for data (exclusive). Translates to a `WHERE` clause like `WHERE datetime < end`. Defaults to `None`.
- params (Optional[Dict[str, Any]], default None): Filter the retrieved data by a dictionary of parameters. See `meerschaum.utils.sql.build_where` for more details.
- as_iterator (bool, default False): If `True`, return a generator of chunks of pipe data.
- as_chunks (bool, default False): Alias for `as_iterator`.
- as_dask (bool, default False): If `True`, return a `dask.DataFrame` (which may be loaded into a Pandas DataFrame with `df.compute()`).
- add_missing_columns (bool, default False): If `True`, add any missing columns from `Pipe.dtypes` to the dataframe.
- chunk_interval (Union[timedelta, int, None], default None): If `as_iterator`, then return chunks with `begin` and `end` separated by this interval. This may be set under `pipe.parameters['chunk_minutes']`. By default, use a timedelta of 1440 minutes (1 day). If `chunk_interval` is an integer and the `datetime` axis a timestamp, then use a timedelta with the number of minutes configured to this value. If the `datetime` axis is an integer, default to the configured chunksize. If `chunk_interval` is a `timedelta` and the `datetime` axis an integer, use the number of minutes in the `timedelta`.
- order (Optional[str], default 'asc'): If `order` is not `None`, sort the resulting dataframe by indices.
- limit (Optional[int], default None): If provided, cap the dataframe to this many rows.
- fresh (bool, default False): If `True`, skip local cache and directly query the instance connector.
- debug (bool, default False): Verbosity toggle. Defaults to `False`.

Returns
- A `pd.DataFrame` of the pipe's data corresponding to the provided parameters.
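When `as_iterator` is set, `get_data()` splits the requested range into windows of `chunk_interval`. A minimal sketch of that chunking, assuming a `timedelta` interval and a datetime axis (`chunk_bounds` here is illustrative, not the meerschaum `Pipe.get_chunk_bounds()` method):

```python
from datetime import datetime, timedelta
from typing import List, Tuple

def chunk_bounds(
    begin: datetime,
    end: datetime,
    chunk_interval: timedelta = timedelta(minutes=1440),
) -> List[Tuple[datetime, datetime]]:
    """Split [begin, end) into contiguous (chunk_begin, chunk_end) windows."""
    bounds = []
    chunk_begin = begin
    while chunk_begin < end:
        # The final window is clipped so it never overshoots `end`.
        chunk_end = min(chunk_begin + chunk_interval, end)
        bounds.append((chunk_begin, chunk_end))
        chunk_begin = chunk_end
    return bounds

bounds = chunk_bounds(datetime(2024, 1, 1), datetime(2024, 1, 4))
print(len(bounds))  # 3
```

Each `(chunk_begin, chunk_end)` pair then becomes one `begin`/`end` query against the instance connector, which is also how the `as_dask` branch builds its delayed chunks.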
def get_backtrack_data(
    self,
    backtrack_minutes: Optional[int] = None,
    begin: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    limit: Optional[int] = None,
    fresh: bool = False,
    debug: bool = False,
    **kw: Any
) -> Optional['pd.DataFrame']:
    """
    Get the most recent data from the instance connector as a Pandas DataFrame.

    Parameters
    ----------
    backtrack_minutes: Optional[int], default None
        How many minutes from `begin` to select from.
        If `None`, use `pipe.parameters['fetch']['backtrack_minutes']`.

    begin: Optional[datetime], default None
        The starting point to search for data.
        If begin is `None` (default), use the most recent observed datetime
        (AKA sync_time).

        ```
        E.g. begin = 02:00

        Search this region.           Ignore this, even if there's data.
        /  /  /  /  /  /  /  /  /  |
        -----|----------|----------|----------|----------|----------|
        00:00      01:00      02:00      03:00      04:00      05:00
        ```

    params: Optional[Dict[str, Any]], default None
        The standard Meerschaum `params` query dictionary.

    limit: Optional[int], default None
        If provided, cap the number of rows to be returned.

    fresh: bool, default False
        If `True`, ignore local cache and pull directly from the instance connector.
        Only comes into effect if a pipe was created with `cache=True`.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `pd.DataFrame` of the pipe's data corresponding to the provided parameters. Backtrack data
    is a convenient way to get a pipe's data "backtracked" from the most recent datetime.
    """
    from meerschaum.utils.warnings import warn
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    if not self.exists(debug=debug):
        return None

    begin = self.parse_date_bounds(begin)

    backtrack_interval = self.get_backtrack_interval(debug=debug)
    if backtrack_minutes is None:
        backtrack_minutes = (
            (backtrack_interval.total_seconds() / 60)
            if isinstance(backtrack_interval, timedelta)
            else backtrack_interval
        )

    if hasattr(self.instance_connector, 'get_backtrack_data'):
        with Venv(get_connector_plugin(self.instance_connector)):
            return self.enforce_dtypes(
                self.instance_connector.get_backtrack_data(
                    pipe=self,
                    begin=begin,
                    backtrack_minutes=backtrack_minutes,
                    params=params,
                    limit=limit,
                    debug=debug,
                    **kw
                ),
                debug=debug,
            )

    if begin is None:
        begin = self.get_sync_time(params=params, debug=debug)

    backtrack_interval = (
        timedelta(minutes=backtrack_minutes)
        if isinstance(begin, datetime)
        else backtrack_minutes
    )
    if begin is not None:
        begin = begin - backtrack_interval

    return self.get_data(
        begin=begin,
        params=params,
        debug=debug,
        limit=limit,
        order=kw.get('order', 'desc'),
        **kw
    )
Get the most recent data from the instance connector as a Pandas DataFrame.
Parameters
- backtrack_minutes (Optional[int], default None): How many minutes from `begin` to select from. If `None`, use `pipe.parameters['fetch']['backtrack_minutes']`.
- begin (Optional[datetime], default None): The starting point to search for data. If `begin` is `None` (default), use the most recent observed datetime (AKA sync_time).

  E.g. begin = 02:00

  Search this region.           Ignore this, even if there's data.
  /  /  /  /  /  /  /  /  /  |
  -----|----------|----------|----------|----------|----------|
  00:00      01:00      02:00      03:00      04:00      05:00

- params (Optional[Dict[str, Any]], default None): The standard Meerschaum `params` query dictionary.
- limit (Optional[int], default None): If provided, cap the number of rows to be returned.
- fresh (bool, default False): If `True`, ignore local cache and pull directly from the instance connector. Only comes into effect if a pipe was created with `cache=True`.
- debug (bool, default False): Verbosity toggle.

Returns
- A `pd.DataFrame` of the pipe's data corresponding to the provided parameters. Backtrack data is a convenient way to get a pipe's data "backtracked" from the most recent datetime.
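The core of the fallback path above is simple arithmetic: subtract the backtrack interval from the sync time (or the provided `begin`) to produce the lower bound of the query. The values below are illustrative:

```python
from datetime import datetime, timedelta

# Illustrative: the most recent observed datetime (the sync time)
# and a 120-minute backtrack interval, as in the diagram above.
sync_time = datetime(2024, 1, 1, 2, 0)
backtrack_minutes = 120

begin = sync_time - timedelta(minutes=backtrack_minutes)
print(begin)  # 2024-01-01 00:00:00
```

Rows with `datetime >= begin` are then re-fetched, which is what lets syncs catch late-arriving data near the end of the existing interval.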
def get_rowcount(
    self,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    remote: bool = False,
    debug: bool = False
) -> int:
    """
    Get a Pipe's instance or remote rowcount.

    Parameters
    ----------
    begin: Optional[datetime], default None
        Count rows where datetime > begin.

    end: Optional[datetime], default None
        Count rows where datetime < end.

    remote: bool, default False
        Count rows from a pipe's remote source.
        **NOTE**: This is experimental!

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    An `int` of the number of rows in the pipe corresponding to the provided parameters.
    Returns 0 if the pipe does not exist.
    """
    from meerschaum.utils.warnings import warn
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.misc import filter_keywords

    begin, end = self.parse_date_bounds(begin, end)
    connector = self.instance_connector if not remote else self.connector
    try:
        with Venv(get_connector_plugin(connector)):
            if not hasattr(connector, 'get_pipe_rowcount'):
                warn(
                    f"Connectors of type '{connector.type}' "
                    "do not implement `get_pipe_rowcount()`.",
                    stack=False,
                )
                return 0
            kwargs = filter_keywords(
                connector.get_pipe_rowcount,
                begin=begin,
                end=end,
                params=params,
                remote=remote,
                debug=debug,
            )
            if remote and 'remote' not in kwargs:
                warn(
                    f"Connectors of type '{connector.type}' do not support remote rowcounts.",
                    stack=False,
                )
                return 0
            rowcount = connector.get_pipe_rowcount(
                self,
                begin=begin,
                end=end,
                params=params,
                remote=remote,
                debug=debug,
            )
            if rowcount is None:
                return 0
            return rowcount
    except AttributeError as e:
        warn(e)
        if remote:
            return 0
        warn(f"Failed to get a rowcount for {self}.")
    return 0
Get a Pipe's instance or remote rowcount.
Parameters
- begin (Optional[datetime], default None): Count rows where datetime > begin.
- end (Optional[datetime], default None): Count rows where datetime < end.
- remote (bool, default False): Count rows from a pipe's remote source. NOTE: This is experimental!
- debug (bool, default False): Verbosity toggle.
Returns
- An `int` of the number of rows in the pipe corresponding to the provided parameters.
- Returns 0 if the pipe does not exist.
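To illustrate the bound semantics above, here is a minimal sketch of counting rows between optional datetime bounds over a plain list of dicts. The `count_rows` helper and the half-open `begin <= ts < end` convention are illustrative assumptions; the exact inclusive/exclusive behavior may vary per instance connector.

```python
from datetime import datetime

def count_rows(docs, dt_key='ts', begin=None, end=None):
    """Count docs whose datetime falls within the optional bounds."""
    return sum(
        1 for doc in docs
        if (begin is None or doc[dt_key] >= begin)
        and (end is None or doc[dt_key] < end)
    )

docs = [
    {'ts': datetime(2024, 1, 1), 'id': 1},
    {'ts': datetime(2024, 1, 2), 'id': 2},
    {'ts': datetime(2024, 1, 3), 'id': 3},
]
print(count_rows(docs, begin=datetime(2024, 1, 2)))
# 2
```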
```python
def get_doc(self, **kwargs) -> Union[Dict[str, Any], None]:
    """
    Convenience function to return a single row as a dictionary (or `None`) from `Pipe.get_data()`.
    Keyword arguments are passed to `Pipe.get_data()`.
    """
    from meerschaum.utils.warnings import warn
    kwargs['limit'] = 1
    try:
        result_df = self.get_data(**kwargs)
        if result_df is None or len(result_df) == 0:
            return None
        return result_df.reset_index(drop=True).iloc[0].to_dict()
    except Exception as e:
        warn(f"Failed to read value from {self}:\n{e}", stack=False)
        return None
```
Convenience function to return a single row as a dictionary (or `None`) from `Pipe.get_data()`.
Keyword arguments are passed to `Pipe.get_data()`.
```python
def get_value(
    self,
    column: str,
    params: Optional[Dict[str, Any]] = None,
    **kwargs: Any
) -> Any:
    """
    Convenience function to return a single value (or `None`) from `Pipe.get_data()`.
    Keyword arguments are passed to `Pipe.get_data()`.
    """
    from meerschaum.utils.warnings import warn
    kwargs['select_columns'] = [column]
    kwargs['limit'] = 1
    try:
        result_df = self.get_data(params=params, **kwargs)
        if result_df is None or len(result_df) == 0:
            return None
        if column not in result_df.columns:
            raise ValueError(f"Column '{column}' was not included in the result set.")
        return result_df[column][0]
    except Exception as e:
        warn(f"Failed to read value from {self}:\n{e}", stack=False)
        return None
```
Convenience function to return a single value (or `None`) from `Pipe.get_data()`.
Keyword arguments are passed to `Pipe.get_data()`.
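The single-row and single-value conveniences above boil down to "take the first row, or `None`". A minimal sketch over a list of dicts (the helper names here are illustrative, not the library's internals):

```python
def first_doc(docs):
    """Return the first row as a dictionary, or None if there are no rows."""
    return docs[0] if docs else None

def first_value(docs, column):
    """Return a single value from the first row, or None if missing."""
    doc = first_doc(docs)
    if doc is None or column not in doc:
        return None
    return doc[column]

docs = [{'ts': '2024-01-01', 'vl': 42}]
print(first_doc(docs))
# {'ts': '2024-01-01', 'vl': 42}
print(first_value(docs, 'vl'))
# 42
```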
```python
def get_chunk_interval(
    self,
    chunk_interval: Union[timedelta, int, None] = None,
    debug: bool = False,
) -> Union[timedelta, int]:
    """
    Get the chunk interval to use for this pipe.

    Parameters
    ----------
    chunk_interval: Union[timedelta, int, None], default None
        If provided, coerce this value into the correct type.
        For example, if the datetime axis is an integer, then
        return the number of minutes.

    Returns
    -------
    The chunk interval (`timedelta` or `int`) to use with this pipe's `datetime` axis.
    """
    default_chunk_minutes = get_config('pipes', 'parameters', 'verify', 'chunk_minutes')
    configured_chunk_minutes = self.parameters.get('verify', {}).get('chunk_minutes', None)
    chunk_minutes = (
        (configured_chunk_minutes or default_chunk_minutes)
        if chunk_interval is None
        else (
            chunk_interval
            if isinstance(chunk_interval, int)
            else int(chunk_interval.total_seconds() / 60)
        )
    )

    dt_col = self.columns.get('datetime', None)
    if dt_col is None:
        return timedelta(minutes=chunk_minutes)

    dt_dtype = self.dtypes.get(dt_col, 'datetime')
    if 'int' in dt_dtype.lower():
        return chunk_minutes
    return timedelta(minutes=chunk_minutes)
```
Get the chunk interval to use for this pipe.
Parameters
- chunk_interval (Union[timedelta, int, None], default None): If provided, coerce this value into the correct type. For example, if the datetime axis is an integer, then return the number of minutes.
Returns
- The chunk interval (`timedelta` or `int`) to use with this pipe's `datetime` axis.
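The coercion logic can be sketched in isolation: everything is normalized to minutes, then returned as an `int` when the datetime axis is an integer and as a `timedelta` otherwise. The `coerce_chunk_interval` helper and the `default_minutes=1440` fallback are illustrative assumptions (the real default comes from the config under `pipes:parameters:verify:chunk_minutes`).

```python
from datetime import timedelta

def coerce_chunk_interval(chunk_interval, dt_axis_is_int=False, default_minutes=1440):
    """Coerce a chunk interval into minutes (int) or a timedelta."""
    if chunk_interval is None:
        minutes = default_minutes
    elif isinstance(chunk_interval, int):
        minutes = chunk_interval
    else:
        minutes = int(chunk_interval.total_seconds() / 60)
    return minutes if dt_axis_is_int else timedelta(minutes=minutes)

print(coerce_chunk_interval(timedelta(hours=2)))
# 2:00:00
print(coerce_chunk_interval(90, dt_axis_is_int=True))
# 90
```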
```python
def get_chunk_bounds(
    self,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    bounded: bool = False,
    chunk_interval: Union[timedelta, int, None] = None,
    debug: bool = False,
) -> List[
    Tuple[
        Union[datetime, int, None],
        Union[datetime, int, None],
    ]
]:
    """
    Return a list of datetime bounds for iterating over the pipe's `datetime` axis.

    Parameters
    ----------
    begin: Union[datetime, int, None], default None
        If provided, do not select less than this value.
        Otherwise the first chunk will be unbounded.

    end: Union[datetime, int, None], default None
        If provided, do not select greater than or equal to this value.
        Otherwise the last chunk will be unbounded.

    bounded: bool, default False
        If `True`, do not include `None` in the first chunk.

    chunk_interval: Union[timedelta, int, None], default None
        If provided, use this interval for the size of chunk boundaries.
        The default value for this pipe may be set
        under `pipe.parameters['verify']['chunk_minutes']`.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A list of chunk bounds (datetimes or integers).
    If unbounded, the first and last chunks will include `None`.
    """
    from datetime import timedelta
    from meerschaum.utils.dtypes import are_dtypes_equal
    from meerschaum.utils.misc import interval_str
    include_less_than_begin = not bounded and begin is None
    include_greater_than_end = not bounded and end is None
    if begin is None:
        begin = self.get_sync_time(newest=False, debug=debug)
    consolidate_end_chunk = False
    if end is None:
        end = self.get_sync_time(newest=True, debug=debug)
        if end is not None and hasattr(end, 'tzinfo'):
            end += timedelta(minutes=1)
            consolidate_end_chunk = True
        elif are_dtypes_equal(str(type(end)), 'int'):
            end += 1
            consolidate_end_chunk = True

    if begin is None and end is None:
        return [(None, None)]

    begin, end = self.parse_date_bounds(begin, end)

    if begin and end:
        if begin >= end:
            return (
                [(begin, begin)]
                if bounded
                else [(begin, None)]
            )
        if end <= begin:
            return (
                [(end, end)]
                if bounded
                else [(None, begin)]
            )

    ### Set the chunk interval under `pipe.parameters['verify']['chunk_minutes']`.
    chunk_interval = self.get_chunk_interval(chunk_interval, debug=debug)

    ### Build a list of tuples containing the chunk boundaries
    ### so that we can sync multiple chunks in parallel.
    ### Run `verify pipes --workers 1` to sync chunks in series.
    chunk_bounds = []
    begin_cursor = begin
    num_chunks = 0
    max_chunks = 1_000_000
    while begin_cursor < end:
        end_cursor = begin_cursor + chunk_interval
        chunk_bounds.append((begin_cursor, end_cursor))
        begin_cursor = end_cursor
        num_chunks += 1
        if num_chunks >= max_chunks:
            raise ValueError(
                f"Too many chunks of size '{interval_str(chunk_interval)}' "
                f"between '{begin}' and '{end}'."
            )

    if num_chunks > 1 and consolidate_end_chunk:
        last_bounds, second_last_bounds = chunk_bounds[-1], chunk_bounds[-2]
        chunk_bounds = chunk_bounds[:-2]
        chunk_bounds.append((second_last_bounds[0], last_bounds[1]))

    ### The chunk interval might be too large.
    if not chunk_bounds and end >= begin:
        chunk_bounds = [(begin, end)]

    ### Truncate the last chunk to the end timestamp.
    if chunk_bounds[-1][1] > end:
        chunk_bounds[-1] = (chunk_bounds[-1][0], end)

    ### Pop the last chunk if its bounds are equal.
    if chunk_bounds[-1][0] == chunk_bounds[-1][1]:
        chunk_bounds = chunk_bounds[:-1]

    if include_less_than_begin:
        chunk_bounds = [(None, begin)] + chunk_bounds
    if include_greater_than_end:
        chunk_bounds = chunk_bounds + [(end, None)]

    return chunk_bounds
```
Return a list of datetime bounds for iterating over the pipe's datetime axis.
Parameters
- begin (Union[datetime, int, None], default None): If provided, do not select less than this value. Otherwise the first chunk will be unbounded.
- end (Union[datetime, int, None], default None): If provided, do not select greater than or equal to this value. Otherwise the last chunk will be unbounded.
- bounded (bool, default False): If `True`, do not include `None` in the first chunk.
- chunk_interval (Union[timedelta, int, None], default None): If provided, use this interval for the size of chunk boundaries. The default value for this pipe may be set under `pipe.parameters['verify']['chunk_minutes']`.
- debug (bool, default False): Verbosity toggle.
Returns
- A list of chunk bounds (datetimes or integers).
- If unbounded, the first and last chunks will include `None`.
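The heart of the method is a simple cursor loop: step from `begin` to `end` by the chunk interval, truncate the final chunk, and optionally add open-ended sentinel chunks. A minimal sketch (the `chunk_bounds` helper name is illustrative; the real method also consolidates the end chunk and caps the chunk count):

```python
from datetime import datetime, timedelta

def chunk_bounds(begin, end, interval, bounded=True):
    """Split [begin, end) into consecutive (start, stop) pairs of size `interval`."""
    bounds = []
    cursor = begin
    while cursor < end:
        # Truncate the final chunk to the end timestamp.
        bounds.append((cursor, min(cursor + interval, end)))
        cursor += interval
    if not bounded:
        # Open-ended sentinel chunks capture rows outside [begin, end).
        bounds = [(None, begin)] + bounds + [(end, None)]
    return bounds

bounds = chunk_bounds(
    datetime(2024, 1, 1),
    datetime(2024, 1, 4),
    timedelta(days=1),
)
# [(Jan 1, Jan 2), (Jan 2, Jan 3), (Jan 3, Jan 4)]
```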
```python
def get_chunk_bounds_batches(
    self,
    chunk_bounds: List[Tuple[Union[datetime, int, None], Union[datetime, int, None]]],
    batchsize: Optional[int] = None,
    workers: Optional[int] = None,
    debug: bool = False,
) -> List[
    Tuple[
        Tuple[
            Union[datetime, int, None],
            Union[datetime, int, None],
        ], ...
    ]
]:
    """
    Return a list of tuples of chunk bounds of size `batchsize`.

    Parameters
    ----------
    chunk_bounds: List[Tuple[Union[datetime, int, None], Union[datetime, int, None]]]
        A list of chunk bounds (see `Pipe.get_chunk_bounds()`).

    batchsize: Optional[int], default None
        How many chunks to include in a batch. Defaults to `Pipe.get_num_workers()`.

    workers: Optional[int], default None
        If `batchsize` is `None`, use this as the desired number of workers.
        Passed to `Pipe.get_num_workers()`.

    Returns
    -------
    A list of tuples of chunk bound tuples.
    """
    from meerschaum.utils.misc import iterate_chunks

    if batchsize is None:
        batchsize = self.get_num_workers(workers=workers)

    return [
        tuple(
            _batch_chunk_bounds
            for _batch_chunk_bounds in batch
            if _batch_chunk_bounds is not None
        )
        for batch in iterate_chunks(chunk_bounds, batchsize)
        if batch
    ]
```
Return a list of tuples of chunk bounds of size batchsize.
Parameters
- chunk_bounds (List[Tuple[Union[datetime, int, None], Union[datetime, int, None]]]): A list of chunk bounds (see `Pipe.get_chunk_bounds()`).
- batchsize (Optional[int], default None): How many chunks to include in a batch. Defaults to `Pipe.get_num_workers()`.
- workers (Optional[int], default None): If `batchsize` is `None`, use this as the desired number of workers. Passed to `Pipe.get_num_workers()`.
Returns
- A list of tuples of chunk bound tuples.
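Batching is plain chunked iteration: slice the list of bounds into tuples of at most `batchsize` pairs. A self-contained sketch with `itertools.islice` (the `batch_chunk_bounds` name is illustrative; the library uses its own `iterate_chunks` helper):

```python
from itertools import islice

def batch_chunk_bounds(chunk_bounds, batchsize):
    """Group chunk bounds into tuples of at most `batchsize` pairs."""
    it = iter(chunk_bounds)
    batches = []
    while True:
        batch = tuple(islice(it, batchsize))
        if not batch:
            break
        batches.append(batch)
    return batches

bounds = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
print(batch_chunk_bounds(bounds, 2))
# [((0, 10), (10, 20)), ((20, 30), (30, 40)), ((40, 50),)]
```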
```python
def parse_date_bounds(self, *dt_vals: Union[datetime, int, None]) -> Union[
    datetime,
    int,
    str,
    None,
    Tuple[Union[datetime, int, str, None]]
]:
    """
    Given a date bound (begin, end), coerce a timezone if necessary.
    """
    from meerschaum.utils.misc import is_int
    from meerschaum.utils.dtypes import coerce_timezone, MRSM_PD_DTYPES
    from meerschaum.utils.warnings import warn
    dateutil_parser = mrsm.attempt_import('dateutil.parser')

    def _parse_date_bound(dt_val):
        if dt_val is None:
            return None

        if isinstance(dt_val, int):
            return dt_val

        if dt_val == '':
            return ''

        if is_int(dt_val):
            return int(dt_val)

        if isinstance(dt_val, str):
            try:
                dt_val = dateutil_parser.parse(dt_val)
            except Exception as e:
                warn(f"Could not parse '{dt_val}' as datetime:\n{e}")
                return None

        dt_col = self.columns.get('datetime', None)
        dt_typ = str(self.dtypes.get(dt_col, 'datetime'))
        if dt_typ == 'datetime':
            dt_typ = MRSM_PD_DTYPES['datetime']
        return coerce_timezone(dt_val, strip_utc=('utc' not in dt_typ.lower()))

    bounds = tuple(_parse_date_bound(dt_val) for dt_val in dt_vals)
    if len(bounds) == 1:
        return bounds[0]
    return bounds
```
Given a date bound (begin, end), coerce a timezone if necessary.
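A stdlib-only sketch of the per-bound coercion: integers pass through, integer-like strings become `int`, and other strings are parsed into timezone-aware datetimes. This is a simplification under stated assumptions: the real method uses `dateutil` for flexible parsing and strips or applies UTC based on the pipe's datetime dtype, whereas this sketch uses `datetime.fromisoformat` and always assumes UTC for naive values.

```python
from datetime import datetime, timezone

def parse_date_bound(dt_val):
    """Coerce a bound into an int, a timezone-aware datetime, or None."""
    if dt_val is None or isinstance(dt_val, int):
        return dt_val
    if isinstance(dt_val, str):
        if dt_val.lstrip('-').isdigit():
            return int(dt_val)
        dt_val = datetime.fromisoformat(dt_val)
    if dt_val.tzinfo is None:
        # Assume UTC for naive datetimes.
        dt_val = dt_val.replace(tzinfo=timezone.utc)
    return dt_val

print(parse_date_bound('2024-01-01'))
# 2024-01-01 00:00:00+00:00
print(parse_date_bound('86400'))
# 86400
```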
```python
def register(
    self,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Register a new Pipe along with its attributes.

    Parameters
    ----------
    debug: bool, default False
        Verbosity toggle.

    kw: Any
        Keyword arguments to pass to `instance_connector.register_pipe()`.

    Returns
    -------
    A `SuccessTuple` of success, message.
    """
    if self.temporary:
        return False, "Cannot register pipes created with `temporary=True` (read-only)."

    from meerschaum.utils.formatting import get_console
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin, custom_types
    from meerschaum.config._patch import apply_patch_to_config

    import warnings
    with warnings.catch_warnings():
        warnings.simplefilter('ignore')
        try:
            _conn = self.connector
        except Exception:
            _conn = None

    if isinstance(_conn, str):
        _conn = None

    if (
        _conn is not None
        and
        (_conn.type == 'plugin' or _conn.type in custom_types)
        and
        getattr(_conn, 'register', None) is not None
    ):
        try:
            with Venv(get_connector_plugin(_conn), debug=debug):
                params = self.connector.register(self)
        except Exception:
            get_console().print_exception()
            params = None
        params = {} if params is None else params
        if not isinstance(params, dict):
            from meerschaum.utils.warnings import warn
            warn(
                f"Invalid parameters returned from `register()` in connector {self.connector}:\n"
                + f"{params}"
            )
        else:
            self.parameters = apply_patch_to_config(params, self.parameters)

    if not self.parameters:
        cols = self.columns if self.columns else {'datetime': None, 'id': None}
        self.parameters = {
            'columns': cols,
        }

    with Venv(get_connector_plugin(self.instance_connector)):
        return self.instance_connector.register_pipe(self, debug=debug, **kw)
```
Register a new Pipe along with its attributes.
Parameters
- debug (bool, default False): Verbosity toggle.
- kw (Any): Keyword arguments to pass to `instance_connector.register_pipe()`.
Returns
- A `SuccessTuple` of success, message.
```python
@property
def attributes(self) -> Dict[str, Any]:
    """
    Return a dictionary of a pipe's keys and parameters.
    These values are reflected directly from the pipes table of the instance.
    """
    from meerschaum.config import get_config
    from meerschaum.config._patch import apply_patch_to_config
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.dtypes import get_current_timestamp

    timeout_seconds = get_config('pipes', 'attributes', 'local_cache_timeout_seconds')

    now = get_current_timestamp('ms', as_int=True) / 1000
    _attributes_sync_time = self._get_cached_value('_attributes_sync_time', debug=self.debug)
    timed_out = (
        _attributes_sync_time is None
        or
        (timeout_seconds is not None and (now - _attributes_sync_time) >= timeout_seconds)
    )
    if not self.temporary and timed_out:
        self._cache_value('_attributes_sync_time', now, memory_only=True, debug=self.debug)
        local_attributes = self._get_cached_value('attributes', debug=self.debug) or {}
        with Venv(get_connector_plugin(self.instance_connector)):
            instance_attributes = self.instance_connector.get_pipe_attributes(self)

        self._cache_value(
            'attributes',
            apply_patch_to_config(instance_attributes, local_attributes),
            memory_only=True,
            debug=self.debug,
        )

    return self._attributes
```
Return a dictionary of a pipe's keys and parameters. These values are reflected directly from the pipes table of the instance.
```python
@property
def parameters(self) -> Optional[Dict[str, Any]]:
    """
    Return the parameters dictionary of the pipe.
    """
    return self.get_parameters(debug=self.debug)
```
Return the parameters dictionary of the pipe.
```python
@property
def columns(self) -> Union[Dict[str, str], None]:
    """
    Return the `columns` dictionary defined in `meerschaum.Pipe.parameters`.
    """
    cols = self.parameters.get('columns', {})
    if not isinstance(cols, dict):
        return {}
    return {col_ix: col for col_ix, col in cols.items() if col and col_ix}
```
Return the `columns` dictionary defined in `meerschaum.Pipe.parameters`.
```python
@property
def indices(self) -> Union[Dict[str, Union[str, List[str]]], None]:
    """
    Return the `indices` dictionary defined in `meerschaum.Pipe.parameters`.
    """
    _parameters = self.get_parameters(debug=self.debug)
    indices_key = (
        'indexes'
        if 'indexes' in _parameters
        else 'indices'
    )

    _indices = _parameters.get(indices_key, {})
    _columns = self.columns
    dt_col = _columns.get('datetime', None)
    if not isinstance(_indices, dict):
        _indices = {}
    unique_cols = list(set((
        [dt_col]
        if dt_col
        else []
    ) + [
        col
        for col_ix, col in _columns.items()
        if col and col_ix != 'datetime'
    ]))
    return {
        **({'unique': unique_cols} if len(unique_cols) > 1 else {}),
        **{col_ix: col for col_ix, col in _columns.items() if col},
        **_indices
    }
```
Return the `indices` dictionary defined in `meerschaum.Pipe.parameters`.
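To see how the implicit indices are derived from a pipe's columns, here is a simplified sketch. The `build_indices` helper is illustrative; unlike the real property, it preserves column order for readability instead of deduplicating through a `set`.

```python
def build_indices(columns, indices=None):
    """Derive the implicit indices dictionary from a pipe's columns."""
    indices = dict(indices or {})
    dt_col = columns.get('datetime')
    # The composite unique index covers the datetime column plus all other index columns.
    unique_cols = ([dt_col] if dt_col else []) + [
        col for key, col in columns.items()
        if col and key != 'datetime'
    ]
    result = {}
    if len(unique_cols) > 1:
        result['unique'] = unique_cols
    result.update({key: col for key, col in columns.items() if col})
    # Explicitly configured indices take precedence.
    result.update(indices)
    return result

print(build_indices({'datetime': 'ts', 'id': 'id'}))
# {'unique': ['ts', 'id'], 'datetime': 'ts', 'id': 'id'}
```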
```python
@property
def indexes(self) -> Union[Dict[str, Union[str, List[str]]], None]:
    """
    Alias for `meerschaum.Pipe.indices`.
    """
    return self.indices
```
Alias for meerschaum.Pipe.indices.
```python
@property
def dtypes(self) -> Dict[str, Any]:
    """
    If defined, return the `dtypes` dictionary defined in `meerschaum.Pipe.parameters`.
    """
    return self.get_dtypes(refresh=False, debug=self.debug)
```
If defined, return the dtypes dictionary defined in meerschaum.Pipe.parameters.
```python
@property
def autoincrement(self) -> bool:
    """
    Return the `autoincrement` parameter for the pipe.
    """
    return self.parameters.get('autoincrement', False)
```
Return the autoincrement parameter for the pipe.
```python
@property
def autotime(self) -> bool:
    """
    Return the `autotime` parameter for the pipe.
    """
    return self.parameters.get('autotime', False)
```
Return the autotime parameter for the pipe.
```python
@property
def upsert(self) -> bool:
    """
    Return whether `upsert` is set for the pipe.
    """
    return self.parameters.get('upsert', False)
```
Return whether upsert is set for the pipe.
```python
@property
def static(self) -> bool:
    """
    Return whether `static` is set for the pipe.
    """
    return self.parameters.get('static', False)
```
Return whether static is set for the pipe.
```python
@property
def tzinfo(self) -> Union[None, timezone]:
    """
    Return `timezone.utc` if the pipe is timezone-aware.
    """
    _tzinfo = self._get_cached_value('tzinfo', debug=self.debug)
    if _tzinfo is not None:
        return _tzinfo if _tzinfo != 'None' else None

    _tzinfo = None
    dt_col = self.columns.get('datetime', None)
    dt_typ = str(self.dtypes.get(dt_col, 'datetime')) if dt_col else None
    if self.autotime:
        ts_col = mrsm.get_config('pipes', 'autotime', 'column_name_if_datetime_missing')
        ts_typ = self.dtypes.get(ts_col, 'datetime')
        dt_typ = ts_typ

    if (dt_typ and 'utc' in dt_typ.lower()) or dt_typ == 'datetime':
        _tzinfo = timezone.utc

    self._cache_value('tzinfo', (_tzinfo if _tzinfo is not None else 'None'), debug=self.debug)
    return _tzinfo
```
Return timezone.utc if the pipe is timezone-aware.
```python
@property
def enforce(self) -> bool:
    """
    Return the `enforce` parameter for the pipe.
    """
    return self.parameters.get('enforce', True)
```
Return the enforce parameter for the pipe.
```python
@property
def null_indices(self) -> bool:
    """
    Return the `null_indices` parameter for the pipe.
    """
    return self.parameters.get('null_indices', True)
```
Return the null_indices parameter for the pipe.
```python
@property
def mixed_numerics(self) -> bool:
    """
    Return the `mixed_numerics` parameter for the pipe.
    """
    return self.parameters.get('mixed_numerics', True)
```
Return the mixed_numerics parameter for the pipe.
```python
def get_columns(self, *args: str, error: bool = False) -> Union[str, Tuple[str]]:
    """
    Check if the requested columns are defined.

    Parameters
    ----------
    *args: str
        The column names to be retrieved.

    error: bool, default False
        If `True`, raise an `Exception` if the specified column is not defined.

    Returns
    -------
    A tuple of the same size as `args`, or a `str` if `args` is a single argument.

    Examples
    --------
    >>> pipe = mrsm.Pipe('test', 'test')
    >>> pipe.columns = {'datetime': 'dt', 'id': 'id'}
    >>> pipe.get_columns('datetime', 'id')
    ('dt', 'id')
    >>> pipe.get_columns('value', error=True)
    Exception: 🛑 Missing 'value' column for Pipe('test', 'test').
    """
    from meerschaum.utils.warnings import error as _error
    if not args:
        args = tuple(self.columns.keys())
    col_names = []
    for col in args:
        col_name = None
        try:
            col_name = self.columns[col]
            if col_name is None and error:
                _error(f"Please define the name of the '{col}' column for {self}.")
        except Exception:
            col_name = None
        if col_name is None and error:
            _error(f"Missing '{col}' column for {self}.")
        col_names.append(col_name)
    if len(col_names) == 1:
        return col_names[0]
    return tuple(col_names)
```
Check if the requested columns are defined.
Parameters
- *args (str): The column names to be retrieved.
- error (bool, default False): If `True`, raise an `Exception` if the specified column is not defined.
Returns
- A tuple of the same size as `args`, or a `str` if `args` is a single argument.
Examples
>>> pipe = mrsm.Pipe('test', 'test')
>>> pipe.columns = {'datetime': 'dt', 'id': 'id'}
>>> pipe.get_columns('datetime', 'id')
('dt', 'id')
>>> pipe.get_columns('value', error=True)
Exception: 🛑 Missing 'value' column for Pipe('test', 'test').
```python
def get_columns_types(
    self,
    refresh: bool = False,
    debug: bool = False,
) -> Union[Dict[str, str], None]:
    """
    Get a dictionary of a pipe's column names and their types.

    Parameters
    ----------
    refresh: bool, default False
        If `True`, invalidate the cache and fetch directly from the instance connector.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A dictionary of column names (`str`) to column types (`str`).

    Examples
    --------
    >>> pipe.get_columns_types()
    {
        'dt': 'TIMESTAMP WITH TIMEZONE',
        'id': 'BIGINT',
        'val': 'DOUBLE PRECISION',
    }
    """
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.dtypes import get_current_timestamp

    now = get_current_timestamp('ms', as_int=True) / 1000
    cache_seconds = (
        mrsm.get_config('pipes', 'static', 'static_schema_cache_seconds')
        if self.static
        else mrsm.get_config('pipes', 'dtypes', 'columns_types_cache_seconds')
    )
    if refresh:
        self._clear_cache_key('_columns_types_timestamp', debug=debug)
        self._clear_cache_key('_columns_types', debug=debug)

    _columns_types = self._get_cached_value('_columns_types', debug=debug)
    if _columns_types:
        columns_types_timestamp = self._get_cached_value('_columns_types_timestamp', debug=debug)
        if columns_types_timestamp is not None:
            delta = now - columns_types_timestamp
            if delta < cache_seconds:
                if debug:
                    dprint(
                        f"Returning cached `columns_types` for {self} "
                        f"({round(delta, 2)} seconds old)."
                    )
                return _columns_types

    with mrsm.Venv(get_connector_plugin(self.instance_connector)):
        _columns_types = (
            self.instance_connector.get_pipe_columns_types(self, debug=debug)
            if hasattr(self.instance_connector, 'get_pipe_columns_types')
            else None
        )

    self._cache_value('_columns_types', _columns_types, debug=debug)
    self._cache_value('_columns_types_timestamp', now, debug=debug)
    return _columns_types or {}
```
Get a dictionary of a pipe's column names and their types.
Parameters
- refresh (bool, default False): If `True`, invalidate the cache and fetch directly from the instance connector.
- debug (bool, default False): Verbosity toggle.
Returns
- A dictionary of column names (`str`) to column types (`str`).
Examples
>>> pipe.get_columns_types()
{
'dt': 'TIMESTAMP WITH TIMEZONE',
'id': 'BIGINT',
'val': 'DOUBLE PRECISION',
}
```python
def get_columns_indices(
    self,
    debug: bool = False,
    refresh: bool = False,
) -> Dict[str, List[Dict[str, str]]]:
    """
    Return a dictionary mapping columns to index information.
    """
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.dtypes import get_current_timestamp

    now = get_current_timestamp('ms', as_int=True) / 1000
    cache_seconds = (
        mrsm.get_config('pipes', 'static', 'static_schema_cache_seconds')
        if self.static
        else mrsm.get_config('pipes', 'dtypes', 'columns_types_cache_seconds')
    )
    if refresh:
        self._clear_cache_key('_columns_indices_timestamp', debug=debug)
        self._clear_cache_key('_columns_indices', debug=debug)

    _columns_indices = self._get_cached_value('_columns_indices', debug=debug)

    if _columns_indices:
        columns_indices_timestamp = self._get_cached_value('_columns_indices_timestamp', debug=debug)
        if columns_indices_timestamp is not None:
            delta = now - columns_indices_timestamp
            if delta < cache_seconds:
                if debug:
                    dprint(
                        f"Returning cached `columns_indices` for {self} "
                        f"({round(delta, 2)} seconds old)."
                    )
                return _columns_indices

    with mrsm.Venv(get_connector_plugin(self.instance_connector)):
        _columns_indices = (
            self.instance_connector.get_pipe_columns_indices(self, debug=debug)
            if hasattr(self.instance_connector, 'get_pipe_columns_indices')
            else None
        )

    self._cache_value('_columns_indices', _columns_indices, debug=debug)
    self._cache_value('_columns_indices_timestamp', now, debug=debug)
    return {k: v for k, v in (_columns_indices or {}).items() if k and v} or {}
```
Return a dictionary mapping columns to index information.
```python
def get_indices(self) -> Dict[str, str]:
    """
    Return a dictionary mapping index keys to their names in the database.

    Returns
    -------
    A dictionary of index keys to index names.
    """
    from meerschaum.connectors import get_connector_plugin
    with mrsm.Venv(get_connector_plugin(self.instance_connector)):
        if hasattr(self.instance_connector, 'get_pipe_index_names'):
            result = self.instance_connector.get_pipe_index_names(self)
        else:
            result = {}

    return result
```
Return a dictionary mapping index keys to their names in the database.
Returns
- A dictionary of index keys to index names.
```python
def get_parameters(
    self,
    apply_symlinks: bool = True,
    refresh: bool = False,
    debug: bool = False,
    _visited: 'Optional[set[mrsm.Pipe]]' = None,
) -> Dict[str, Any]:
    """
    Return the `parameters` dictionary of the pipe.

    Parameters
    ----------
    apply_symlinks: bool, default True
        If `True`, resolve references to parameters from other pipes.

    refresh: bool, default False
        If `True`, pull the latest attributes for the pipe.

    Returns
    -------
    The pipe's parameters dictionary.
    """
    from meerschaum.config._patch import apply_patch_to_config
    from meerschaum.config._read_config import search_and_substitute_config

    if _visited is None:
        _visited = {self}

    if refresh:
        _ = self._invalidate_cache(hard=True)

    raw_parameters = self.attributes.get('parameters', {})
    if not apply_symlinks:
        return raw_parameters

    parameters = {}
    for ref_pipe in self.references:
        try:
            if ref_pipe in _visited:
                warn(f"Circular reference detected in {self}: chain involves {ref_pipe}.")
                return search_and_substitute_config(raw_parameters)

            _visited.add(ref_pipe)
            if refresh:
                _ = _cached_base_params.pop(ref_pipe, None)
            base_params = _cached_base_params.get(ref_pipe, None)
            if base_params is None:
                base_params = ref_pipe.get_parameters(
                    apply_symlinks=apply_symlinks,
                    _visited=_visited,
                    debug=debug,
                )
                _cached_base_params[ref_pipe] = base_params
                if debug:
                    dprint(f"base_params from {ref_pipe} for {self}:")
                    mrsm.pprint(base_params)
            else:
                if debug:
                    dprint(f"Using cached base_params from {ref_pipe} for {self}")
        except Exception as e:
            warn(f"Failed to resolve reference pipe for {self}: {e}")
            base_params = {}

        parameters = apply_patch_to_config(parameters, base_params)

    parameters = apply_patch_to_config(parameters, raw_parameters)

    from meerschaum.utils.pipes import replace_pipes_syntax
    self._symlinks = {}

    def recursive_replace(obj: Any, path: tuple) -> Any:
        if isinstance(obj, dict):
            return {k: recursive_replace(v, path + (k,)) for k, v in obj.items()}
        if isinstance(obj, list):
            return [recursive_replace(elem, path + (i,)) for i, elem in enumerate(obj)]
        if isinstance(obj, str):
            substituted_val = replace_pipes_syntax(obj, _pipe=self)
            if substituted_val != obj:
                self._symlinks[path] = {
                    'original': obj,
                    'substituted': substituted_val,
                }
            return substituted_val
        return obj

    return search_and_substitute_config(recursive_replace(parameters, tuple()))
```
Return the parameters dictionary of the pipe.
Parameters
- apply_symlinks (bool, default True): If `True`, resolve references to parameters from other pipes.
- refresh (bool, default False): If `True`, pull the latest attributes for the pipe.
Returns
- The pipe's parameters dictionary.
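Reference resolution hinges on a deep merge where the pipe's own parameters win over those pulled from referenced pipes. A minimal sketch of that precedence (the `apply_patch` helper here is an illustrative stand-in for the library's `apply_patch_to_config`):

```python
def apply_patch(base, patch):
    """Recursively merge `patch` onto `base` (patch values win)."""
    merged = dict(base)
    for key, val in patch.items():
        if isinstance(val, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_patch(merged[key], val)
        else:
            merged[key] = val
    return merged

# Parameters inherited from a referenced pipe...
ref_params = {'columns': {'datetime': 'ts', 'id': 'id'}, 'upsert': True}
# ...are overridden by the pipe's own raw parameters.
own_params = {'columns': {'id': 'station_id'}, 'tags': ['prod']}
print(apply_patch(ref_params, own_params))
# {'columns': {'datetime': 'ts', 'id': 'station_id'}, 'upsert': True, 'tags': ['prod']}
```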
```python
def get_dtypes(
    self,
    infer: bool = True,
    refresh: bool = False,
    debug: bool = False,
) -> Dict[str, Any]:
    """
    If defined, return the `dtypes` dictionary defined in `meerschaum.Pipe.parameters`.

    Parameters
    ----------
    infer: bool, default True
        If `True`, include the implicit existing dtypes.
        Else only return the explicitly configured dtypes (e.g. `Pipe.parameters['dtypes']`).

    refresh: bool, default False
        If `True`, invalidate any cache and return the latest known dtypes.

    Returns
    -------
    A dictionary mapping column names to dtypes.
    """
    from meerschaum.config._patch import apply_patch_to_config
    from meerschaum.utils.dtypes import MRSM_ALIAS_DTYPES
    parameters = self.get_parameters(refresh=refresh, debug=debug)
    configured_dtypes = parameters.get('dtypes', {})
    if debug:
        dprint(f"Configured dtypes for {self}:")
        mrsm.pprint(configured_dtypes)

    remote_dtypes = (
        self.infer_dtypes(persist=False, refresh=refresh, debug=debug)
        if infer
        else {}
    )
    patched_dtypes = apply_patch_to_config((remote_dtypes or {}), (configured_dtypes or {}))

    dt_col = parameters.get('columns', {}).get('datetime', None)
    primary_col = parameters.get('columns', {}).get('primary', None)
    _dtypes = {
        col: MRSM_ALIAS_DTYPES.get(typ, typ)
        for col, typ in patched_dtypes.items()
        if col and typ
    }
    if dt_col and dt_col not in configured_dtypes:
        _dtypes[dt_col] = 'datetime'
    if primary_col and parameters.get('autoincrement', False) and primary_col not in _dtypes:
        _dtypes[primary_col] = 'int'

    return _dtypes
```
If defined, return the dtypes dictionary defined in meerschaum.Pipe.parameters.
Parameters
- infer (bool, default True): If `True`, include the implicit existing dtypes. Else only return the explicitly configured dtypes (e.g. `Pipe.parameters['dtypes']`).
- refresh (bool, default False): If `True`, invalidate any cache and return the latest known dtypes.
Returns
- A dictionary mapping column names to dtypes.
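The resolution order can be sketched in plain Python: configured dtypes overlay inferred ones, aliases are normalized, and the datetime column defaults to `'datetime'` when not explicitly configured. The `resolve_dtypes` helper and its tiny alias table are illustrative assumptions (the real alias map is `meerschaum.utils.dtypes.MRSM_ALIAS_DTYPES`).

```python
def resolve_dtypes(inferred, configured, dt_col=None, aliases=None):
    """Overlay configured dtypes onto inferred ones, resolving aliases."""
    aliases = aliases or {'str': 'string', 'float64': 'float'}
    # Configured dtypes take precedence over inferred ones.
    patched = {**(inferred or {}), **(configured or {})}
    dtypes = {col: aliases.get(typ, typ) for col, typ in patched.items() if col and typ}
    # The datetime axis defaults to 'datetime' unless explicitly configured.
    if dt_col and dt_col not in (configured or {}):
        dtypes[dt_col] = 'datetime'
    return dtypes

print(resolve_dtypes(
    {'ts': 'datetime64[ns]', 'vl': 'float64'},
    {'vl': 'numeric'},
    dt_col='ts',
))
# {'ts': 'datetime', 'vl': 'numeric'}
```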
```python
def update_parameters(
    self,
    parameters_patch: Dict[str, Any],
    persist: bool = True,
    debug: bool = False,
) -> mrsm.SuccessTuple:
    """
    Apply a patch to a pipe's `parameters` dictionary.

    Parameters
    ----------
    parameters_patch: Dict[str, Any]
        The patch to be applied to `Pipe.parameters`.

    persist: bool, default True
        If `True`, call `Pipe.edit()` to persist the new parameters.
    """
    from meerschaum.config import apply_patch_to_config
    if 'parameters' not in self._attributes:
        self._attributes['parameters'] = {}

    self._attributes['parameters'] = apply_patch_to_config(
        self._attributes['parameters'],
        parameters_patch,
    )

    if self.temporary:
        persist = False

    if not persist:
        return True, "Success"

    return self.edit(debug=debug)
```
Apply a patch to a pipe's `parameters` dictionary.

Parameters
- parameters_patch (Dict[str, Any]):
  The patch to be applied to `Pipe.parameters`.
- persist (bool, default True):
  If `True`, call `Pipe.edit()` to persist the new parameters.
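The patch cascades into nested dictionaries rather than overwriting them. A rough sketch of that cascading behavior (an approximation of `apply_patch_to_config`, with illustrative parameter values):

```python
# Recursively merge a patch into a config dict: nested dicts are merged,
# everything else is replaced.
def apply_patch(config: dict, patch: dict) -> dict:
    merged = dict(config)
    for key, val in patch.items():
        if isinstance(val, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_patch(merged[key], val)
        else:
            merged[key] = val
    return merged

parameters = {'columns': {'datetime': 'ts', 'id': 'id'}, 'tags': ['production']}
patched = apply_patch(parameters, {'columns': {'value': 'vl'}})
print(patched['columns'])
# {'datetime': 'ts', 'id': 'id', 'value': 'vl'}
```

Note that the existing `columns` keys survive the patch; a plain `dict.update()` would have replaced the whole `columns` dictionary.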
def get_id(self, **kw: Any) -> Union[int, str, None]:
    """
    Fetch a pipe's ID from its instance connector.
    If the pipe is not registered, return `None`.
    """
    if self.temporary:
        return None

    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    with Venv(get_connector_plugin(self.instance_connector)):
        if hasattr(self.instance_connector, 'get_pipe_id'):
            return self.instance_connector.get_pipe_id(self, **kw)

    return None
Fetch a pipe's ID from its instance connector.
If the pipe is not registered, return None.
@property
def id(self) -> Union[int, str, uuid.UUID, None]:
    """
    Fetch and cache a pipe's ID.
    """
    _id = self._get_cached_value('_id', debug=self.debug)
    if not _id:
        _id = self.get_id(debug=self.debug)
        if _id is not None:
            self._cache_value('_id', _id, debug=self.debug)
    return _id
Fetch and cache a pipe's ID.
def get_val_column(self, debug: bool = False) -> Union[str, None]:
    """
    Return the name of the value column if it's defined, otherwise make an educated guess.
    If not set in the `columns` dictionary, return the first numeric column that is not
    an ID or datetime column.
    If none may be found, return `None`.

    Parameters
    ----------
    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    Either a string or `None`.
    """
    if debug:
        dprint('Attempting to determine the value column...')
    try:
        val_name = self.get_columns('value')
    except Exception:
        val_name = None
    if val_name is not None:
        if debug:
            dprint(f"Value column: {val_name}")
        return val_name

    cols = self.columns
    if cols is None:
        if debug:
            dprint('No columns could be determined. Returning...')
        return None
    try:
        dt_name = self.get_columns('datetime', error=False)
    except Exception:
        dt_name = None
    try:
        id_name = self.get_columns('id', error=False)
    except Exception:
        id_name = None

    if debug:
        dprint(f"dt_name: {dt_name}")
        dprint(f"id_name: {id_name}")

    cols_types = self.get_columns_types(debug=debug)
    if cols_types is None:
        return None
    if debug:
        dprint(f"cols_types: {cols_types}")
    if dt_name is not None:
        cols_types.pop(dt_name, None)
    if id_name is not None:
        cols_types.pop(id_name, None)

    candidates = []
    candidate_keywords = {'float', 'double', 'precision', 'int', 'numeric'}
    for search_term in candidate_keywords:
        for col, typ in cols_types.items():
            if search_term in typ.lower():
                candidates.append(col)
                break
    if not candidates:
        if debug:
            dprint("No value column could be determined.")
        return None

    return candidates[0]
Return the name of the value column if it's defined, otherwise make an educated guess.
If not set in the `columns` dictionary, return the first numeric column that is not
an ID or datetime column.
If none may be found, return `None`.

Parameters
- debug (bool, default False): Verbosity toggle.

Returns
- Either a string or `None`.
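The guess boils down to filtering out the datetime and ID columns and picking the first remaining column whose type mentions a numeric keyword. A standalone sketch of that selection (the column names and SQL types below are illustrative):

```python
# Hypothetical columns-to-types mapping, as get_columns_types() might return.
cols_types = {'ts': 'TIMESTAMP', 'id': 'BIGINT', 'vl': 'DOUBLE PRECISION'}
dt_name, id_name = 'ts', 'id'

# Drop the datetime and ID columns from consideration.
candidates_types = {
    col: typ
    for col, typ in cols_types.items()
    if col not in (dt_name, id_name)
}

# Keep columns whose type contains a numeric keyword.
candidate_keywords = {'float', 'double', 'precision', 'int', 'numeric'}
candidates = [
    col
    for col, typ in candidates_types.items()
    if any(keyword in typ.lower() for keyword in candidate_keywords)
]
val_column = candidates[0] if candidates else None
print(val_column)
# vl
```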
@property
def parents(self) -> List[mrsm.Pipe]:
    """
    Return a list of `meerschaum.Pipe` objects to be designated as parents.
    """
    _cached_parents = self.__dict__.get('_parents', None)
    if _cached_parents is not None:
        return _cached_parents

    from meerschaum.utils.pipes import get_pipe_from_value
    base_params = self.get_parameters()
    key = 'parents' if 'parents' in base_params else 'parent'
    parents_refs = base_params.get(key, None) or []
    if isinstance(parents_refs, str) or isinstance(parents_refs, dict):
        parents_refs = [parents_refs]

    if not parents_refs:
        return []

    self._parents = [get_pipe_from_value(val, _pipe=self) for val in parents_refs]
    return self._parents
Return a list of meerschaum.Pipe objects to be designated as parents.
@property
def parent(self) -> Union[mrsm.Pipe, None]:
    """
    Return the first pipe in `self.parents` or `None`.
    """
    _parents = self.parents
    if not _parents:
        return None

    return _parents[0]
Return the first pipe in self.parents or None.
@property
def children(self) -> List[mrsm.Pipe]:
    """
    Return a list of `meerschaum.Pipe` objects to be designated as children.
    """
    _cached_children = self.__dict__.get('_children', None)
    if _cached_children is not None:
        return _cached_children

    from meerschaum.utils.pipes import get_pipe_from_value
    base_params = self.get_parameters()
    key = 'children' if 'children' in base_params else 'child'
    children_refs = base_params.get(key, None) or []
    if isinstance(children_refs, str) or isinstance(children_refs, dict):
        children_refs = [children_refs]

    if not children_refs:
        return []

    self._children = [get_pipe_from_value(val, _pipe=self) for val in children_refs]
    return self._children
Return a list of meerschaum.Pipe objects to be designated as children.
@property
def child(self) -> mrsm.Pipe | None:
    """
    Return the first pipe in `self.children` or `None`.
    """
    _children = self.children
    if not _children:
        return None

    return _children[0]
Return the first pipe in self.children or None.
@property
def reference(self) -> mrsm.Pipe | None:
    """
    Return the first pipe in `self.references` or `None`.
    """
    _references = self.references
    if not _references:
        return None

    return _references[0]
Return the first pipe in self.references or None.
@property
def references(self) -> List[mrsm.Pipe]:
    """
    Return a list of `meerschaum.Pipe` objects to be designated as references.
    """
    _cached_references = self.__dict__.get('_references', None)
    if _cached_references is not None:
        return _cached_references

    from meerschaum.utils.pipes import get_pipe_from_value
    base_params = self.get_parameters(apply_symlinks=False)
    key = 'references' if 'references' in base_params else 'reference'
    refs = base_params.get(key, None) or []
    if isinstance(refs, str) or isinstance(refs, dict):
        refs = [refs]

    if not refs:
        return []

    self._refs = [get_pipe_from_value(val, _pipe=self) for val in refs]
    return self._refs
Return a list of meerschaum.Pipe objects to be designated as references.
@property
def target(self) -> str:
    """
    The target table name.
    You can set the target name under one of the following keys
    (checked in this order):
    - `target`
    - `target_name`
    - `target_table`
    - `target_table_name`
    """
    target_val = self.parameters.get('target', None)
    if not target_val:
        default_target = self._target_legacy()
        default_targets = {default_target}
        potential_keys = ('target_name', 'target_table', 'target_table_name')
        _target = None
        for k in potential_keys:
            if k in self.parameters:
                _target = self.parameters[k]
                break

        _target = _target or default_target

        if self.instance_connector.type == 'sql':
            from meerschaum.utils.sql import truncate_item_name
            truncated_target = truncate_item_name(_target, self.instance_connector.flavor)
            default_targets.add(truncated_target)
            warned_target = self.__dict__.get('_warned_target', False)
            if truncated_target != _target and not warned_target:
                if not warned_target:
                    warn(
                        f"The target '{_target}' is too long for '{self.instance_connector.flavor}', "
                        + f"will use {truncated_target} instead."
                    )
                    self.__dict__['_warned_target'] = True
                _target = truncated_target

        if _target in default_targets:
            return _target

        self.target = _target
        return _target

    return target_val
The target table name. You can set the target name under one of the following keys (checked in this order):
- `target`
- `target_name`
- `target_table`
- `target_table_name`
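The precedence can be sketched as a simple key lookup: `target` wins outright, otherwise the first match among the fallback keys is used, otherwise a generated default. This is a simplified sketch (it omits the SQL name-truncation step, and the table names are hypothetical):

```python
# Resolve the target table name by key precedence.
def resolve_target(parameters: dict, default_target: str) -> str:
    if parameters.get('target'):
        return parameters['target']
    for key in ('target_name', 'target_table', 'target_table_name'):
        if key in parameters:
            return parameters[key]
    return default_target

print(resolve_target({'target_table': 'demo_table'}, 'generated_default'))
# demo_table
print(resolve_target({}, 'generated_default'))
# generated_default
```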
def guess_datetime(self) -> Union[str, None]:
    """
    Try to determine a pipe's datetime column.
    """
    _dtypes = self.dtypes

    ### Abort if the user explicitly disallows a datetime index.
    if 'datetime' in _dtypes:
        if _dtypes['datetime'] is None:
            return None

    from meerschaum.utils.dtypes import are_dtypes_equal
    dt_cols = [
        col
        for col, typ in _dtypes.items()
        if are_dtypes_equal(typ, 'datetime')
    ]
    if not dt_cols:
        return None
    return dt_cols[0]
Try to determine a pipe's datetime column.
@property
def precision(self) -> Dict[str, Union[str, int]]:
    """
    Return the configured or detected precision.
    """
    return self.get_precision(debug=self.debug)
Return the configured or detected precision.
def get_precision(self, debug: bool = False) -> Dict[str, Union[str, int]]:
    """
    Return the timestamp precision unit and interval for the `datetime` axis.
    """
    from meerschaum.utils.dtypes import (
        MRSM_PRECISION_UNITS_SCALARS,
        MRSM_PRECISION_UNITS_ALIASES,
        MRSM_PD_DTYPES,
        are_dtypes_equal,
    )
    from meerschaum._internal.static import STATIC_CONFIG

    _precision = self._get_cached_value('precision', debug=debug)
    if _precision:
        if debug:
            dprint(f"Returning cached precision: {_precision}")
        return _precision

    parameters = self.parameters
    _precision = parameters.get('precision', {})
    if isinstance(_precision, str):
        _precision = {'unit': _precision}
    default_precision_unit = STATIC_CONFIG['dtypes']['datetime']['default_precision_unit']

    if not _precision:
        dt_col = parameters.get('columns', {}).get('datetime', None)
        if not dt_col and self.autotime:
            dt_col = mrsm.get_config('pipes', 'autotime', 'column_name_if_datetime_missing')
        if not dt_col:
            if debug:
                dprint(f"No datetime axis, returning default precision '{default_precision_unit}'.")
            return {'unit': default_precision_unit}

        dt_typ = self.dtypes.get(dt_col, 'datetime')
        if are_dtypes_equal(dt_typ, 'datetime'):
            if dt_typ == 'datetime':
                dt_typ = MRSM_PD_DTYPES['datetime']
                if debug:
                    dprint(f"Datetime type is `datetime`, assuming {dt_typ} precision.")

            _precision = {
                'unit': (
                    dt_typ
                    .split('[', maxsplit=1)[-1]
                    .split(',', maxsplit=1)[0]
                    .split(' ', maxsplit=1)[0]
                ).rstrip(']')
            }

            if debug:
                dprint(f"Extracted precision '{_precision['unit']}' from type '{dt_typ}'.")

        elif are_dtypes_equal(dt_typ, 'int'):
            _precision = {
                'unit': (
                    'second'
                    if '32' in dt_typ
                    else default_precision_unit
                )
            }
        elif are_dtypes_equal(dt_typ, 'date'):
            if debug:
                dprint("Datetime axis is 'date', falling back to 'day' precision.")
            _precision = {'unit': 'day'}

    precision_unit = _precision.get('unit', default_precision_unit)
    precision_interval = _precision.get('interval', None)
    true_precision_unit = MRSM_PRECISION_UNITS_ALIASES.get(precision_unit, precision_unit)
    if true_precision_unit is None:
        if debug:
            dprint(f"No precision could be determined, falling back to '{default_precision_unit}'.")
        true_precision_unit = default_precision_unit

    if true_precision_unit not in MRSM_PRECISION_UNITS_SCALARS:
        from meerschaum.utils.misc import items_str
        raise ValueError(
            f"Invalid precision unit '{true_precision_unit}'.\n"
            "Accepted values are "
            f"{items_str(list(MRSM_PRECISION_UNITS_SCALARS) + list(MRSM_PRECISION_UNITS_ALIASES))}."
        )

    _precision = {'unit': true_precision_unit}
    if precision_interval:
        _precision['interval'] = precision_interval
    self._cache_value('precision', _precision, debug=debug)
    return self._precision
Return the timestamp precision unit and interval for the datetime axis.
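When the datetime axis has a Pandas-style dtype, the precision unit is parsed directly out of the dtype string. A standalone sketch of that parsing, mirroring the split chain in `get_precision()` above:

```python
# Extract the precision unit from a Pandas-style datetime dtype string,
# e.g. 'datetime64[ns]' -> 'ns', 'datetime64[us, UTC]' -> 'us'.
def extract_precision_unit(dt_typ: str) -> str:
    return (
        dt_typ
        .split('[', maxsplit=1)[-1]
        .split(',', maxsplit=1)[0]
        .split(' ', maxsplit=1)[0]
    ).rstrip(']')

print(extract_precision_unit('datetime64[ns]'))       # ns
print(extract_precision_unit('datetime64[us, UTC]'))  # us
```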
def show(
    self,
    nopretty: bool = False,
    debug: bool = False,
    **kw
) -> SuccessTuple:
    """
    Show attributes of a Pipe.

    Parameters
    ----------
    nopretty: bool, default False
        If `True`, simply print the JSON of the pipe's attributes.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.
    """
    import json
    from meerschaum.utils.formatting import (
        pprint, make_header, ANSI, highlight_pipes, fill_ansi, get_console,
    )
    from meerschaum.utils.packages import import_rich, attempt_import
    from meerschaum.utils.warnings import info
    attributes_json = json.dumps(self.attributes)
    if not nopretty:
        _to_print = f"Attributes for {self}:"
        if ANSI:
            _to_print = fill_ansi(highlight_pipes(make_header(_to_print)), 'magenta')
            print(_to_print)
            rich = import_rich()
            rich_json = attempt_import('rich.json')
            get_console().print(rich_json.JSON(attributes_json))
        else:
            print(_to_print)
    else:
        print(attributes_json)

    return True, "Success"
Show attributes of a Pipe.

Parameters
- nopretty (bool, default False):
  If `True`, simply print the JSON of the pipe's attributes.
- debug (bool, default False): Verbosity toggle.

Returns
- A `SuccessTuple` of success, message.
def edit(
    self,
    patch: bool = False,
    interactive: bool = False,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Edit a Pipe's configuration.

    Parameters
    ----------
    patch: bool, default False
        If `patch` is True, update parameters by cascading rather than overwriting.
    interactive: bool, default False
        If `True`, open an editor for the user to make changes to the pipe's YAML file.
    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.
    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    if self.temporary:
        return False, "Cannot edit pipes created with `temporary=True` (read-only)."

    self._invalidate_cache(hard=True, debug=debug)

    if hasattr(self, '_symlinks'):
        from meerschaum.utils.misc import get_val_from_dict_path, set_val_in_dict_path
        for path, vals in self._symlinks.items():
            current_val = get_val_from_dict_path(self.parameters, path)
            if current_val == vals['substituted']:
                set_val_in_dict_path(self.parameters, path, vals['original'])

    if not interactive:
        with Venv(get_connector_plugin(self.instance_connector)):
            return self.instance_connector.edit_pipe(self, patch=patch, debug=debug, **kw)

    from meerschaum.config._paths import PIPES_CACHE_RESOURCES_PATH
    from meerschaum.utils.misc import edit_file
    parameters_filename = str(self) + '.yaml'
    parameters_path = PIPES_CACHE_RESOURCES_PATH / parameters_filename

    from meerschaum.utils.yaml import yaml

    edit_text = f"Edit the parameters for {self}"
    edit_top = '#' * (len(edit_text) + 4)
    edit_header = edit_top + f'\n# {edit_text} #\n' + edit_top + '\n\n'

    from meerschaum.config import get_config
    parameters = dict(get_config('pipes', 'parameters', patch=True))
    from meerschaum.config._patch import apply_patch_to_config
    raw_parameters = self.attributes.get('parameters', {})
    parameters = apply_patch_to_config(parameters, raw_parameters)

    ### write parameters to yaml file
    with open(parameters_path, 'w+') as f:
        f.write(edit_header)
        yaml.dump(parameters, stream=f, sort_keys=False)

    ### only quit editing if yaml is valid
    editing = True
    while editing:
        edit_file(parameters_path)
        try:
            with open(parameters_path, 'r') as f:
                file_parameters = yaml.load(f.read())
        except Exception as e:
            from meerschaum.utils.warnings import warn
            warn(f"Invalid format defined for '{self}':\n\n{e}")
            input(f"Press [Enter] to correct the configuration for '{self}': ")
        else:
            editing = False

    self.parameters = file_parameters

    if debug:
        from meerschaum.utils.formatting import pprint
        pprint(self.parameters)

    with Venv(get_connector_plugin(self.instance_connector)):
        return self.instance_connector.edit_pipe(self, patch=patch, debug=debug, **kw)
Edit a Pipe's configuration.

Parameters
- patch (bool, default False):
  If `patch` is True, update parameters by cascading rather than overwriting.
- interactive (bool, default False):
  If `True`, open an editor for the user to make changes to the pipe's YAML file.
- debug (bool, default False): Verbosity toggle.

Returns
- A `SuccessTuple` of success, message.
def edit_definition(
    self,
    yes: bool = False,
    noask: bool = False,
    force: bool = False,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Edit a pipe's definition file and update its configuration.
    **NOTE:** This function is interactive and should not be used in automated scripts!

    Returns
    -------
    A `SuccessTuple` of success, message.
    """
    if self.temporary:
        return False, "Cannot edit pipes created with `temporary=True` (read-only)."

    from meerschaum.connectors import instance_types
    if (self.connector is None or isinstance(self.connector, str)) or self.connector.type not in instance_types:
        return self.edit(interactive=True, debug=debug, **kw)

    import json
    from meerschaum.utils.warnings import info, warn
    from meerschaum.utils.debug import dprint
    from meerschaum.config._patch import apply_patch_to_config
    from meerschaum.utils.misc import edit_file

    _parameters = self.parameters
    if 'fetch' not in _parameters:
        _parameters['fetch'] = {}

    def _edit_api():
        from meerschaum.utils.prompt import prompt, yes_no
        info(
            f"Please enter the keys of the source pipe from '{self.connector}'.\n" +
            "Type 'None' for None, or empty when there is no default. Press [CTRL+C] to skip."
        )

        _keys = {'connector_keys': None, 'metric_key': None, 'location_key': None}
        for k in _keys:
            _keys[k] = _parameters['fetch'].get(k, None)

        for k, v in _keys.items():
            try:
                _keys[k] = prompt(k.capitalize().replace('_', ' ') + ':', icon=True, default=v)
            except KeyboardInterrupt:
                continue
            if _keys[k] in ('', 'None', '\'None\'', '[None]'):
                _keys[k] = None

        _parameters['fetch'] = apply_patch_to_config(_parameters['fetch'], _keys)

        info("You may optionally specify additional filter parameters as JSON.")
        print(" Parameters are translated into a 'WHERE x AND y' clause, and lists are IN clauses.")
        print(" For example, the following JSON would correspond to 'WHERE x = 1 AND y IN (2, 3)':")
        print(json.dumps({'x': 1, 'y': [2, 3]}, indent=2, separators=(',', ': ')))
        if force or yes_no(
            "Would you like to add additional filter parameters?",
            yes=yes, noask=noask
        ):
            from meerschaum.config._paths import PIPES_CACHE_RESOURCES_PATH
            definition_filename = str(self) + '.json'
            definition_path = PIPES_CACHE_RESOURCES_PATH / definition_filename
            try:
                definition_path.touch()
                with open(definition_path, 'w+') as f:
                    json.dump(_parameters.get('fetch', {}).get('params', {}), f, indent=2)
            except Exception as e:
                return False, f"Failed writing file '{definition_path}':\n" + str(e)

            _params = None
            while True:
                edit_file(definition_path)
                try:
                    with open(definition_path, 'r') as f:
                        _params = json.load(f)
                except Exception as e:
                    warn(f'Failed to read parameters JSON:\n{e}', stack=False)
                    if force or yes_no(
                        "Would you like to try again?\n "
                        + "If not, the parameters JSON file will be ignored.",
                        noask=noask, yes=yes
                    ):
                        continue
                    _params = None
                break
            if _params is not None:
                if 'fetch' not in _parameters:
                    _parameters['fetch'] = {}
                _parameters['fetch']['params'] = _params

        self.parameters = _parameters
        return True, "Success"

    def _edit_sql():
        import textwrap
        from meerschaum.config._paths import PIPES_CACHE_RESOURCES_PATH
        from meerschaum.utils.misc import edit_file
        definition_filename = str(self) + '.sql'
        definition_path = PIPES_CACHE_RESOURCES_PATH / definition_filename

        sql_definition = _parameters['fetch'].get('definition', None)
        if sql_definition is None:
            sql_definition = ''
        sql_definition = textwrap.dedent(sql_definition).lstrip()

        try:
            definition_path.touch()
            with open(definition_path, 'w+') as f:
                f.write(sql_definition)
        except Exception as e:
            return False, f"Failed writing file '{definition_path}':\n" + str(e)

        edit_file(definition_path)
        try:
            with open(definition_path, 'r', encoding='utf-8') as f:
                file_definition = f.read()
        except Exception as e:
            return False, f"Failed reading file '{definition_path}':\n" + str(e)

        if sql_definition == file_definition:
            return False, f"No changes made to definition for {self}."

        if ' ' not in file_definition:
            return False, f"Invalid SQL definition for {self}."

        if debug:
            dprint("Read SQL definition:\n\n" + file_definition)
        _parameters['fetch']['definition'] = file_definition
        self.parameters = _parameters
        return True, "Success"

    locals()['_edit_' + str(self.connector.type)]()
    return self.edit(interactive=False, debug=debug, **kw)
Edit a pipe's definition file and update its configuration.
NOTE: This function is interactive and should not be used in automated scripts!

Returns
- A `SuccessTuple` of success, message.
def update(self, *args, **kw) -> SuccessTuple:
    """
    Update a pipe's parameters in its instance.
    """
    kw['interactive'] = False
    return self.edit(*args, **kw)
Update a pipe's parameters in its instance.
def sync(
    self,
    df: Union[
        pd.DataFrame,
        Dict[str, List[Any]],
        List[Dict[str, Any]],
        str,
        InferFetch
    ] = InferFetch,
    begin: Union[datetime, int, str, None] = '',
    end: Union[datetime, int, None] = None,
    force: bool = False,
    retries: int = 10,
    min_seconds: int = 1,
    check_existing: bool = True,
    enforce_dtypes: bool = True,
    blocking: bool = True,
    workers: Optional[int] = None,
    callback: Optional[Callable[[Tuple[bool, str]], Any]] = None,
    error_callback: Optional[Callable[[Exception], Any]] = None,
    chunksize: Optional[int] = -1,
    sync_chunks: bool = True,
    debug: bool = False,
    _inplace: bool = True,
    **kw: Any
) -> SuccessTuple:
    """
    Fetch new data from the source and update the pipe's table with new data.

    Get new remote data via fetch, get existing data in the same time period,
    and merge the two, only keeping the unseen data.

    Parameters
    ----------
    df: Union[None, pd.DataFrame, Dict[str, List[Any]]], default None
        An optional DataFrame to sync into the pipe. Defaults to `None`.
        If `df` is a string, it will be parsed via `meerschaum.utils.dataframe.parse_simple_lines()`.

    begin: Union[datetime, int, str, None], default ''
        Optionally specify the earliest datetime to search for data.

    end: Union[datetime, int, str, None], default None
        Optionally specify the latest datetime to search for data.

    force: bool, default False
        If `True`, keep trying to sync until `retries` attempts.

    retries: int, default 10
        If `force`, how many attempts to try syncing before declaring failure.

    min_seconds: Union[int, float], default 1
        If `force`, how many seconds to sleep between retries. Defaults to `1`.

    check_existing: bool, default True
        If `True`, pull and diff with existing data from the pipe.

    enforce_dtypes: bool, default True
        If `True`, enforce dtypes on incoming data.
        Set this to `False` if the incoming rows are expected to be of the correct dtypes.

    blocking: bool, default True
        If `True`, wait for sync to finish and return its result; otherwise
        sync asynchronously and return success. Defaults to `True`.
        Only intended for specific scenarios.

    workers: Optional[int], default None
        If provided and the instance connector is thread-safe
        (`pipe.instance_connector.IS_THREAD_SAFE is True`),
        limit concurrent sync to this many threads.

    callback: Optional[Callable[[Tuple[bool, str]], Any]], default None
        Callback function which expects a SuccessTuple as input.
        Only applies when `blocking=False`.

    error_callback: Optional[Callable[[Exception], Any]], default None
        Callback function which expects an Exception as input.
        Only applies when `blocking=False`.

    chunksize: int, default -1
        Specify the number of rows to sync per chunk.
        If `-1`, resort to system configuration (default is `900`).
        A `chunksize` of `None` will sync all rows in one transaction.

    sync_chunks: bool, default True
        If possible, sync chunks while fetching them into memory.

    debug: bool, default False
        Verbosity toggle. Defaults to False.

    Returns
    -------
    A `SuccessTuple` of success (`bool`) and message (`str`).
    """
    from meerschaum.utils.debug import dprint, _checkpoint
    from meerschaum.utils.formatting import get_console
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.misc import df_is_chunk_generator, filter_keywords, filter_arguments
    from meerschaum.utils.pool import get_pool
    from meerschaum.config import get_config
    from meerschaum.utils.dtypes import are_dtypes_equal, get_current_timestamp

    if (callback is not None or error_callback is not None) and blocking:
        warn("Callback functions are only executed when blocking = False. Ignoring...")

    _checkpoint(_total=2, **kw)

    if chunksize == 0:
        chunksize = None
        sync_chunks = False

    begin, end = self.parse_date_bounds(begin, end)
    kw.update({
        'begin': begin,
        'end': end,
        'force': force,
        'retries': retries,
        'min_seconds': min_seconds,
        'check_existing': check_existing,
        'blocking': blocking,
        'workers': workers,
        'callback': callback,
        'error_callback': error_callback,
        'sync_chunks': sync_chunks,
        'chunksize': chunksize,
        'safe_copy': True,
    })

    self._invalidate_cache(debug=debug)
    self._cache_value('sync_ts', get_current_timestamp('ms'), debug=debug)

    def _sync(
        p: mrsm.Pipe,
        df: Union[
            'pd.DataFrame',
            Dict[str, List[Any]],
            List[Dict[str, Any]],
            str,
            InferFetch
        ] = InferFetch,
    ) -> SuccessTuple:
        if df is None:
            p._invalidate_cache(debug=debug)
            return (
                False,
                f"You passed `None` instead of data into `sync()` for {p}.\n"
                + "Omit the DataFrame to infer fetching.",
            )
        ### Ensure that Pipe is registered.
        if not p.temporary and p.get_id(debug=debug) is None:
            ### NOTE: This may trigger an interactive session for plugins!
            register_success, register_msg = p.register(debug=debug)
            if not register_success:
                if 'already' not in register_msg:
                    p._invalidate_cache(debug=debug)
                    return register_success, register_msg

        if isinstance(df, str):
            from meerschaum.utils.dataframe import parse_simple_lines
            df = parse_simple_lines(df)

        ### If connector is a plugin with a `sync()` method, return that instead.
        ### If the plugin does not have a `sync()` method but does have a `fetch()` method,
        ### use that instead.
        ### NOTE: The DataFrame must be omitted for the plugin sync method to apply.
        ### If a DataFrame is provided, continue as expected.
        if hasattr(df, 'MRSM_INFER_FETCH'):
            try:
                if isinstance(p.connector, str):
                    if ':' not in p.connector_keys:
                        return True, f"{p} does not support fetching; nothing to do."

                    msg = f"{p} does not have a valid connector."
                    if p.connector_keys.startswith('plugin:'):
                        msg += f"\n Perhaps {p.connector_keys} has a syntax error?"
                    p._invalidate_cache(debug=debug)
                    return False, msg
            except Exception:
                p._invalidate_cache(debug=debug)
                return False, f"Unable to create the connector for {p}."

        ### Sync in place if possible.
        if (
            str(self.connector) == str(self.instance_connector)
            and
            hasattr(self.instance_connector, 'sync_pipe_inplace')
            and
            _inplace
            and
            get_config('system', 'experimental', 'inplace_sync')
        ):
            with Venv(get_connector_plugin(self.instance_connector)):
                p._invalidate_cache(debug=debug)
                _args, _kwargs = filter_arguments(
                    p.instance_connector.sync_pipe_inplace,
                    p,
                    debug=debug,
                    **kw
                )
                return self.instance_connector.sync_pipe_inplace(
                    *_args,
                    **_kwargs
                )

        ### Activate and invoke `sync(pipe)` for plugin connectors with `sync` methods.
        try:
            if getattr(p.connector, 'sync', None) is not None:
                with Venv(get_connector_plugin(p.connector), debug=debug):
                    _args, _kwargs = filter_arguments(
                        p.connector.sync,
                        p,
                        debug=debug,
                        **kw
                    )
                    return_tuple = p.connector.sync(*_args, **_kwargs)
                p._invalidate_cache(debug=debug)
                if not isinstance(return_tuple, tuple):
                    return_tuple = (
                        False,
                        f"Plugin '{p.connector.label}' returned non-tuple value: {return_tuple}"
                    )
                return return_tuple

        except Exception as e:
            get_console().print_exception()
            msg = f"Failed to sync {p} with exception: '" + str(e) + "'"
            if debug:
                error(msg, silent=False)
            p._invalidate_cache(debug=debug)
            return False, msg

        ### Fetch the dataframe from the connector's `fetch()` method.
        try:
            with Venv(get_connector_plugin(p.connector), debug=debug):
                df = p.fetch(
                    **filter_keywords(
                        p.fetch,
                        debug=debug,
                        **kw
                    )
                )
            kw['safe_copy'] = False
        except Exception as e:
            get_console().print_exception(
                suppress=[
                    'meerschaum/core/Pipe/_sync.py',
                    'meerschaum/core/Pipe/_fetch.py',
                ]
            )
            msg = f"Failed to fetch data from {p.connector}:\n {e}"
            df = None

        if df is None:
            p._invalidate_cache(debug=debug)
            return False, f"No data were fetched for {p}."

        if isinstance(df, list):
            if len(df) == 0:
                return True, f"No new rows were returned for {p}."

            ### May be a chunk hook results list.
            if isinstance(df[0], tuple):
                success = all([_success for _success, _ in df])
                message = '\n'.join([_message for _, _message in df])
                return success, message

        if df is True:
            p._invalidate_cache(debug=debug)
            return True, f"{p} is being synced in parallel."

        ### CHECKPOINT: Retrieved the DataFrame.
        _checkpoint(**kw)

        ### Allow for dataframe generators or iterables.
        if df_is_chunk_generator(df):
            kw['workers'] = p.get_num_workers(kw.get('workers', None))
            dt_col = p.columns.get('datetime', None)
            pool = get_pool(workers=kw.get('workers', 1))
            if debug:
                dprint(f"Received {type(df)}. Attempting to sync first chunk...")

            try:
                chunk = next(df)
            except StopIteration:
                return True, "Received an empty generator; nothing to do."

            chunk_success, chunk_msg = _sync(p, chunk)
            chunk_msg = '\n' + self._get_chunk_label(chunk, dt_col) + '\n' + chunk_msg
            if not chunk_success:
                return chunk_success, f"Unable to sync initial chunk for {p}:\n{chunk_msg}"
            if debug:
                dprint("Successfully synced the first chunk, attempting the rest...")

            def _process_chunk(_chunk):
                _chunk_attempts = 0
                _max_chunk_attempts = 3
                while _chunk_attempts < _max_chunk_attempts:
                    try:
                        _chunk_success, _chunk_msg = _sync(p, _chunk)
                    except Exception as e:
                        _chunk_success, _chunk_msg = False, str(e)
                    if _chunk_success:
                        break
                    _chunk_attempts += 1
                    _sleep_seconds = _chunk_attempts ** 2
                    warn(
                        (
                            f"Failed to sync chunk to {self} "
                            + f"(attempt {_chunk_attempts} / {_max_chunk_attempts}).\n"
                            + f"Sleeping for {_sleep_seconds} second"
                            + ('s' if _sleep_seconds != 1 else '')
                            + f":\n{_chunk_msg}"
                        ),
                        stack=False,
                    )
                    time.sleep(_sleep_seconds)

                num_rows_str = (
                    f"{num_rows:,} rows"
                    if (num_rows := len(_chunk)) != 1
                    else f"{num_rows} row"
                )
                _chunk_msg = (
                    (
                        "Synced"
                        if _chunk_success
                        else "Failed to sync"
                    ) + f" a chunk ({num_rows_str}) to {p}:\n"
                    + self._get_chunk_label(_chunk, dt_col)
                    + '\n'
                    + _chunk_msg
                )

                mrsm.pprint((_chunk_success, _chunk_msg), calm=True)
                return _chunk_success, _chunk_msg

            results = sorted(
                [(chunk_success, chunk_msg)] + (
                    list(pool.imap(_process_chunk, df))
                    if (
                        not df_is_chunk_generator(chunk)  # Handle nested generators.
                        and kw.get('workers', 1) != 1
                    )
                    else list(
                        _process_chunk(_child_chunks)
                        for _child_chunks in df
                    )
                )
            )
            chunk_messages = [chunk_msg for _, chunk_msg in results]
            success_bools = [chunk_success for chunk_success, _ in results]
            num_successes = len([chunk_success for chunk_success, _ in results if chunk_success])
            num_failures = len([chunk_success for chunk_success, _ in results if not chunk_success])
            success = all(success_bools)
            msg = (
                'Synced '
                + f'{len(chunk_messages):,} chunk'
                + ('s' if len(chunk_messages) != 1 else '')
                + f' to {p}\n({num_successes} succeeded, {num_failures} failed):\n\n'
                + '\n\n'.join(chunk_messages).lstrip().rstrip()
            ).lstrip().rstrip()
            return success, msg

        ### Cast to a dataframe and ensure datatypes are what we expect.
        dtypes = p.get_dtypes(debug=debug)
        df = p.enforce_dtypes(
            df,
            chunksize=chunksize,
            enforce=enforce_dtypes,
            dtypes=dtypes,
            debug=debug,
        )
        if p.autotime:
            dt_col = p.columns.get('datetime', None)
            ts_col = dt_col or mrsm.get_config(
                'pipes', 'autotime', 'column_name_if_datetime_missing'
            )
            ts_typ = dtypes.get(ts_col, 'datetime') if ts_col else 'datetime'
            if ts_col and hasattr(df, 'columns') and ts_col not in df.columns:
                precision = p.get_precision(debug=debug)
                now = get_current_timestamp(
                    precision_unit=precision.get(
                        'unit',
                        STATIC_CONFIG['dtypes']['datetime']['default_precision_unit']
                    ),
                    precision_interval=precision.get('interval', 1),
                    round_to=(precision.get('round_to', 'down')),
                    as_int=(are_dtypes_equal(ts_typ, 'int')),
                )
                if debug:
                    dprint(f"Adding current timestamp to dataframe synced to {p}: {now}")

                df[ts_col] = now
            kw['check_existing'] = dt_col is not None

        ### Capture special columns.
        capture_success, capture_msg = self._persist_new_special_columns(
            df,
            dtypes=dtypes,
            debug=debug,
        )
        if not capture_success:
            warn(f"Failed to capture new special columns for {self}:\n{capture_msg}")

        if debug:
            dprint(
                "DataFrame to sync:\n"
                + (
                    str(df)[:255]
                    + '...'
                    if len(str(df)) >= 256
                    else str(df)
                ),
                **kw
            )

        ### if force, continue to sync until success
        return_tuple = False, f"Did not sync {p}."
        run = True
        _retries = 1
        while run:
            with Venv(get_connector_plugin(self.instance_connector)):
                return_tuple = p.instance_connector.sync_pipe(
                    pipe=p,
                    df=df,
                    debug=debug,
                    **kw
                )
            _retries += 1
            run = (not return_tuple[0]) and force and _retries <= retries
            if run and debug:
                dprint(f"Syncing failed for {p}. Attempt ( {_retries} / {retries} )", **kw)
                dprint(f"Sleeping for {min_seconds} seconds...", **kw)
                time.sleep(min_seconds)
            if _retries > retries:
                warn(
                    f"Unable to sync {p} within {retries} attempt" +
                    ("s" if retries != 1 else "") + "!"
                )

        ### CHECKPOINT: Finished syncing.
482 _checkpoint(**kw) 483 p._invalidate_cache(debug=debug) 484 return return_tuple 485 486 if blocking: 487 return _sync(self, df=df) 488 489 from meerschaum.utils.threading import Thread 490 def default_callback(result_tuple: SuccessTuple): 491 dprint(f"Asynchronous result from {self}: {result_tuple}", **kw) 492 493 def default_error_callback(x: Exception): 494 dprint(f"Error received for {self}: {x}", **kw) 495 496 if callback is None and debug: 497 callback = default_callback 498 if error_callback is None and debug: 499 error_callback = default_error_callback 500 try: 501 thread = Thread( 502 target=_sync, 503 args=(self,), 504 kwargs={'df': df}, 505 daemon=False, 506 callback=callback, 507 error_callback=error_callback, 508 ) 509 thread.start() 510 except Exception as e: 511 self._invalidate_cache(debug=debug) 512 return False, str(e) 513 514 self._invalidate_cache(debug=debug) 515 return True, f"Spawned asyncronous sync for {self}."
Fetch new data from the source and update the pipe's table with new data.
Get new remote data via fetch, get existing data in the same time period, and merge the two, only keeping the unseen data.
Parameters
- df (Union[None, pd.DataFrame, Dict[str, List[Any]]], default None):
An optional DataFrame to sync into the pipe. Defaults to None. If df is a string, it will be parsed via meerschaum.utils.dataframe.parse_simple_lines().
- begin (Union[datetime, int, str, None], default ''): Optionally specify the earliest datetime to search for data.
- end (Union[datetime, int, str, None], default None): Optionally specify the latest datetime to search for data.
- force (bool, default False): If True, keep trying to sync until reaching retries attempts.
- retries (int, default 10): If force, how many attempts to try syncing before declaring failure.
- min_seconds (Union[int, float], default 1): If force, how many seconds to sleep between retries. Defaults to 1.
- check_existing (bool, default True): If True, pull and diff with existing data from the pipe.
- enforce_dtypes (bool, default True): If True, enforce dtypes on incoming data. Set this to False if the incoming rows are expected to be of the correct dtypes.
- blocking (bool, default True): If True, wait for the sync to finish and return its result; otherwise sync asynchronously and return success. Defaults to True. Only intended for specific scenarios.
- workers (Optional[int], default None): If provided and the instance connector is thread-safe (pipe.instance_connector.IS_THREAD_SAFE is True), limit concurrent sync to this many threads.
- callback (Optional[Callable[[Tuple[bool, str]], Any]], default None): Callback function which expects a SuccessTuple as input. Only applies when blocking=False.
- error_callback (Optional[Callable[[Exception], Any]], default None): Callback function which expects an Exception as input. Only applies when blocking=False.
- chunksize (int, default -1): Specify the number of rows to sync per chunk. If -1, resort to the system configuration (default is 900). A chunksize of None will sync all rows in one transaction.
- sync_chunks (bool, default True): If possible, sync chunks while fetching them into memory.
- debug (bool, default False): Verbosity toggle. Defaults to False.
Returns
- A SuccessTuple of success (bool) and message (str).
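The force/retries/min_seconds behavior described above amounts to a bounded retry loop. A minimal standalone sketch (the sync_once callable here is a hypothetical stand-in for the connector's sync, not part of meerschaum):

```python
import time

def sync_with_retries(sync_once, force=True, retries=3, min_seconds=0.0):
    """Keep calling `sync_once()` until it succeeds or `retries` attempts pass."""
    return_tuple = (False, "Did not sync.")
    _retries = 1
    run = True
    while run:
        return_tuple = sync_once()
        _retries += 1
        # Retry only when the last attempt failed, `force` is set,
        # and we have attempts remaining.
        run = (not return_tuple[0]) and force and _retries <= retries
        if run:
            time.sleep(min_seconds)
    return return_tuple

# A flaky source which fails twice, then succeeds on the third attempt.
attempts = []
def flaky():
    attempts.append(1)
    return (len(attempts) >= 3, f"attempt {len(attempts)}")

success, msg = sync_with_retries(flaky, force=True, retries=3, min_seconds=0.0)
```

With force=False, the loop runs exactly once, matching the default sync behavior.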
def get_sync_time(
    self,
    params: Optional[Dict[str, Any]] = None,
    newest: bool = True,
    apply_backtrack_interval: bool = False,
    remote: bool = False,
    round_down: bool = False,
    debug: bool = False
) -> Union['datetime', int, None]:
    """
    Get the most recent datetime value for a Pipe.

    Parameters
    ----------
    params: Optional[Dict[str, Any]], default None
        Dictionary to build a WHERE clause for a specific column.
        See `meerschaum.utils.sql.build_where`.

    newest: bool, default True
        If `True`, get the most recent datetime (honoring `params`).
        If `False`, get the oldest datetime (`ASC` instead of `DESC`).

    apply_backtrack_interval: bool, default False
        If `True`, subtract the backtrack interval from the sync time.

    remote: bool, default False
        If `True` and the instance connector supports it, return the sync time
        for the remote table definition.

    round_down: bool, default False
        If `True`, round down the datetime value to the nearest minute.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `datetime` or int, if the pipe exists, otherwise `None`.
    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.misc import filter_keywords
    from meerschaum.utils.dtypes import round_time
    from meerschaum.utils.warnings import warn

    if not self.columns.get('datetime', None):
        return None

    connector = self.instance_connector if not remote else self.connector
    if isinstance(connector, str) or connector is None:
        return None

    with Venv(get_connector_plugin(connector)):
        if not hasattr(connector, 'get_sync_time'):
            warn(
                f"Connectors of type '{connector.type}' "
                "do not implement `get_sync_time()`.",
                stack=False,
            )
            return None
        sync_time = connector.get_sync_time(
            self,
            **filter_keywords(
                connector.get_sync_time,
                params=params,
                newest=newest,
                remote=remote,
                debug=debug,
            )
        )

    if round_down and isinstance(sync_time, datetime):
        sync_time = round_time(sync_time, timedelta(minutes=1))

    if apply_backtrack_interval and sync_time is not None:
        backtrack_interval = self.get_backtrack_interval(debug=debug)
        try:
            sync_time -= backtrack_interval
        except Exception as e:
            warn(f"Failed to apply backtrack interval:\n{e}")

    return self.parse_date_bounds(sync_time)
Get the most recent datetime value for a Pipe.
Parameters
- params (Optional[Dict[str, Any]], default None): Dictionary to build a WHERE clause for a specific column. See meerschaum.utils.sql.build_where.
- newest (bool, default True): If True, get the most recent datetime (honoring params). If False, get the oldest datetime (ASC instead of DESC).
- apply_backtrack_interval (bool, default False): If True, subtract the backtrack interval from the sync time.
- remote (bool, default False): If True and the instance connector supports it, return the sync time for the remote table definition.
- round_down (bool, default False): If True, round down the datetime value to the nearest minute.
- debug (bool, default False): Verbosity toggle.
Returns
- A datetime or int if the pipe exists, otherwise None.
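To illustrate apply_backtrack_interval: the returned sync time is shifted backwards by the pipe's backtrack interval so the next fetch overlaps recently synced data. A minimal sketch with an assumed 24-hour interval (plain datetime arithmetic, no meerschaum required):

```python
from datetime import datetime, timedelta, timezone

# Suppose the newest synced timestamp and a 24-hour backtrack interval.
sync_time = datetime(2024, 1, 10, 12, 30, tzinfo=timezone.utc)
backtrack_interval = timedelta(hours=24)

# With apply_backtrack_interval=True, the effective begin is shifted back,
# so rows near the boundary are re-fetched and deduplicated on sync.
effective_begin = sync_time - backtrack_interval
```

Overlapping the boundary this way trades a little redundant fetching for resilience against late-arriving rows.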
def exists(
    self,
    debug: bool = False
) -> bool:
    """
    See if a Pipe's table exists.

    Parameters
    ----------
    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `bool` corresponding to whether a pipe's underlying table exists.
    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.debug import dprint
    from meerschaum.utils.dtypes import get_current_timestamp

    now = get_current_timestamp('ms', as_int=True) / 1000
    cache_seconds = mrsm.get_config('pipes', 'sync', 'exists_cache_seconds')

    _exists = self._get_cached_value('_exists', debug=debug)
    if _exists:
        exists_timestamp = self._get_cached_value('_exists_timestamp', debug=debug)
        if exists_timestamp is not None:
            delta = now - exists_timestamp
            if delta < cache_seconds:
                if debug:
                    dprint(f"Returning cached `exists` for {self} ({round(delta, 2)} seconds old).")
                return _exists

    with Venv(get_connector_plugin(self.instance_connector)):
        _exists = (
            self.instance_connector.pipe_exists(pipe=self, debug=debug)
            if hasattr(self.instance_connector, 'pipe_exists')
            else False
        )

    self._cache_value('_exists', _exists, debug=debug)
    self._cache_value('_exists_timestamp', now, debug=debug)
    return _exists
See if a Pipe's table exists.
Parameters
- debug (bool, default False): Verbosity toggle.
Returns
- A bool corresponding to whether a pipe's underlying table exists.
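The method caches its result with a timestamp and only re-queries the backend once the configured TTL lapses. A simplified, self-contained sketch of that pattern (class and names are illustrative, not meerschaum API):

```python
import time

class CachedExists:
    """Sketch of the TTL cache used by a table-existence check."""
    def __init__(self, check_fn, cache_seconds=60.0):
        self.check_fn = check_fn          # the expensive backend lookup
        self.cache_seconds = cache_seconds
        self._exists = None
        self._exists_timestamp = None

    def exists(self):
        now = time.time()
        if self._exists and self._exists_timestamp is not None:
            if (now - self._exists_timestamp) < self.cache_seconds:
                return self._exists       # fresh enough: skip the backend call
        self._exists = self.check_fn()
        self._exists_timestamp = now
        return self._exists

calls = []
cached = CachedExists(lambda: calls.append(1) or True, cache_seconds=60.0)
first = cached.exists()   # hits the backend
second = cached.exists()  # served from cache within the TTL
```

Note that (like the method above) only a truthy cached value short-circuits, so a table that did not exist is re-checked on every call.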
def filter_existing(
    self,
    df: 'pd.DataFrame',
    safe_copy: bool = True,
    date_bound_only: bool = False,
    include_unchanged_columns: bool = False,
    enforce_dtypes: bool = False,
    chunksize: Optional[int] = -1,
    debug: bool = False,
    **kw
) -> Tuple['pd.DataFrame', 'pd.DataFrame', 'pd.DataFrame']:
    """
    Inspect a dataframe and filter out rows which already exist in the pipe.

    Parameters
    ----------
    df: 'pd.DataFrame'
        The dataframe to inspect and filter.

    safe_copy: bool, default True
        If `True`, create a copy before comparing and modifying the dataframes.
        Setting to `False` may mutate the DataFrames.
        See `meerschaum.utils.dataframe.filter_unseen_df`.

    date_bound_only: bool, default False
        If `True`, only use the datetime index to fetch the sample dataframe.

    include_unchanged_columns: bool, default False
        If `True`, include the backtrack columns which haven't changed in the update dataframe.
        This is useful if you can't update individual keys.

    enforce_dtypes: bool, default False
        If `True`, ensure the given and intermediate dataframes are enforced to the correct dtypes.
        Setting `enforce_dtypes=True` may impact performance.

    chunksize: Optional[int], default -1
        The `chunksize` used when fetching existing data.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A tuple of three pandas DataFrames: unseen, update, and delta.
    """
    from meerschaum.utils.warnings import warn
    from meerschaum.utils.debug import dprint
    from meerschaum.utils.packages import attempt_import, import_pandas
    from meerschaum.utils.dataframe import (
        filter_unseen_df,
        add_missing_cols_to_df,
        get_unhashable_cols,
    )
    from meerschaum.utils.dtypes import (
        to_pandas_dtype,
        none_if_null,
        to_datetime,
        are_dtypes_equal,
        value_is_null,
        round_time,
    )
    from meerschaum.config import get_config
    pd = import_pandas()
    pandas = attempt_import('pandas')
    if enforce_dtypes or 'dataframe' not in str(type(df)).lower():
        df = self.enforce_dtypes(df, chunksize=chunksize, debug=debug)
    is_dask = hasattr(df, '__module__') and 'dask' in df.__module__
    if is_dask:
        dd = attempt_import('dask.dataframe')
        merge = dd.merge
        NA = pandas.NA
    else:
        merge = pd.merge
        NA = pd.NA

    parameters = self.parameters
    pipe_columns = self.columns
    primary_key = pipe_columns.get('primary', None)
    dt_col = pipe_columns.get('datetime', None)
    dt_type = parameters.get('dtypes', {}).get(dt_col, 'datetime') if dt_col else None
    autoincrement = parameters.get('autoincrement', False)
    autotime = parameters.get('autotime', False)

    if primary_key and autoincrement and df is not None and primary_key in df.columns:
        if safe_copy:
            df = df.copy()
            safe_copy = False
        if df[primary_key].isnull().all():
            del df[primary_key]
            _ = self.columns.pop(primary_key, None)

    if dt_col and autotime and df is not None and dt_col in df.columns:
        if safe_copy:
            df = df.copy()
            safe_copy = False
        if df[dt_col].isnull().all():
            del df[dt_col]
            _ = self.columns.pop(dt_col, None)

    def get_empty_df():
        empty_df = pd.DataFrame([])
        dtypes = dict(df.dtypes) if df is not None else {}
        dtypes.update(self.dtypes) if self.enforce else {}
        pd_dtypes = {
            col: to_pandas_dtype(str(typ))
            for col, typ in dtypes.items()
        }
        return add_missing_cols_to_df(empty_df, pd_dtypes)

    if df is None:
        empty_df = get_empty_df()
        return empty_df, empty_df, empty_df

    if (df.empty if not is_dask else len(df) == 0):
        return df, df, df

    ### begin is the oldest data in the new dataframe
    begin, end = None, None

    if autoincrement and primary_key == dt_col and dt_col not in df.columns:
        if enforce_dtypes:
            df = self.enforce_dtypes(df, chunksize=chunksize, debug=debug)
        return df, get_empty_df(), df

    if autotime and dt_col and dt_col not in df.columns:
        if enforce_dtypes:
            df = self.enforce_dtypes(df, chunksize=chunksize, debug=debug)
        return df, get_empty_df(), df

    try:
        min_dt_val = df[dt_col].min(skipna=True) if dt_col and dt_col in df.columns else None
        if is_dask and min_dt_val is not None:
            min_dt_val = min_dt_val.compute()
        min_dt = (
            to_datetime(min_dt_val, as_pydatetime=True)
            if min_dt_val is not None and are_dtypes_equal(dt_type, 'datetime')
            else min_dt_val
        )
    except Exception:
        min_dt = None

    if not are_dtypes_equal('datetime', str(type(min_dt))) or value_is_null(min_dt):
        if not are_dtypes_equal('int', str(type(min_dt))):
            min_dt = None

    if isinstance(min_dt, datetime):
        rounded_min_dt = round_time(min_dt, to='down')
        try:
            begin = rounded_min_dt - timedelta(minutes=1)
        except OverflowError:
            begin = rounded_min_dt
    elif dt_type and 'int' in dt_type.lower():
        begin = min_dt
    elif dt_col is None:
        begin = None

    ### end is the newest data in the new dataframe
    try:
        max_dt_val = df[dt_col].max(skipna=True) if dt_col and dt_col in df.columns else None
        if is_dask and max_dt_val is not None:
            max_dt_val = max_dt_val.compute()
        max_dt = (
            to_datetime(max_dt_val, as_pydatetime=True)
            if max_dt_val is not None and 'datetime' in str(dt_type)
            else max_dt_val
        )
    except Exception:
        import traceback
        traceback.print_exc()
        max_dt = None

    if not are_dtypes_equal('datetime', str(type(max_dt))) or value_is_null(max_dt):
        if not are_dtypes_equal('int', str(type(max_dt))):
            max_dt = None

    if isinstance(max_dt, datetime):
        end = (
            round_time(
                max_dt,
                to='down'
            ) + timedelta(minutes=1)
        )
    elif dt_type and 'int' in dt_type.lower() and max_dt is not None:
        end = max_dt + 1

    if max_dt is not None and min_dt is not None and min_dt > max_dt:
        warn("Detected minimum datetime greater than maximum datetime.")

    if begin is not None and end is not None and begin > end:
        if isinstance(begin, datetime):
            begin = end - timedelta(minutes=1)
        ### We might be using integers for the datetime axis.
        else:
            begin = end - 1

    unique_index_vals = {
        col: df[col].unique()
        for col in (pipe_columns.values() if not primary_key else [primary_key])
        if col in df.columns and col != dt_col
    } if not date_bound_only else {}
    unique_index_lens = {
        col: len(unique_vals)
        for col, unique_vals in unique_index_vals.items()
    } if not date_bound_only else {}
    filter_params_index_limit = get_config('pipes', 'sync', 'filter_params_index_limit')
    _ = kw.pop('params', None)
    params = {
        col: [
            none_if_null(val)
            for val in unique_vals
        ]
        for col, unique_vals in unique_index_vals.items()
        if unique_index_lens[col] <= filter_params_index_limit
    } if not date_bound_only else {}

    if debug:
        dprint(
            (
                f"Looking at data between '{begin}' and '{end}' with index value lengths:\n"
                f"{json.dumps(unique_index_lens, indent=4)}\n"
            ),
            **kw
        )

    backtrack_df = self.get_data(
        begin=begin,
        end=end,
        chunksize=chunksize,
        params=params,
        debug=debug,
        **kw
    )
    if backtrack_df is None:
        if debug:
            dprint(f"No backtrack data was found for {self}.")
        return df, get_empty_df(), df

    if enforce_dtypes:
        backtrack_df = self.enforce_dtypes(backtrack_df, chunksize=chunksize, debug=debug)

    if debug:
        dprint(f"Existing data for {self}:\n" + str(backtrack_df), **kw)
        dprint(f"Existing dtypes for {self}:\n" + str(backtrack_df.dtypes))

    ### Separate new rows from changed ones.
    on_cols = [
        col
        for col_key, col in pipe_columns.items()
        if (
            col
            and
            col_key != 'value'
            and col in backtrack_df.columns
        )
    ] if not primary_key else [primary_key]

    self_dtypes = self.get_dtypes(debug=debug) if self.enforce else {}
    on_cols_dtypes = {
        col: to_pandas_dtype(typ)
        for col, typ in self_dtypes.items()
        if col in on_cols
    }

    ### Detect changes between the old target and new source dataframes.
    delta_df = add_missing_cols_to_df(
        filter_unseen_df(
            backtrack_df,
            df,
            dtypes={
                col: to_pandas_dtype(typ)
                for col, typ in self_dtypes.items()
            },
            safe_copy=safe_copy,
            coerce_mixed_numerics=(not self.static),
            debug=debug
        ),
        on_cols_dtypes,
    )
    if enforce_dtypes:
        delta_df = self.enforce_dtypes(delta_df, chunksize=chunksize, debug=debug)

    ### Cast dicts or lists to strings so we can merge.
    serializer = functools.partial(json.dumps, sort_keys=True, separators=(',', ':'), default=str)

    def deserializer(x):
        return json.loads(x) if isinstance(x, str) else x

    unhashable_delta_cols = get_unhashable_cols(delta_df)
    unhashable_backtrack_cols = get_unhashable_cols(backtrack_df)
    for col in unhashable_delta_cols:
        delta_df[col] = delta_df[col].apply(serializer)
    for col in unhashable_backtrack_cols:
        backtrack_df[col] = backtrack_df[col].apply(serializer)
    casted_cols = set(unhashable_delta_cols + unhashable_backtrack_cols)

    joined_df = merge(
        delta_df.infer_objects().fillna(NA),
        backtrack_df.infer_objects().fillna(NA),
        how='left',
        on=on_cols,
        indicator=True,
        suffixes=('', '_old'),
    ) if on_cols else delta_df
    for col in casted_cols:
        if col in joined_df.columns:
            joined_df[col] = joined_df[col].apply(deserializer)
        if col in delta_df.columns:
            delta_df[col] = delta_df[col].apply(deserializer)

    ### Determine which rows are completely new.
    new_rows_mask = (joined_df['_merge'] == 'left_only') if on_cols else None
    cols = list(delta_df.columns)

    unseen_df = (
        joined_df
        .where(new_rows_mask)
        .dropna(how='all')[cols]
        .reset_index(drop=True)
    ) if on_cols else delta_df

    ### Rows that have already been inserted but values have changed.
    update_df = (
        joined_df
        .where(~new_rows_mask)
        .dropna(how='all')[cols]
        .reset_index(drop=True)
    ) if on_cols else get_empty_df()

    if include_unchanged_columns and on_cols:
        unchanged_backtrack_cols = [
            col
            for col in backtrack_df.columns
            if col in on_cols or col not in update_df.columns
        ]
        if enforce_dtypes:
            update_df = self.enforce_dtypes(update_df, chunksize=chunksize, debug=debug)
        update_df = merge(
            backtrack_df[unchanged_backtrack_cols],
            update_df,
            how='inner',
            on=on_cols,
        )

    return unseen_df, update_df, delta_df
Inspect a dataframe and filter out rows which already exist in the pipe.
Parameters
- df ('pd.DataFrame'): The dataframe to inspect and filter.
- safe_copy (bool, default True): If True, create a copy before comparing and modifying the dataframes. Setting to False may mutate the DataFrames. See meerschaum.utils.dataframe.filter_unseen_df.
- date_bound_only (bool, default False): If True, only use the datetime index to fetch the sample dataframe.
- include_unchanged_columns (bool, default False): If True, include the backtrack columns which haven't changed in the update dataframe. This is useful if you can't update individual keys.
- enforce_dtypes (bool, default False): If True, ensure the given and intermediate dataframes are enforced to the correct dtypes. Setting enforce_dtypes=True may impact performance.
- chunksize (Optional[int], default -1): The chunksize used when fetching existing data.
- debug (bool, default False): Verbosity toggle.
Returns
- A tuple of three pandas DataFrames: unseen, update, and delta.
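The method above performs this split with a pandas left merge (indicator=True); conceptually, the unseen/update/delta partition reduces to a key lookup. A plain-Python sketch on lists of dicts (illustrative only, not the meerschaum implementation):

```python
def split_unseen_update(existing_rows, incoming_rows, on_cols):
    """
    Rows whose key columns are absent from the existing data are 'unseen';
    rows whose keys match but whose values differ are 'update';
    'delta' is their union. Unchanged rows are dropped entirely.
    """
    def key(row):
        return tuple(row[col] for col in on_cols)

    existing_by_key = {key(row): row for row in existing_rows}
    unseen, update = [], []
    for row in incoming_rows:
        old = existing_by_key.get(key(row))
        if old is None:
            unseen.append(row)
        elif old != row:
            update.append(row)
    delta = unseen + update
    return unseen, update, delta

existing = [{'id': 1, 'vl': 10}, {'id': 2, 'vl': 20}]
incoming = [{'id': 2, 'vl': 25}, {'id': 3, 'vl': 30}, {'id': 1, 'vl': 10}]
unseen, update, delta = split_unseen_update(existing, incoming, ['id'])
```

Here id 3 is unseen, id 2 is an update, and id 1 is unchanged and filtered out, mirroring how only new or changed rows are inserted or updated downstream.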
def get_num_workers(self, workers: Optional[int] = None) -> int:
    """
    Get the number of workers to use for concurrent syncs.

    Parameters
    ----------
    workers: Optional[int], default None
        The number of workers passed via `--workers`.

    Returns
    -------
    The number of workers, capped for safety.
    """
    is_thread_safe = getattr(self.instance_connector, 'IS_THREAD_SAFE', False)
    if not is_thread_safe:
        return 1

    engine_pool_size = (
        self.instance_connector.engine.pool.size()
        if self.instance_connector.type == 'sql'
        else None
    )
    current_num_threads = threading.active_count()
    current_num_connections = (
        self.instance_connector.engine.pool.checkedout()
        if engine_pool_size is not None
        else current_num_threads
    )
    desired_workers = (
        min(workers or engine_pool_size, engine_pool_size)
        if engine_pool_size is not None
        else workers
    )
    if desired_workers is None:
        desired_workers = (multiprocessing.cpu_count() if is_thread_safe else 1)

    return max(
        (desired_workers - current_num_connections),
        1,
    )
Get the number of workers to use for concurrent syncs.
Parameters
- workers (Optional[int], default None): The number of workers passed via --workers.
Returns
- The number of workers, capped for safety.
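The capping arithmetic above caps the request at the connection pool size, subtracts connections already checked out, and never returns fewer than 1. A simplified, dependency-free sketch of that logic (parameter names here are illustrative):

```python
def num_workers(workers=None, engine_pool_size=None, current_num_connections=0, cpu_count=8):
    """Simplified version of the worker-capping logic in `get_num_workers()`."""
    # Cap the requested workers at the pool size, if a pool exists.
    desired = (
        min(workers or engine_pool_size, engine_pool_size)
        if engine_pool_size is not None
        else workers
    )
    # With no request and no pool, fall back to the CPU count.
    if desired is None:
        desired = cpu_count
    # Leave headroom for connections already in use; never go below 1.
    return max(desired - current_num_connections, 1)

# A pool of 5 with 3 connections checked out leaves 2 workers,
# even though 10 were requested.
capped = num_workers(workers=10, engine_pool_size=5, current_num_connections=3)
```

Subtracting checked-out connections prevents the sync pool from exhausting the database connection pool under concurrent jobs.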
def verify(
    self,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    chunk_interval: Union[timedelta, int, None] = None,
    bounded: Optional[bool] = None,
    deduplicate: bool = False,
    workers: Optional[int] = None,
    batchsize: Optional[int] = None,
    skip_chunks_with_greater_rowcounts: bool = False,
    check_rowcounts_only: bool = False,
    debug: bool = False,
    **kwargs: Any
) -> SuccessTuple:
    """
    Verify the contents of the pipe by resyncing its interval.

    Parameters
    ----------
    begin: Union[datetime, int, None], default None
        If specified, only verify rows greater than or equal to this value.

    end: Union[datetime, int, None], default None
        If specified, only verify rows less than this value.

    chunk_interval: Union[timedelta, int, None], default None
        If provided, use this as the size of the chunk boundaries.
        Default to the value set in `pipe.parameters['chunk_minutes']` (1440).

    bounded: Optional[bool], default None
        If `True`, do not verify older than the oldest sync time or newer than the newest.
        If `False`, verify unbounded syncs outside of the new and old sync times.
        The default behavior (`None`) is to bound only if a bound interval is set
        (e.g. `pipe.parameters['verify']['bound_days']`).

    deduplicate: bool, default False
        If `True`, deduplicate the pipe's table after the verification syncs.

    workers: Optional[int], default None
        If provided, limit the verification to this many threads.
        Use a value of `1` to sync chunks in series.

    batchsize: Optional[int], default None
        If provided, sync this many chunks in parallel.
        Defaults to `Pipe.get_num_workers()`.

    skip_chunks_with_greater_rowcounts: bool, default False
        If `True`, compare the rowcounts for a chunk and skip syncing if the pipe's
        chunk rowcount equals or exceeds the remote's rowcount.

    check_rowcounts_only: bool, default False
        If `True`, only compare rowcounts and print chunks which are out-of-sync.

    debug: bool, default False
        Verbosity toggle.

    kwargs: Any
        All keyword arguments are passed to `pipe.sync()`.

    Returns
    -------
    A SuccessTuple indicating whether the pipe was successfully resynced.
    """
    from meerschaum.utils.pool import get_pool
    from meerschaum.utils.formatting import make_header
    from meerschaum.utils.misc import interval_str
    workers = self.get_num_workers(workers)
    check_rowcounts = skip_chunks_with_greater_rowcounts or check_rowcounts_only

    ### Skip configured bounding in parameters
    ### if `bounded` is explicitly `False`.
    bound_time = (
        self.get_bound_time(debug=debug)
        if bounded is not False
        else None
    )
    if bounded is None:
        bounded = bound_time is not None

    if bounded and begin is None:
        begin = (
            bound_time
            if bound_time is not None
            else self.get_sync_time(newest=False, debug=debug)
        )
        if begin is None:
            remote_oldest_sync_time = self.get_sync_time(newest=False, remote=True, debug=debug)
            begin = remote_oldest_sync_time
    if bounded and end is None:
        end = self.get_sync_time(newest=True, debug=debug)
        if end is None:
            remote_newest_sync_time = self.get_sync_time(newest=True, remote=True, debug=debug)
            end = remote_newest_sync_time
        if end is not None:
            end += (
                timedelta(minutes=1)
                if hasattr(end, 'tzinfo')
                else 1
            )

    begin, end = self.parse_date_bounds(begin, end)
    cannot_determine_bounds = bounded and begin is None and end is None

    if cannot_determine_bounds and not check_rowcounts_only:
        warn(f"Cannot determine sync bounds for {self}. Syncing instead...", stack=False)
        sync_success, sync_msg = self.sync(
            begin=begin,
            end=end,
            params=params,
            workers=workers,
            debug=debug,
            **kwargs
        )
        if not sync_success:
            return sync_success, sync_msg

        if deduplicate:
            return self.deduplicate(
                begin=begin,
                end=end,
                params=params,
                workers=workers,
                debug=debug,
                **kwargs
            )
        return sync_success, sync_msg

    chunk_interval = self.get_chunk_interval(chunk_interval, debug=debug)
    chunk_bounds = self.get_chunk_bounds(
        begin=begin,
        end=end,
        chunk_interval=chunk_interval,
        bounded=bounded,
        debug=debug,
    )

    ### Consider it a success if no chunks need to be verified.
    if not chunk_bounds:
        if deduplicate:
            return self.deduplicate(
                begin=begin,
                end=end,
                params=params,
                workers=workers,
                debug=debug,
                **kwargs
            )
        return True, f"Could not determine chunks between '{begin}' and '{end}'; nothing to do."

    begin_to_print = (
        begin
        if begin is not None
        else (
            chunk_bounds[0][0]
            if bounded
            else chunk_bounds[0][1]
        )
    )
    end_to_print = (
        end
        if end is not None
        else (
            chunk_bounds[-1][1]
            if bounded
            else chunk_bounds[-1][0]
        )
    )
    message_header = f"{begin_to_print} - {end_to_print}"
    max_chunks_syncs = mrsm.get_config('pipes', 'verify', 'max_chunks_syncs')

    info(
        f"Verifying {self}:\n"
        + ("Syncing" if not check_rowcounts_only else "Checking")
        + f" {len(chunk_bounds)} chunk"
        + ('s' if len(chunk_bounds) != 1 else '')
        + f" ({'un' if not bounded else ''}bounded)"
        + f" of size '{interval_str(chunk_interval)}'"
        + f" between '{begin_to_print}' and '{end_to_print}'.\n"
    )

    ### Dictionary of the form bounds -> success_tuple, e.g.:
    ### {
    ###     (2023-01-01, 2023-01-02): (True, "Success")
    ### }
    bounds_success_tuples = {}

    def process_chunk_bounds(
        chunk_begin_and_end: Tuple[
            Union[int, datetime],
            Union[int, datetime]
        ],
        _workers: Optional[int] = 1,
    ):
        if chunk_begin_and_end in bounds_success_tuples:
            return chunk_begin_and_end, bounds_success_tuples[chunk_begin_and_end]

        chunk_begin, chunk_end = chunk_begin_and_end
        do_sync = True
        chunk_success, chunk_msg = False, "Did not sync chunk."
        if check_rowcounts:
            existing_rowcount = self.get_rowcount(begin=chunk_begin, end=chunk_end, debug=debug)
            remote_rowcount = self.get_rowcount(
                begin=chunk_begin,
                end=chunk_end,
                remote=True,
                debug=debug,
            )
            checked_rows_str = (
                f"checked {existing_rowcount:,} row"
                + ("s" if existing_rowcount != 1 else '')
                + f" vs {remote_rowcount:,} remote"
            )
            if (
                existing_rowcount is not None
                and remote_rowcount is not None
                and existing_rowcount >= remote_rowcount
            ):
                do_sync = False
                chunk_success, chunk_msg = True, (
                    "Row-count is up-to-date "
                    f"({checked_rows_str})."
                )
            elif check_rowcounts_only:
                do_sync = False
                chunk_success, chunk_msg = True, (
                    f"Row-counts are out-of-sync ({checked_rows_str})."
                )

        num_syncs = 0
        while num_syncs < max_chunks_syncs:
            chunk_success, chunk_msg = self.sync(
                begin=chunk_begin,
                end=chunk_end,
                params=params,
                workers=_workers,
                debug=debug,
                **kwargs
            ) if do_sync else (chunk_success, chunk_msg)
            if chunk_success:
                break
            num_syncs += 1
            time.sleep(num_syncs**2)
        chunk_msg = chunk_msg.strip()
        if ' - ' not in chunk_msg:
            chunk_label = f"{chunk_begin} - {chunk_end}"
            chunk_msg = f'Verified chunk for {self}:\n{chunk_label}\n{chunk_msg}'
        mrsm.pprint((chunk_success, chunk_msg))

        return chunk_begin_and_end, (chunk_success, chunk_msg)

    ### If we have more than one chunk, attempt to sync the first one and return if it fails.
    if len(chunk_bounds) > 1:
        first_chunk_bounds = chunk_bounds[0]
        first_label = f"{first_chunk_bounds[0]} - {first_chunk_bounds[1]}"
        info(f"Verifying first chunk for {self}:\n{first_label}")
        (
            (first_begin, first_end),
            (first_success, first_msg)
        ) = process_chunk_bounds(first_chunk_bounds, _workers=workers)
        if not first_success:
            return (
                first_success,
                f"\n{first_label}\n"
                + f"Failed to sync first chunk:\n{first_msg}"
            )
        bounds_success_tuples[first_chunk_bounds] = (first_success, first_msg)
        info(f"Completed first chunk for {self}:\n{first_label}\n")
        chunk_bounds = chunk_bounds[1:]

    pool = get_pool(workers=workers)
    batches = self.get_chunk_bounds_batches(chunk_bounds, batchsize=batchsize, workers=workers)

    def process_batch(
        batch_chunk_bounds: Tuple[
            Tuple[Union[datetime, int, None], Union[datetime, int, None]],
            ...
        ]
    ):
        _batch_begin = batch_chunk_bounds[0][0]
        _batch_end = batch_chunk_bounds[-1][-1]
        batch_message_header = f"{_batch_begin} - {_batch_end}"

        if check_rowcounts_only:
            info(f"Checking row-counts for batch bounds:\n{batch_message_header}")
            _, (batch_init_success, batch_init_msg) = process_chunk_bounds(
                (_batch_begin, _batch_end)
            )
            mrsm.pprint((batch_init_success, batch_init_msg))
            if batch_init_success and 'up-to-date' in batch_init_msg:
                info("Entire batch is up-to-date.")
                return batch_init_success, batch_init_msg

        batch_bounds_success_tuples = dict(pool.map(process_chunk_bounds, batch_chunk_bounds))
        bounds_success_tuples.update(batch_bounds_success_tuples)
        batch_bounds_success_bools = {
            bounds: tup[0]
            for bounds, tup in batch_bounds_success_tuples.items()
        }

        if all(batch_bounds_success_bools.values()):
            msg = get_chunks_success_message(
                batch_bounds_success_tuples,
                header=batch_message_header,
                check_rowcounts_only=check_rowcounts_only,
            )
            if deduplicate:
                deduplicate_success, deduplicate_msg = self.deduplicate(
                    begin=_batch_begin,
                    end=_batch_end,
                    params=params,
                    workers=workers,
                    debug=debug,
                    **kwargs
                )
                return deduplicate_success, msg + '\n\n' + deduplicate_msg
            return True, msg

        batch_chunk_bounds_to_resync = [
            bounds
            for bounds, success in zip(batch_chunk_bounds, batch_bounds_success_bools)
            if not success
        ]
        batch_bounds_to_print = [
            f"{bounds[0]} - {bounds[1]}"
            for bounds in batch_chunk_bounds_to_resync
        ]
        if batch_bounds_to_print:
            warn(
                "Will resync the following failed chunks:\n"
                + '\n'.join(batch_bounds_to_print),
                stack=False,
            )

        retry_bounds_success_tuples = dict(pool.map(
            process_chunk_bounds,
            batch_chunk_bounds_to_resync
        ))
        batch_bounds_success_tuples.update(retry_bounds_success_tuples)
        bounds_success_tuples.update(retry_bounds_success_tuples)
        retry_bounds_success_bools = {
            bounds: tup[0]
            for bounds, tup in retry_bounds_success_tuples.items()
        }

        if all(retry_bounds_success_bools.values()):
            chunks_message = (
                get_chunks_success_message(
                    batch_bounds_success_tuples,
                    header=batch_message_header,
                    check_rowcounts_only=check_rowcounts_only,
                ) + f"\nRetried {len(batch_chunk_bounds_to_resync)} chunk" + (
                    's'
                    if len(batch_chunk_bounds_to_resync) != 1
                    else ''
                ) + "."
            )
            if deduplicate:
                deduplicate_success, deduplicate_msg = self.deduplicate(
                    begin=_batch_begin,
                    end=_batch_end,
                    params=params,
                    workers=workers,
                    debug=debug,
                    **kwargs
                )
                return deduplicate_success, chunks_message + '\n\n' + deduplicate_msg
            return True, chunks_message

        batch_chunks_message = get_chunks_success_message(
            batch_bounds_success_tuples,
            header=batch_message_header,
            check_rowcounts_only=check_rowcounts_only,
        )
        if deduplicate:
            deduplicate_success, deduplicate_msg = self.deduplicate(
                begin=begin,
                end=end,
                params=params,
                workers=workers,
                debug=debug,
                **kwargs
            )
            return deduplicate_success, batch_chunks_message + '\n\n' + deduplicate_msg
        return False, batch_chunks_message

    num_batches = len(batches)
    for batch_i, batch in enumerate(batches):
        batch_begin = batch[0][0]
        batch_end = batch[-1][-1]
        batch_counter_str = f"({(batch_i + 1):,}/{num_batches:,})"
        batch_label = f"batch {batch_counter_str}:\n{batch_begin} - {batch_end}"
        retry_failed_batch = True
        try:
            for_self = 'for ' + str(self)
            batch_label_str = batch_label.replace(':\n', ' ' + for_self + '...\n')
            info(f"Verifying {batch_label_str}\n")
            batch_success, batch_msg = process_batch(batch)
        except (KeyboardInterrupt, Exception) as e:
            batch_success = False
            batch_msg = str(e)
            retry_failed_batch = False

        batch_msg_to_print = (
            f"{make_header('Completed batch ' + batch_counter_str + ':', left_pad=0)}\n{batch_msg}"
        )
        mrsm.pprint((batch_success, batch_msg_to_print))

        if not batch_success and retry_failed_batch:
            info(f"Retrying batch {batch_counter_str}...")
            retry_batch_success, retry_batch_msg = process_batch(batch)
            retry_batch_msg_to_print = (
                f"Retried {make_header('batch ' + batch_label, left_pad=0)}\n{retry_batch_msg}"
            )
            mrsm.pprint((retry_batch_success, retry_batch_msg_to_print))
batch_success = retry_batch_success 435 batch_msg = retry_batch_msg 436 437 if not batch_success: 438 return False, f"Failed to verify {batch_label}:\n\n{batch_msg}" 439 440 chunks_message = get_chunks_success_message( 441 bounds_success_tuples, 442 header=message_header, 443 check_rowcounts_only=check_rowcounts_only, 444 ) 445 return True, chunks_message
Verify the contents of the pipe by resyncing its interval.
Parameters
- begin (Union[datetime, int, None], default None): If specified, only verify rows greater than or equal to this value.
- end (Union[datetime, int, None], default None): If specified, only verify rows less than this value.
- chunk_interval (Union[timedelta, int, None], default None): If provided, use this as the size of the chunk boundaries. Defaults to the value set in `pipe.parameters['chunk_minutes']` (1440).
- bounded (Optional[bool], default None): If `True`, do not verify older than the oldest sync time or newer than the newest. If `False`, verify unbounded syncs outside of the newest and oldest sync times. The default behavior (`None`) is to bound only if a bound interval is set (e.g. `pipe.parameters['verify']['bound_days']`).
- deduplicate (bool, default False): If `True`, deduplicate the pipe's table after the verification syncs.
- workers (Optional[int], default None): If provided, limit the verification to this many threads. Use a value of `1` to sync chunks in series.
- batchsize (Optional[int], default None): If provided, sync this many chunks in parallel. Defaults to `Pipe.get_num_workers()`.
- skip_chunks_with_greater_rowcounts (bool, default False): If `True`, compare the row-counts for a chunk and skip syncing if the pipe's chunk row-count equals or exceeds the remote's row-count.
- check_rowcounts_only (bool, default False): If `True`, only compare row-counts and print chunks which are out-of-sync.
- debug (bool, default False): Verbosity toggle.
- kwargs (Any): All keyword arguments are passed to `pipe.sync()`.
Returns
- A `SuccessTuple` indicating whether the pipe was successfully resynced.
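The per-chunk retry behavior above (each failed chunk is re-synced with a quadratically growing pause between attempts) can be sketched outside of Meerschaum. This is a simplified, hypothetical illustration, not the library's API; the injectable `sleep` parameter is only there to make the sketch easy to test:

```python
import time

def sync_with_backoff(sync_chunk, max_attempts: int = 3, sleep=time.sleep):
    """Retry a failing chunk sync, sleeping attempts**2 seconds between tries."""
    success, msg = False, "Did not sync chunk."
    attempts = 0
    while attempts < max_attempts:
        success, msg = sync_chunk()
        if success:
            break
        attempts += 1
        sleep(attempts ** 2)  # quadratic backoff: 1s, 4s, 9s, ...
    return success, msg

# Simulate a chunk that fails twice before succeeding.
results = iter([(False, "timeout"), (False, "timeout"), (True, "Synced 100 rows.")])
success, msg = sync_with_backoff(lambda: next(results), sleep=lambda s: None)
```

The `(success, msg)` pair mirrors the `SuccessTuple` convention used throughout this API.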
def get_bound_interval(self, debug: bool = False) -> Union[timedelta, int, None]:
    """
    Return the interval used to determine the bound time (limit for verification syncs).
    If the datetime axis is an integer, just return its value.

    Below are the supported keys for the bound interval:

    - `pipe.parameters['verify']['bound_minutes']`
    - `pipe.parameters['verify']['bound_hours']`
    - `pipe.parameters['verify']['bound_days']`
    - `pipe.parameters['verify']['bound_weeks']`
    - `pipe.parameters['verify']['bound_years']`
    - `pipe.parameters['verify']['bound_seconds']`

    If multiple keys are present, the first on this priority list will be used.

    Returns
    -------
    A `timedelta` or `int` value to be used to determine the bound time.
    """
    verify_params = self.parameters.get('verify', {})
    prefix = 'bound_'
    suffixes_to_check = ('minutes', 'hours', 'days', 'weeks', 'years', 'seconds')
    keys_to_search = {
        key: val
        for key, val in verify_params.items()
        if key.startswith(prefix)
    }
    bound_time_key, bound_time_value = None, None
    for key, value in keys_to_search.items():
        for suffix in suffixes_to_check:
            if key == prefix + suffix:
                bound_time_key = key
                bound_time_value = value
                break
        if bound_time_key is not None:
            break

    if bound_time_value is None:
        return bound_time_value

    dt_col = self.columns.get('datetime', None)
    if not dt_col:
        return bound_time_value

    dt_typ = self.dtypes.get(dt_col, 'datetime')
    if 'int' in dt_typ.lower():
        return int(bound_time_value)

    interval_type = bound_time_key.replace(prefix, '')
    return timedelta(**{interval_type: bound_time_value})
Return the interval used to determine the bound time (limit for verification syncs). If the datetime axis is an integer, just return its value.
Below are the supported keys for the bound interval:
- `pipe.parameters['verify']['bound_minutes']`
- `pipe.parameters['verify']['bound_hours']`
- `pipe.parameters['verify']['bound_days']`
- `pipe.parameters['verify']['bound_weeks']`
- `pipe.parameters['verify']['bound_years']`
- `pipe.parameters['verify']['bound_seconds']`
If multiple keys are present, the first on this priority list will be used.
Returns
- A `timedelta` or `int` value to be used to determine the bound time.
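The priority lookup above can be sketched in plain Python. This is a simplified, hypothetical illustration (not the library's API): it omits the integer-axis branch, and it also omits 'years', which `datetime.timedelta` cannot represent directly:

```python
from datetime import timedelta

# Same priority order as the docstring above, minus 'years'.
BOUND_SUFFIXES = ('minutes', 'hours', 'days', 'weeks', 'seconds')

def resolve_bound_interval(verify_params: dict):
    """Return a timedelta from the first matching 'bound_*' key, else None."""
    for suffix in BOUND_SUFFIXES:
        key = 'bound_' + suffix
        if key in verify_params:
            return timedelta(**{suffix: verify_params[key]})
    return None

# 'bound_hours' wins over 'bound_days' because hours come first in the priority list.
interval = resolve_bound_interval({'bound_days': 366, 'bound_hours': 1})
```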
def get_bound_time(self, debug: bool = False) -> Union[datetime, int, None]:
    """
    The bound time is the limit at which long-running verification syncs should stop.
    A value of `None` means verification syncs should be unbounded.

    Like deriving a backtrack time from `pipe.get_sync_time()`,
    the bound time is the sync time minus a large window (e.g. 366 days).

    Verification syncs are unbounded (i.e. `bound_time is None`)
    if the oldest sync time is more recent than the bound time.

    Returns
    -------
    A `datetime` or `int` corresponding to the
    `begin` bound for verification and deduplication syncs.
    """
    bound_interval = self.get_bound_interval(debug=debug)
    if bound_interval is None:
        return None

    sync_time = self.get_sync_time(debug=debug)
    if sync_time is None:
        return None

    bound_time = sync_time - bound_interval
    oldest_sync_time = self.get_sync_time(newest=False, debug=debug)
    max_bound_time_days = STATIC_CONFIG['pipes']['max_bound_time_days']

    extreme_sync_times_delta = (
        hasattr(oldest_sync_time, 'tzinfo')
        and (sync_time - oldest_sync_time) >= timedelta(days=max_bound_time_days)
    )

    return (
        bound_time
        if bound_time > oldest_sync_time or extreme_sync_times_delta
        else None
    )
The bound time is the limit at which long-running verification syncs should stop.
A value of `None` means verification syncs should be unbounded.
Like deriving a backtrack time from `pipe.get_sync_time()`,
the bound time is the sync time minus a large window (e.g. 366 days).
Verification syncs are unbounded (i.e. `bound_time is None`)
if the oldest sync time is more recent than the bound time.
Returns
- A `datetime` or `int` corresponding to the `begin` bound for verification and deduplication syncs.
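The core arithmetic can be sketched with plain datetimes. This is a simplified, hypothetical illustration (not the library's API): it omits the `max_bound_time_days` escape hatch for extreme sync-time spreads:

```python
from datetime import datetime, timedelta

def resolve_bound_time(sync_time, oldest_sync_time, bound_interval):
    """Return sync_time - bound_interval, or None for an unbounded verify."""
    if bound_interval is None or sync_time is None:
        return None
    bound_time = sync_time - bound_interval
    # If the pipe's data don't reach back past the bound, verify everything.
    return bound_time if bound_time > oldest_sync_time else None

newest = datetime(2024, 6, 1)
bounded = resolve_bound_time(newest, datetime(2020, 1, 1), timedelta(days=366))
unbounded = resolve_bound_time(newest, datetime(2024, 5, 1), timedelta(days=366))
```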
def delete(
    self,
    drop: bool = True,
    debug: bool = False,
    **kw
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `delete_pipe()` method.

    Parameters
    ----------
    drop: bool, default True
        If `True`, drop the pipe's target table.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success (`bool`), message (`str`).
    """
    from meerschaum.utils.warnings import warn
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    if self.temporary:
        if self.cache:
            invalidate_success, invalidate_msg = self._invalidate_cache(hard=True, debug=debug)
            if not invalidate_success:
                return invalidate_success, invalidate_msg

        return (
            False,
            "Cannot delete pipes created with `temporary=True` (read-only). "
            + "You may want to call `pipe.drop()` instead."
        )

    if drop:
        drop_success, drop_msg = self.drop(debug=debug)
        if not drop_success:
            warn(f"Failed to drop {self}:\n{drop_msg}")

    with Venv(get_connector_plugin(self.instance_connector)):
        result = self.instance_connector.delete_pipe(self, debug=debug, **kw)

    if not isinstance(result, tuple):
        return False, f"Received an unexpected result from '{self.instance_connector}': {result}"

    if result[0]:
        self._invalidate_cache(hard=True, debug=debug)

    return result
Call the Pipe's instance connector's `delete_pipe()` method.
Parameters
- drop (bool, default True): If `True`, drop the pipe's target table.
- debug (bool, default False): Verbosity toggle.
Returns
- A `SuccessTuple` of success (`bool`), message (`str`).
def drop(
    self,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `drop_pipe()` method.

    Parameters
    ----------
    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.
    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    self._clear_cache_key('_exists', debug=debug)

    with Venv(get_connector_plugin(self.instance_connector)):
        if hasattr(self.instance_connector, 'drop_pipe'):
            result = self.instance_connector.drop_pipe(self, debug=debug, **kw)
        else:
            result = (
                False,
                (
                    "Cannot drop pipes for instance connectors of type "
                    f"'{self.instance_connector.type}'."
                )
            )

    self._clear_cache_key('_exists', debug=debug)
    self._clear_cache_key('_exists_timestamp', debug=debug)

    return result
Call the Pipe's instance connector's `drop_pipe()` method.
Parameters
- debug (bool, default False): Verbosity toggle.
Returns
- A `SuccessTuple` of success, message.
def drop_indices(
    self,
    columns: Optional[List[str]] = None,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `drop_pipe_indices()` method.

    Parameters
    ----------
    columns: Optional[List[str]], default None
        If provided, only drop indices in the given list.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.
    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    self._clear_cache_key('_columns_indices', debug=debug)
    self._clear_cache_key('_columns_indices_timestamp', debug=debug)
    self._clear_cache_key('_columns_types', debug=debug)
    self._clear_cache_key('_columns_types_timestamp', debug=debug)

    with Venv(get_connector_plugin(self.instance_connector)):
        if hasattr(self.instance_connector, 'drop_pipe_indices'):
            result = self.instance_connector.drop_pipe_indices(
                self,
                columns=columns,
                debug=debug,
                **kw
            )
        else:
            result = (
                False,
                (
                    "Cannot drop indices for instance connectors of type "
                    f"'{self.instance_connector.type}'."
                )
            )

    self._clear_cache_key('_columns_indices', debug=debug)
    self._clear_cache_key('_columns_indices_timestamp', debug=debug)
    self._clear_cache_key('_columns_types', debug=debug)
    self._clear_cache_key('_columns_types_timestamp', debug=debug)

    return result
Call the Pipe's instance connector's `drop_pipe_indices()` method.
Parameters
- columns (Optional[List[str]], default None): If provided, only drop indices in the given list.
- debug (bool, default False): Verbosity toggle.
Returns
- A `SuccessTuple` of success, message.
def create_indices(
    self,
    columns: Optional[List[str]] = None,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `create_pipe_indices()` method.

    Parameters
    ----------
    columns: Optional[List[str]], default None
        If provided, only create indices for the given list.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.
    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    self._clear_cache_key('_columns_indices', debug=debug)
    self._clear_cache_key('_columns_indices_timestamp', debug=debug)
    self._clear_cache_key('_columns_types', debug=debug)
    self._clear_cache_key('_columns_types_timestamp', debug=debug)

    with Venv(get_connector_plugin(self.instance_connector)):
        if hasattr(self.instance_connector, 'create_pipe_indices'):
            result = self.instance_connector.create_pipe_indices(
                self,
                columns=columns,
                debug=debug,
                **kw
            )
        else:
            result = (
                False,
                (
                    "Cannot create indices for instance connectors of type "
                    f"'{self.instance_connector.type}'."
                )
            )

    self._clear_cache_key('_columns_indices', debug=debug)
    self._clear_cache_key('_columns_indices_timestamp', debug=debug)
    self._clear_cache_key('_columns_types', debug=debug)
    self._clear_cache_key('_columns_types_timestamp', debug=debug)

    return result
Call the Pipe's instance connector's `create_pipe_indices()` method.
Parameters
- columns (Optional[List[str]], default None): If provided, only create indices for the given list.
- debug (bool, default False): Verbosity toggle.
Returns
- A `SuccessTuple` of success, message.
def clear(
    self,
    begin: Optional[datetime] = None,
    end: Optional[datetime] = None,
    params: Optional[Dict[str, Any]] = None,
    debug: bool = False,
    **kwargs: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `clear_pipe()` method.

    Parameters
    ----------
    begin: Optional[datetime], default None
        If provided, only remove rows newer than this datetime value.

    end: Optional[datetime], default None
        If provided, only remove rows older than this datetime value (not including end).

    params: Optional[Dict[str, Any]], default None
        See `meerschaum.utils.sql.build_where`.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` corresponding to whether this procedure completed successfully.

    Examples
    --------
    >>> pipe = mrsm.Pipe('test', 'test', columns={'datetime': 'dt'}, instance='sql:local')
    >>> pipe.sync({'dt': [datetime(2020, 1, 1, 0, 0)]})
    >>> pipe.sync({'dt': [datetime(2021, 1, 1, 0, 0)]})
    >>> pipe.sync({'dt': [datetime(2022, 1, 1, 0, 0)]})
    >>>
    >>> pipe.clear(begin=datetime(2021, 1, 1, 0, 0))
    >>> pipe.get_data()
              dt
    0 2020-01-01
    """
    from meerschaum.utils.warnings import warn
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    begin, end = self.parse_date_bounds(begin, end)

    with Venv(get_connector_plugin(self.instance_connector)):
        return self.instance_connector.clear_pipe(
            self,
            begin=begin,
            end=end,
            params=params,
            debug=debug,
            **kwargs
        )
Call the Pipe's instance connector's `clear_pipe()` method.
Parameters
- begin (Optional[datetime], default None): If provided, only remove rows newer than this datetime value.
- end (Optional[datetime], default None): If provided, only remove rows older than this datetime value (not including end).
- params (Optional[Dict[str, Any]], default None): See `meerschaum.utils.sql.build_where`.
- debug (bool, default False): Verbosity toggle.
Returns
- A `SuccessTuple` corresponding to whether this procedure completed successfully.
Examples
>>> pipe = mrsm.Pipe('test', 'test', columns={'datetime': 'dt'}, instance='sql:local')
>>> pipe.sync({'dt': [datetime(2020, 1, 1, 0, 0)]})
>>> pipe.sync({'dt': [datetime(2021, 1, 1, 0, 0)]})
>>> pipe.sync({'dt': [datetime(2022, 1, 1, 0, 0)]})
>>>
>>> pipe.clear(begin=datetime(2021, 1, 1, 0, 0))
>>> pipe.get_data()
dt
0 2020-01-01
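The bounds behave as a half-open interval: `begin` is inclusive and `end` is exclusive. A minimal pure-Python sketch of this semantics (hypothetical helper, not the library's API) reproduces the docstring example:

```python
from datetime import datetime

def clear_rows(rows, begin=None, end=None):
    """Drop rows inside the half-open interval [begin, end), like pipe.clear()."""
    return [
        row for row in rows
        if not (
            (begin is None or row['dt'] >= begin)
            and (end is None or row['dt'] < end)
        )
    ]

rows = [{'dt': datetime(y, 1, 1)} for y in (2020, 2021, 2022)]
remaining = clear_rows(rows, begin=datetime(2021, 1, 1))
```

Only the 2020 row survives, matching `pipe.get_data()` in the example above.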
def deduplicate(
    self,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    chunk_interval: Union[timedelta, int, None] = None,
    bounded: Optional[bool] = None,
    workers: Optional[int] = None,
    debug: bool = False,
    _use_instance_method: bool = True,
    **kwargs: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `delete_duplicates` method to delete duplicate rows.

    Parameters
    ----------
    begin: Union[datetime, int, None], default None
        If provided, only deduplicate rows newer than this datetime value.

    end: Union[datetime, int, None], default None
        If provided, only deduplicate rows older than this datetime value (not including end).

    params: Optional[Dict[str, Any]], default None
        Restrict deduplication to this filter (for multiplexed data streams).
        See `meerschaum.utils.sql.build_where`.

    chunk_interval: Union[timedelta, int, None], default None
        If provided, use this for the chunk bounds.
        Defaults to the value set in `pipe.parameters['chunk_minutes']` (1440).

    bounded: Optional[bool], default None
        Only check outside the oldest and newest sync times if bounded is explicitly `False`.

    workers: Optional[int], default None
        If the instance connector is thread-safe, limit concurrent syncs to this many threads.

    debug: bool, default False
        Verbosity toggle.

    kwargs: Any
        All other keyword arguments are passed to
        `pipe.sync()`, `pipe.clear()`, and `pipe.get_data()`.

    Returns
    -------
    A `SuccessTuple` corresponding to whether all of the chunks were successfully deduplicated.
    """
    from meerschaum.utils.warnings import warn, info
    from meerschaum.utils.misc import interval_str, items_str
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.pool import get_pool

    begin, end = self.parse_date_bounds(begin, end)

    workers = self.get_num_workers(workers=workers)
    pool = get_pool(workers=workers)

    if _use_instance_method:
        with Venv(get_connector_plugin(self.instance_connector)):
            if hasattr(self.instance_connector, 'deduplicate_pipe'):
                return self.instance_connector.deduplicate_pipe(
                    self,
                    begin=begin,
                    end=end,
                    params=params,
                    bounded=bounded,
                    debug=debug,
                    **kwargs
                )

    ### Only unbound if explicitly False.
    if bounded is None:
        bounded = True
    chunk_interval = self.get_chunk_interval(chunk_interval, debug=debug)

    bound_time = self.get_bound_time(debug=debug)
    if bounded and begin is None:
        begin = (
            bound_time
            if bound_time is not None
            else self.get_sync_time(newest=False, debug=debug)
        )
    if bounded and end is None:
        end = self.get_sync_time(newest=True, debug=debug)
        if end is not None:
            end += (
                timedelta(minutes=1)
                if hasattr(end, 'tzinfo')
                else 1
            )

    chunk_bounds = self.get_chunk_bounds(
        bounded=bounded,
        begin=begin,
        end=end,
        chunk_interval=chunk_interval,
        debug=debug,
    )

    indices = [col for col in self.columns.values() if col]
    if not indices:
        return False, "Cannot deduplicate without index columns."

    def process_chunk_bounds(bounds) -> Tuple[
        Tuple[
            Union[datetime, int, None],
            Union[datetime, int, None]
        ],
        SuccessTuple
    ]:
        ### Only selecting the index values here to keep bandwidth down.
        chunk_begin, chunk_end = bounds
        chunk_df = self.get_data(
            select_columns=indices,
            begin=chunk_begin,
            end=chunk_end,
            params=params,
            debug=debug,
        )
        if chunk_df is None:
            return bounds, (True, "")
        existing_chunk_len = len(chunk_df)
        deduped_chunk_df = chunk_df.drop_duplicates(keep='last')
        deduped_chunk_len = len(deduped_chunk_df)

        if existing_chunk_len == deduped_chunk_len:
            return bounds, (True, "")

        chunk_msg_header = f"\n{chunk_begin} - {chunk_end}"
        chunk_msg_body = ""

        full_chunk = self.get_data(
            begin=chunk_begin,
            end=chunk_end,
            params=params,
            debug=debug,
        )
        if full_chunk is None or len(full_chunk) == 0:
            return bounds, (True, f"{chunk_msg_header}\nChunk is empty, skipping...")

        chunk_indices = [ix for ix in indices if ix in full_chunk.columns]
        if not chunk_indices:
            return bounds, (False, f"None of {items_str(indices)} were present in chunk.")
        try:
            full_chunk = full_chunk.drop_duplicates(
                subset=chunk_indices,
                keep='last'
            ).reset_index(
                drop=True,
            )
        except Exception as e:
            return (
                bounds,
                (False, f"Failed to deduplicate chunk on {items_str(chunk_indices)}:\n({e})")
            )

        clear_success, clear_msg = self.clear(
            begin=chunk_begin,
            end=chunk_end,
            params=params,
            debug=debug,
        )
        if not clear_success:
            chunk_msg_body += f"Failed to clear chunk while deduplicating:\n{clear_msg}\n"
            warn(chunk_msg_body)

        sync_success, sync_msg = self.sync(full_chunk, debug=debug)
        if not sync_success:
            chunk_msg_body += f"Failed to sync chunk while deduplicating:\n{sync_msg}\n"

        ### Finally check if the deduplication worked.
        chunk_rowcount = self.get_rowcount(
            begin=chunk_begin,
            end=chunk_end,
            params=params,
            debug=debug,
        )
        if chunk_rowcount != deduped_chunk_len:
            return bounds, (
                False, (
                    chunk_msg_header + "\n"
                    + chunk_msg_body + ("\n" if chunk_msg_body else '')
                    + "Chunk rowcounts still differ ("
                    + f"{chunk_rowcount} rowcount vs {deduped_chunk_len} chunk length)."
                )
            )

        return bounds, (
            True, (
                chunk_msg_header + "\n"
                + chunk_msg_body + ("\n" if chunk_msg_body else '')
                + f"Deduplicated chunk from {existing_chunk_len} to {chunk_rowcount} rows."
            )
        )

    info(
        f"Deduplicating {len(chunk_bounds)} chunk"
        + ('s' if len(chunk_bounds) != 1 else '')
        + f" ({'un' if not bounded else ''}bounded)"
        + f" of size '{interval_str(chunk_interval)}'"
        + f" on {self}."
    )
    bounds_success_tuples = dict(pool.map(process_chunk_bounds, chunk_bounds))
    bounds_successes = {
        bounds: success_tuple
        for bounds, success_tuple in bounds_success_tuples.items()
        if success_tuple[0]
    }
    bounds_failures = {
        bounds: success_tuple
        for bounds, success_tuple in bounds_success_tuples.items()
        if not success_tuple[0]
    }

    ### No need to retry if everything failed.
    if len(bounds_failures) > 0 and len(bounds_successes) == 0:
        return (
            False,
            (
                f"Failed to deduplicate {len(bounds_failures)} chunk"
                + ('s' if len(bounds_failures) != 1 else '')
                + ".\n"
                + "\n".join([msg for _, (_, msg) in bounds_failures.items() if msg])
            )
        )

    retry_bounds = [bounds for bounds in bounds_failures]
    if not retry_bounds:
        return (
            True,
            (
                f"Successfully deduplicated {len(bounds_successes)} chunk"
                + ('s' if len(bounds_successes) != 1 else '')
                + ".\n"
                + "\n".join([msg for _, (_, msg) in bounds_successes.items() if msg])
            ).rstrip('\n')
        )

    info(f"Retrying {len(retry_bounds)} chunks for {self}...")
    retry_bounds_success_tuples = dict(pool.map(process_chunk_bounds, retry_bounds))
    retry_bounds_successes = {
        bounds: success_tuple
        for bounds, success_tuple in retry_bounds_success_tuples.items()
        if success_tuple[0]
    }
    retry_bounds_failures = {
        bounds: success_tuple
        for bounds, success_tuple in retry_bounds_success_tuples.items()
        if not success_tuple[0]
    }

    bounds_successes.update(retry_bounds_successes)
    if not retry_bounds_failures:
        return (
            True,
            (
                f"Successfully deduplicated {len(bounds_successes)} chunk"
                + ('s' if len(bounds_successes) != 1 else '')
                + f" ({len(retry_bounds_successes)} retried):\n"
                + "\n".join([msg for _, (_, msg) in bounds_successes.items() if msg])
            ).rstrip('\n')
        )

    return (
        False,
        (
            f"Failed to deduplicate {len(retry_bounds_failures)} chunk"
            + ('s' if len(retry_bounds_failures) != 1 else '')
            + ".\n"
            + "\n".join([msg for _, (_, msg) in retry_bounds_failures.items() if msg])
        ).rstrip('\n')
    )
Call the Pipe's instance connector's `delete_duplicates` method to delete duplicate rows.
Parameters
- begin (Union[datetime, int, None], default None): If provided, only deduplicate rows newer than this datetime value.
- end (Union[datetime, int, None], default None): If provided, only deduplicate rows older than this datetime value (not including end).
- params (Optional[Dict[str, Any]], default None): Restrict deduplication to this filter (for multiplexed data streams). See `meerschaum.utils.sql.build_where`.
- chunk_interval (Union[timedelta, int, None], default None): If provided, use this for the chunk bounds. Defaults to the value set in `pipe.parameters['chunk_minutes']` (1440).
- bounded (Optional[bool], default None): Only check outside the oldest and newest sync times if bounded is explicitly `False`.
- workers (Optional[int], default None): If the instance connector is thread-safe, limit concurrent syncs to this many threads.
- debug (bool, default False): Verbosity toggle.
- kwargs (Any): All other keyword arguments are passed to `pipe.sync()`, `pipe.clear()`, and `pipe.get_data()`.
Returns
- A `SuccessTuple` corresponding to whether all of the chunks were successfully deduplicated.
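The fallback path above deduplicates each chunk in pandas with `drop_duplicates(subset=..., keep='last')`. The keep-last semantics on the index columns can be sketched in plain Python (hypothetical helper, not the library's API):

```python
def drop_duplicates_keep_last(rows, subset):
    """Keep only the last row for each combination of the subset (index) columns."""
    latest = {}
    for row in rows:
        key = tuple(row[col] for col in subset)
        latest[key] = row  # later rows overwrite earlier duplicates
    return list(latest.values())

rows = [
    {'ts': 1, 'id': 1, 'vl': 10},
    {'ts': 1, 'id': 1, 'vl': 99},  # duplicate ('ts', 'id') pair: this one wins
    {'ts': 1, 'id': 2, 'vl': 20},
]
deduped = drop_duplicates_keep_last(rows, subset=['ts', 'id'])
```

Keeping the last occurrence matters because later syncs are assumed to carry the freshest values.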
def bootstrap(
    self,
    debug: bool = False,
    yes: bool = False,
    force: bool = False,
    noask: bool = False,
    shell: bool = False,
    **kw
) -> SuccessTuple:
    """
    Prompt the user to create a pipe's requirements all from one method.
    This method shouldn't be used in any automated scripts because it interactively
    prompts the user and therefore may hang.

    Parameters
    ----------
    debug: bool, default False
        Verbosity toggle.

    yes: bool, default False
        Print the questions and automatically agree.

    force: bool, default False
        Skip the questions and agree anyway.

    noask: bool, default False
        Print the questions but go with the default answer.

    shell: bool, default False
        Used to determine if we are in the interactive shell.

    Returns
    -------
    A `SuccessTuple` corresponding to the success of this procedure.
    """
    from meerschaum.utils.warnings import info
    from meerschaum.utils.prompt import prompt, yes_no
    from meerschaum.utils.formatting import pprint
    from meerschaum.config import get_config
    from meerschaum.utils.formatting._shell import clear_screen
    from meerschaum.utils.formatting import print_tuple
    from meerschaum.actions import actions
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    _clear = get_config('shell', 'clear_screen', patch=True)

    if self.get_id(debug=debug) is not None:
        delete_tuple = self.delete(debug=debug)
        if not delete_tuple[0]:
            return delete_tuple

    if _clear:
        clear_screen(debug=debug)

    _parameters = _get_parameters(self, debug=debug)
    self.parameters = _parameters
    pprint(self.parameters)
    try:
        prompt(
            f"\n    Press [Enter] to register {self} with the above configuration:",
            icon=False
        )
    except KeyboardInterrupt:
        return False, f"Aborted bootstrapping {self}."

    with Venv(get_connector_plugin(self.instance_connector)):
        register_tuple = self.instance_connector.register_pipe(self, debug=debug)

    if not register_tuple[0]:
        return register_tuple

    if _clear:
        clear_screen(debug=debug)

    try:
        if yes_no(
            f"Would you like to edit the definition for {self}?",
            yes=yes,
            noask=noask,
            default='n',
        ):
            edit_tuple = self.edit_definition(debug=debug)
            if not edit_tuple[0]:
                return edit_tuple

        if yes_no(
            f"Would you like to try syncing {self} now?",
            yes=yes,
            noask=noask,
            default='n',
        ):
            sync_tuple = actions['sync'](
                ['pipes'],
                connector_keys=[self.connector_keys],
                metric_keys=[self.metric_key],
                location_keys=[self.location_key],
                mrsm_instance=str(self.instance_connector),
                debug=debug,
                shell=shell,
            )
            if not sync_tuple[0]:
                return sync_tuple
    except Exception as e:
        return False, f"Failed to bootstrap {self}:\n" + str(e)

    print_tuple((True, f"Finished bootstrapping {self}!"))
    info(
        "You can edit this pipe later with `edit pipes` "
        + "or set the definition with `edit pipes definition`.\n"
        + "    To sync data into your pipe, run `sync pipes`."
    )

    return True, "Success"
Prompt the user to create a pipe's requirements all from one method. This method shouldn't be used in any automated scripts because it interactively prompts the user and therefore may hang.
Parameters
- debug (bool, default False): Verbosity toggle.
- yes (bool, default False): Print the questions and automatically agree.
- force (bool, default False): Skip the questions and agree anyway.
- noask (bool, default False): Print the questions but go with the default answer.
- shell (bool, default False): Used to determine if we are in the interactive shell.
Returns
- A `SuccessTuple` corresponding to the success of this procedure.
def enforce_dtypes(
    self,
    df: 'pd.DataFrame',
    chunksize: Optional[int] = -1,
    enforce: bool = True,
    safe_copy: bool = True,
    dtypes: Optional[Dict[str, str]] = None,
    debug: bool = False,
) -> 'pd.DataFrame':
    """
    Cast the input dataframe to the pipe's registered data types.
    If the pipe does not exist and dtypes are not set, return the dataframe.
    """
    import traceback
    from meerschaum.utils.warnings import warn
    from meerschaum.utils.debug import dprint
    from meerschaum.utils.dataframe import (
        parse_df_datetimes,
        enforce_dtypes as _enforce_dtypes,
        parse_simple_lines,
    )
    from meerschaum.utils.dtypes import are_dtypes_equal
    from meerschaum.utils.packages import import_pandas
    pd = import_pandas(debug=debug)
    if df is None:
        if debug:
            dprint(
                "Received None instead of a DataFrame.\n"
                + "    Skipping dtype enforcement..."
            )
        return df

    if not self.enforce:
        enforce = False

    explicit_dtypes = self.get_dtypes(infer=False, debug=debug) if enforce else {}
    pipe_dtypes = self.get_dtypes(infer=True, debug=debug) if not dtypes else dtypes

    try:
        if isinstance(df, str):
            if df.strip() and df.strip()[0] not in ('{', '['):
                df = parse_df_datetimes(
                    parse_simple_lines(df),
                    ignore_cols=[
                        col
                        for col, dtype in pipe_dtypes.items()
                        if (not enforce or not are_dtypes_equal(dtype, 'datetime'))
                    ],
                )
            else:
                df = parse_df_datetimes(
                    pd.read_json(StringIO(df)),
                    ignore_cols=[
                        col
                        for col, dtype in pipe_dtypes.items()
                        if (not enforce or not are_dtypes_equal(dtype, 'datetime'))
                    ],
                    ignore_all=(not enforce),
                    strip_timezone=(self.tzinfo is None),
                    chunksize=chunksize,
                    debug=debug,
                )
        elif isinstance(df, (dict, list, tuple)):
            df = parse_df_datetimes(
                df,
                ignore_cols=[
                    col
                    for col, dtype in pipe_dtypes.items()
                    if (not enforce or not are_dtypes_equal(str(dtype), 'datetime'))
                ],
                strip_timezone=(self.tzinfo is None),
                chunksize=chunksize,
                debug=debug,
            )
    except Exception as e:
        warn(f"Unable to cast incoming data as a DataFrame...:\n{e}\n\n{traceback.format_exc()}")
        return None

    if not pipe_dtypes:
        if debug:
            dprint(
                f"Could not find dtypes for {self}.\n"
                + "Skipping dtype enforcement..."
            )
        return df

    return _enforce_dtypes(
        df,
        pipe_dtypes,
        explicit_dtypes=explicit_dtypes,
        safe_copy=safe_copy,
        strip_timezone=(self.tzinfo is None),
        coerce_numeric=self.mixed_numerics,
        coerce_timezone=enforce,
        debug=debug,
    )
Cast the input dataframe to the pipe's registered data types. If the pipe does not exist and dtypes are not set, return the dataframe.
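The casting step can be illustrated without pandas. Below is a minimal, dependency-free sketch of the idea behind Pipe.enforce_dtypes(): cast each incoming value to its registered dtype and leave unregistered columns untouched. The names enforce_row_dtypes and CASTERS are hypothetical, chosen for illustration only; the real method operates on DataFrames via meerschaum.utils.dataframe.enforce_dtypes().

```python
from datetime import datetime

# Hypothetical dtype-name -> caster table for this sketch.
CASTERS = {
    'int': int,
    'float': float,
    'str': str,
    'datetime': lambda v: v if isinstance(v, datetime) else datetime.fromisoformat(v),
}

def enforce_row_dtypes(row, dtypes):
    """Cast each value in `row` to its registered dtype, if one is set."""
    return {
        col: (CASTERS[dtypes[col]](val) if col in dtypes else val)
        for col, val in row.items()
    }

row = {'ts': '2024-01-01T00:00:00', 'id': '1', 'vl': '97.5'}
dtypes = {'ts': 'datetime', 'id': 'int', 'vl': 'float'}
print(enforce_row_dtypes(row, dtypes))
# {'ts': datetime.datetime(2024, 1, 1, 0, 0), 'id': 1, 'vl': 97.5}
```

As in the real method, columns without a registered dtype pass through unchanged.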
def infer_dtypes(
    self,
    persist: bool = False,
    refresh: bool = False,
    debug: bool = False,
) -> Dict[str, Any]:
    """
    If `dtypes` is not set in `meerschaum.Pipe.parameters`,
    infer the data types from the underlying table if it exists.

    Parameters
    ----------
    persist: bool, default False
        If `True`, persist the inferred data types to `meerschaum.Pipe.parameters`.
        NOTE: Use with caution! Generally `dtypes` is meant to be user-configurable only.

    refresh: bool, default False
        If `True`, retrieve the latest columns-types for the pipe.
        See `Pipe.get_columns_types()`.

    Returns
    -------
    A dictionary of strings containing the pandas data types for this Pipe.
    """
    if not self.exists(debug=debug):
        return {}

    from meerschaum.utils.dtypes.sql import get_pd_type_from_db_type
    from meerschaum.utils.dtypes import to_pandas_dtype

    ### NOTE: get_columns_types() may return either the types as
    ### PostgreSQL- or Pandas-style.
    columns_types = self.get_columns_types(refresh=refresh, debug=debug)

    remote_pd_dtypes = {
        c: (
            get_pd_type_from_db_type(t, allow_custom_dtypes=True)
            if str(t).isupper()
            else to_pandas_dtype(t)
        )
        for c, t in columns_types.items()
    } if columns_types else {}
    if not persist:
        return remote_pd_dtypes

    parameters = self.get_parameters(refresh=refresh, debug=debug)
    dtypes = parameters.get('dtypes', {})
    dtypes.update({
        col: typ
        for col, typ in remote_pd_dtypes.items()
        if col not in dtypes
    })
    self.dtypes = dtypes
    self.edit(interactive=False, debug=debug)
    return remote_pd_dtypes
If dtypes is not set in meerschaum.Pipe.parameters,
infer the data types from the underlying table if it exists.

Parameters
- persist (bool, default False):
  If True, persist the inferred data types to meerschaum.Pipe.parameters.
  NOTE: Use with caution! Generally dtypes is meant to be user-configurable only.
- refresh (bool, default False):
  If True, retrieve the latest columns-types for the pipe.
  See Pipe.get_columns_types().

Returns
- A dictionary of strings containing the pandas data types for this Pipe.
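The normalization branch above can be sketched in isolation. Assuming a small hypothetical lookup table (the real implementation delegates to get_pd_type_from_db_type() and to_pandas_dtype()), uppercase type names are treated as database-native and translated, while anything else is assumed to already be a pandas dtype string:

```python
# Hypothetical DB-type -> pandas-dtype table, for illustration only.
DB_TO_PD = {
    'BIGINT': 'int64',
    'DOUBLE PRECISION': 'float64',
    'TIMESTAMP': 'datetime64[ns]',
    'TEXT': 'string',
}

def infer_pd_dtypes(columns_types):
    """Mimic the uppercase check in Pipe.infer_dtypes()."""
    return {
        col: (DB_TO_PD.get(typ, 'object') if str(typ).isupper() else typ)
        for col, typ in columns_types.items()
    }

print(infer_pd_dtypes({'id': 'BIGINT', 'vl': 'float64', 'ts': 'TIMESTAMP'}))
# {'id': 'int64', 'vl': 'float64', 'ts': 'datetime64[ns]'}
```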
def copy_to(
    self,
    instance_keys: str,
    sync: bool = True,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    chunk_interval: Union[timedelta, int, None] = None,
    debug: bool = False,
    **kwargs: Any
) -> SuccessTuple:
    """
    Copy a pipe to another instance.

    Parameters
    ----------
    instance_keys: str
        The instance to which to copy this pipe.

    sync: bool, default True
        If `True`, sync the source pipe's documents.

    begin: Union[datetime, int, None], default None
        Beginning datetime value to pass to `Pipe.get_data()`.

    end: Union[datetime, int, None], default None
        End datetime value to pass to `Pipe.get_data()`.

    params: Optional[Dict[str, Any]], default None
        Parameters filter to pass to `Pipe.get_data()`.

    chunk_interval: Union[timedelta, int, None], default None
        The size of chunks to retrieve from `Pipe.get_data()` for syncing.

    kwargs: Any
        Additional flags to pass to `Pipe.get_data()` and `Pipe.sync()`, e.g. `workers`.

    Returns
    -------
    A SuccessTuple indicating success.
    """
    if str(instance_keys) == self.instance_keys:
        return False, f"Cannot copy {self} to instance '{instance_keys}'."

    begin, end = self.parse_date_bounds(begin, end)

    new_pipe = mrsm.Pipe(
        self.connector_keys,
        self.metric_key,
        self.location_key,
        parameters=self.parameters.copy(),
        instance=instance_keys,
    )

    new_pipe_is_registered = new_pipe.get_id() is not None

    metadata_method = new_pipe.edit if new_pipe_is_registered else new_pipe.register
    metadata_success, metadata_msg = metadata_method(debug=debug)
    if not metadata_success:
        return metadata_success, metadata_msg

    if not self.exists(debug=debug):
        return True, f"{self} does not exist; nothing to sync."

    original_as_iterator = kwargs.get('as_iterator', None)
    kwargs['as_iterator'] = True

    chunk_generator = self.get_data(
        begin=begin,
        end=end,
        params=params,
        chunk_interval=chunk_interval,
        debug=debug,
        **kwargs
    )

    if original_as_iterator is None:
        _ = kwargs.pop('as_iterator', None)
    else:
        kwargs['as_iterator'] = original_as_iterator

    sync_success, sync_msg = new_pipe.sync(
        chunk_generator,
        begin=begin,
        end=end,
        params=params,
        debug=debug,
        **kwargs
    )
    msg = (
        f"Successfully synced {new_pipe}:\n{sync_msg}"
        if sync_success
        else f"Failed to sync {new_pipe}:\n{sync_msg}"
    )
    return sync_success, msg
Copy a pipe to another instance.

Parameters
- instance_keys (str):
  The instance to which to copy this pipe.
- sync (bool, default True):
  If True, sync the source pipe's documents.
- begin (Union[datetime, int, None], default None):
  Beginning datetime value to pass to Pipe.get_data().
- end (Union[datetime, int, None], default None):
  End datetime value to pass to Pipe.get_data().
- params (Optional[Dict[str, Any]], default None):
  Parameters filter to pass to Pipe.get_data().
- chunk_interval (Union[timedelta, int, None], default None):
  The size of chunks to retrieve from Pipe.get_data() for syncing.
- kwargs (Any):
  Additional flags to pass to Pipe.get_data() and Pipe.sync(), e.g. workers.

Returns
- A SuccessTuple indicating success.
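The core of copy_to() is a chunked pull-then-push: fetch the source data as an iterator of chunks, then feed each chunk to the target. Stripped of pipes and connectors, the pattern looks like the following sketch (get_data_chunks and sync_chunks are hypothetical stand-ins for Pipe.get_data(as_iterator=True) and Pipe.sync()):

```python
def get_data_chunks(rows, chunk_interval):
    """Yield `rows` in slices of size `chunk_interval` (cf. get_data(as_iterator=True))."""
    for i in range(0, len(rows), chunk_interval):
        yield rows[i:i + chunk_interval]

def sync_chunks(target, chunks):
    """Apply each chunk to `target` and return a (success, message) tuple."""
    for chunk in chunks:
        target.extend(chunk)
    return True, f"Synced {len(target)} rows."

source = [{'id': i} for i in range(5)]
target = []
success, msg = sync_chunks(target, get_data_chunks(source, chunk_interval=2))
print(success, msg)
# True Synced 5 rows.
```

Chunking keeps memory bounded: only one chunk of the source pipe's data is materialized at a time, which is why copy_to() forces as_iterator=True before calling Pipe.get_data().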
class Plugin:
    """Handle packaging of Meerschaum plugins."""

    def __init__(
        self,
        name: str,
        version: Optional[str] = None,
        user_id: Optional[int] = None,
        required: Optional[List[str]] = None,
        attributes: Optional[Dict[str, Any]] = None,
        archive_path: Optional[pathlib.Path] = None,
        venv_path: Optional[pathlib.Path] = None,
        repo_connector: Optional['mrsm.connectors.api.APIConnector'] = None,
        repo: Union['mrsm.connectors.api.APIConnector', str, None] = None,
    ):
        from meerschaum._internal.static import STATIC_CONFIG
        from meerschaum.config.paths import PLUGINS_ARCHIVES_RESOURCES_PATH, VIRTENV_RESOURCES_PATH
        sep = STATIC_CONFIG['plugins']['repo_separator']
        _repo = None
        if sep in name:
            try:
                name, _repo = name.split(sep)
            except Exception as e:
                error(f"Invalid plugin name: '{name}'")
        self._repo_in_name = _repo

        if attributes is None:
            attributes = {}
        self.name = name
        self.attributes = attributes
        self.user_id = user_id
        self._version = version
        if required:
            self._required = required
        self.archive_path = (
            archive_path if archive_path is not None
            else PLUGINS_ARCHIVES_RESOURCES_PATH / f"{self.name}.tar.gz"
        )
        self.venv_path = (
            venv_path if venv_path is not None
            else VIRTENV_RESOURCES_PATH / self.name
        )
        self._repo_connector = repo_connector
        self._repo_keys = repo

    @property
    def repo_connector(self):
        """
        Return the repository connector for this plugin.
        NOTE: This imports the `connectors` module, which imports certain plugin modules.
        """
        if self._repo_connector is None:
            from meerschaum.connectors.parse import parse_repo_keys

            repo_keys = self._repo_keys or self._repo_in_name
            if self._repo_in_name and self._repo_keys and self._repo_keys != self._repo_in_name:
                error(
                    f"Received inconsistent repos: '{self._repo_in_name}' and '{self._repo_keys}'."
                )
            repo_connector = parse_repo_keys(repo_keys)
            self._repo_connector = repo_connector
        return self._repo_connector

    @property
    def version(self):
        """
        Return the plugin's module version (`__version__`) if it's defined.
        """
        if self._version is None:
            try:
                self._version = self.module.__version__
            except Exception as e:
                self._version = None
        return self._version

    @property
    def module(self):
        """
        Return the Python module of the underlying plugin.
        """
        if '_module' not in self.__dict__ or self.__dict__.get('_module', None) is None:
            if self.__file__ is None:
                return None

            from meerschaum.plugins import import_plugins
            self._module = import_plugins(str(self), warn=False)

        return self._module

    @property
    def __file__(self) -> Union[str, None]:
        """
        Return the file path (str) of the plugin if it exists, otherwise `None`.
        """
        if self.__dict__.get('_module', None) is not None:
            return self.module.__file__

        from meerschaum.config.paths import PLUGINS_RESOURCES_PATH

        potential_dir = PLUGINS_RESOURCES_PATH / self.name
        if (
            potential_dir.exists()
            and potential_dir.is_dir()
            and (potential_dir / '__init__.py').exists()
        ):
            return str((potential_dir / '__init__.py').as_posix())

        potential_file = PLUGINS_RESOURCES_PATH / (self.name + '.py')
        if potential_file.exists() and not potential_file.is_dir():
            return str(potential_file.as_posix())

        return None

    @property
    def requirements_file_path(self) -> Union[pathlib.Path, None]:
        """
        If a file named `requirements.txt` exists, return its path.
        """
        if self.__file__ is None:
            return None
        path = pathlib.Path(self.__file__).parent / 'requirements.txt'
        if not path.exists():
            return None
        return path

    def is_installed(self, **kw) -> bool:
        """
        Check whether a plugin is correctly installed.

        Returns
        -------
        A `bool` indicating whether a plugin exists and is successfully imported.
        """
        return self.__file__ is not None

    def make_tar(self, debug: bool = False) -> pathlib.Path:
        """
        Compress the plugin's source files into a `.tar.gz` archive and return the archive's path.

        Parameters
        ----------
        debug: bool, default False
            Verbosity toggle.

        Returns
        -------
        The `pathlib.Path` of the archive file.
        """
        import tarfile, pathlib, subprocess, fnmatch
        from meerschaum.utils.debug import dprint
        from meerschaum.utils.packages import attempt_import
        pathspec = attempt_import('pathspec', debug=debug)

        if not self.__file__:
            from meerschaum.utils.warnings import error
            error(f"Could not find file for plugin '{self}'.")
        if '__init__.py' in self.__file__ or os.path.isdir(self.__file__):
            path = self.__file__.replace('__init__.py', '')
            is_dir = True
        else:
            path = self.__file__
            is_dir = False

        old_cwd = os.getcwd()
        real_parent_path = pathlib.Path(os.path.realpath(path)).parent
        os.chdir(real_parent_path)

        default_patterns_to_ignore = [
            '.pyc',
            '__pycache__/',
            'eggs/',
            '__pypackages__/',
            '.git',
        ]

        def parse_gitignore() -> 'Set[str]':
            gitignore_path = pathlib.Path(path) / '.gitignore'
            if not gitignore_path.exists():
                return set(default_patterns_to_ignore)
            with open(gitignore_path, 'r', encoding='utf-8') as f:
                gitignore_text = f.read()
            return set(pathspec.PathSpec.from_lines(
                pathspec.patterns.GitWildMatchPattern,
                default_patterns_to_ignore + gitignore_text.splitlines()
            ).match_tree(path))

        patterns_to_ignore = parse_gitignore() if is_dir else set()

        if debug:
            dprint(f"Patterns to ignore:\n{patterns_to_ignore}")

        with tarfile.open(self.archive_path, 'w:gz') as tarf:
            if not is_dir:
                tarf.add(f"{self.name}.py")
            else:
                for root, dirs, files in os.walk(self.name):
                    for f in files:
                        good_file = True
                        fp = os.path.join(root, f)
                        for pattern in patterns_to_ignore:
                            if pattern in str(fp) or f.startswith('.'):
                                good_file = False
                                break
                        if good_file:
                            if debug:
                                dprint(f"Adding '{fp}'...")
                            tarf.add(fp)

        ### clean up and change back to old directory
        os.chdir(old_cwd)

        ### change to 775 to avoid permissions issues with the API in a Docker container
        self.archive_path.chmod(0o775)

        if debug:
            dprint(f"Created archive '{self.archive_path}'.")
        return self.archive_path

    def install(
        self,
        skip_deps: bool = False,
        force: bool = False,
        debug: bool = False,
    ) -> SuccessTuple:
        """
        Extract a plugin's tar archive to the plugins directory.

        This function checks if the plugin is already installed and if the version is equal or
        greater than the existing installation.

        Parameters
        ----------
        skip_deps: bool, default False
            If `True`, do not install dependencies.

        force: bool, default False
            If `True`, continue with installation, even if required packages fail to install.

        debug: bool, default False
            Verbosity toggle.

        Returns
        -------
        A `SuccessTuple` of success (bool) and a message (str).
        """
        if self.full_name in _ongoing_installations:
            return True, f"Already installing plugin '{self}'."
        _ongoing_installations.add(self.full_name)
        from meerschaum.utils.warnings import warn, error
        if debug:
            from meerschaum.utils.debug import dprint
        import tarfile
        import re
        import ast
        from meerschaum.plugins import sync_plugins_symlinks
        from meerschaum.utils.packages import attempt_import, determine_version, reload_meerschaum
        from meerschaum.utils.venv import init_venv
        from meerschaum.utils.misc import safely_extract_tar
        from meerschaum.config.paths import PLUGINS_TEMP_RESOURCES_PATH, PLUGINS_DIR_PATHS
        old_cwd = os.getcwd()
        old_version = ''
        new_version = ''
        temp_dir = PLUGINS_TEMP_RESOURCES_PATH / self.name
        temp_dir.mkdir(exist_ok=True)

        if not self.archive_path.exists():
            return False, f"Missing archive file for plugin '{self}'."
        if self.version is not None:
            old_version = self.version
            if debug:
                dprint(f"Found existing version '{old_version}' for plugin '{self}'.")

        if debug:
            dprint(f"Extracting '{self.archive_path}' to '{temp_dir}'...")

        try:
            with tarfile.open(self.archive_path, 'r:gz') as tarf:
                safely_extract_tar(tarf, temp_dir)
        except Exception as e:
            warn(e)
            return False, f"Failed to extract plugin '{self.name}'."

        ### search for version information
        files = os.listdir(temp_dir)

        if str(files[0]) == self.name:
            is_dir = True
        elif str(files[0]) == self.name + '.py':
            is_dir = False
        else:
            error(f"Unknown format encountered for plugin '{self}'.")

        fpath = temp_dir / files[0]
        if is_dir:
            fpath = fpath / '__init__.py'

        init_venv(self.name, debug=debug)
        with open(fpath, 'r', encoding='utf-8') as f:
            init_lines = f.readlines()
        new_version = None
        for line in init_lines:
            if '__version__' not in line:
                continue
            version_match = re.search(r'__version__(\s?)=', line.lstrip().rstrip())
            if not version_match:
                continue
            new_version = ast.literal_eval(line.split('=')[1].lstrip().rstrip())
            break
        if not new_version:
            warn(
                f"No `__version__` defined for plugin '{self}'. "
                + "Assuming new version...",
                stack=False,
            )

        packaging_version = attempt_import('packaging.version')
        try:
            is_new_version = (not new_version and not old_version) or (
                packaging_version.parse(old_version) < packaging_version.parse(new_version)
            )
            is_same_version = new_version and old_version and (
                packaging_version.parse(old_version) == packaging_version.parse(new_version)
            )
        except Exception:
            is_new_version, is_same_version = True, False

        ### Determine where to permanently store the new plugin.
        plugin_installation_dir_path = PLUGINS_DIR_PATHS[0]
        for path in PLUGINS_DIR_PATHS:
            if not path.exists():
                warn(f"Plugins path does not exist: {path}", stack=False)
                continue

            files_in_plugins_dir = os.listdir(path)
            if (
                self.name in files_in_plugins_dir
                or
                (self.name + '.py') in files_in_plugins_dir
            ):
                plugin_installation_dir_path = path
                break

        success_msg = (
            f"Successfully installed plugin '{self}'"
            + ("\n    (skipped dependencies)" if skip_deps else "")
            + "."
        )
        success, abort = None, None

        if is_same_version and not force:
            success, msg = True, (
                f"Plugin '{self}' is up-to-date (version {old_version}).\n" +
                "    Install again with `-f` or `--force` to reinstall."
            )
            abort = True
        elif is_new_version or force:
            for src_dir, dirs, files in os.walk(temp_dir):
                if success is not None:
                    break
                dst_dir = str(src_dir).replace(str(temp_dir), str(plugin_installation_dir_path))
                if not os.path.exists(dst_dir):
                    os.mkdir(dst_dir)
                for f in files:
                    src_file = os.path.join(src_dir, f)
                    dst_file = os.path.join(dst_dir, f)
                    if os.path.exists(dst_file):
                        os.remove(dst_file)

                    if debug:
                        dprint(f"Moving '{src_file}' to '{dst_dir}'...")
                    try:
                        shutil.move(src_file, dst_dir)
                    except Exception:
                        success, msg = False, (
                            f"Failed to install plugin '{self}': " +
                            f"Could not move file '{src_file}' to '{dst_dir}'"
                        )
                        print(msg)
                        break
            if success is None:
                success, msg = True, success_msg
        else:
            success, msg = False, (
                f"Your installed version of plugin '{self}' ({old_version}) is higher than "
                + f"attempted version {new_version}."
            )

        shutil.rmtree(temp_dir)
        os.chdir(old_cwd)

        ### Reload the plugin's module.
        sync_plugins_symlinks(debug=debug)
        if '_module' in self.__dict__:
            del self.__dict__['_module']
        init_venv(venv=self.name, force=True, debug=debug)
        reload_meerschaum(debug=debug)

        ### if we've already failed, return here
        if not success or abort:
            _ongoing_installations.remove(self.full_name)
            return success, msg

        ### attempt to install dependencies
        dependencies_installed = skip_deps or self.install_dependencies(force=force, debug=debug)
        if not dependencies_installed:
            _ongoing_installations.remove(self.full_name)
            return False, f"Failed to install dependencies for plugin '{self}'."

        ### handling success tuple, bool, or other (typically None)
        setup_tuple = self.setup(debug=debug)
        if isinstance(setup_tuple, tuple):
            if not setup_tuple[0]:
                success, msg = setup_tuple
        elif isinstance(setup_tuple, bool):
            if not setup_tuple:
                success, msg = False, (
                    f"Failed to run post-install setup for plugin '{self}'." + '\n' +
                    f"Check `setup()` in '{self.__file__}' for more information " +
                    "(no error message provided)."
                )
            else:
                success, msg = True, success_msg
        elif setup_tuple is None:
            success = True
            msg = (
                f"Post-install for plugin '{self}' returned None. " +
                "Assuming plugin successfully installed."
            )
            warn(msg)
        else:
            success = False
            msg = (
                f"Post-install for plugin '{self}' returned unexpected value " +
                f"of type '{type(setup_tuple)}': {setup_tuple}"
            )

        _ongoing_installations.remove(self.full_name)
        _ = self.module
        return success, msg

    def remove_archive(
        self,
        debug: bool = False
    ) -> SuccessTuple:
        """Remove a plugin's archive file."""
        if not self.archive_path.exists():
            return True, f"Archive file for plugin '{self}' does not exist."
        try:
            self.archive_path.unlink()
        except Exception as e:
            return False, f"Failed to remove archive for plugin '{self}':\n{e}"
        return True, "Success"

    def remove_venv(
        self,
        debug: bool = False
    ) -> SuccessTuple:
        """Remove a plugin's virtual environment."""
        if not self.venv_path.exists():
            return True, f"Virtual environment for plugin '{self}' does not exist."
        try:
            shutil.rmtree(self.venv_path)
        except Exception as e:
            return False, f"Failed to remove virtual environment for plugin '{self}':\n{e}"
        return True, "Success"

    def uninstall(self, debug: bool = False) -> SuccessTuple:
        """
        Remove a plugin, its virtual environment, and archive file.
        """
        from meerschaum.utils.packages import reload_meerschaum
        from meerschaum.plugins import sync_plugins_symlinks
        from meerschaum.utils.warnings import warn, info
        warnings_thrown_count: int = 0
        max_warnings: int = 3

        if not self.is_installed():
            info(
                f"Plugin '{self.name}' doesn't seem to be installed.\n    "
                + "Checking for artifacts...",
                stack=False,
            )
        else:
            real_path = pathlib.Path(os.path.realpath(self.__file__))
            try:
                if real_path.name == '__init__.py':
                    shutil.rmtree(real_path.parent)
                else:
                    real_path.unlink()
            except Exception as e:
                warn(f"Could not remove source files for plugin '{self.name}':\n{e}", stack=False)
                warnings_thrown_count += 1
            else:
                info(f"Removed source files for plugin '{self.name}'.")

        if self.venv_path.exists():
            success, msg = self.remove_venv(debug=debug)
            if not success:
                warn(msg, stack=False)
                warnings_thrown_count += 1
            else:
                info(f"Removed virtual environment from plugin '{self.name}'.")

        success = warnings_thrown_count < max_warnings
        sync_plugins_symlinks(debug=debug)
        self.deactivate_venv(force=True, debug=debug)
        reload_meerschaum(debug=debug)
        return success, (
            f"Successfully uninstalled plugin '{self}'." if success
            else f"Failed to uninstall plugin '{self}'."
        )

    def setup(self, *args: str, debug: bool = False, **kw: Any) -> Union[SuccessTuple, bool]:
        """
        If exists, run the plugin's `setup()` function.

        Parameters
        ----------
        *args: str
            The positional arguments passed to the `setup()` function.

        debug: bool, default False
            Verbosity toggle.

        **kw: Any
            The keyword arguments passed to the `setup()` function.

        Returns
        -------
        A `SuccessTuple` or `bool` indicating success.
        """
        from meerschaum.utils.debug import dprint
        import inspect
        _setup = None
        for name, fp in inspect.getmembers(self.module):
            if name == 'setup' and inspect.isfunction(fp):
                _setup = fp
                break

        ### assume success if no setup() is found (not necessary)
        if _setup is None:
            return True

        sig = inspect.signature(_setup)
        has_debug, has_kw = ('debug' in sig.parameters), False
        for k, v in sig.parameters.items():
            if '**' in str(v):
                has_kw = True
                break

        _kw = {}
        if has_kw:
            _kw.update(kw)
        if has_debug:
            _kw['debug'] = debug

        if debug:
            dprint(f"Running setup for plugin '{self}'...")
        try:
            self.activate_venv(debug=debug)
            return_tuple = _setup(*args, **_kw)
            self.deactivate_venv(debug=debug)
        except Exception as e:
            return False, str(e)

        if isinstance(return_tuple, tuple):
            return return_tuple
        if isinstance(return_tuple, bool):
            return return_tuple, f"Setup for Plugin '{self.name}' did not return a message."
        if return_tuple is None:
            return False, f"Setup for Plugin '{self.name}' returned None."
        return False, f"Unknown return value from setup for Plugin '{self.name}': {return_tuple}"

    def get_dependencies(
        self,
        debug: bool = False,
    ) -> List[str]:
        """
        If the Plugin has specified dependencies in a list called `required`, return the list.

        **NOTE:** Dependencies which start with `'plugin:'` are Meerschaum plugins, not pip packages.
        Meerschaum plugins may also specify connector keys for a repo after `'@'`.

        Parameters
        ----------
        debug: bool, default False
            Verbosity toggle.

        Returns
        -------
        A list of required packages and plugins (str).
        """
        if '_required' in self.__dict__:
            return self._required

        ### If the plugin has not yet been imported,
        ### infer the dependencies from the source text.
        ### This is not super robust, and it doesn't feel right
        ### having multiple versions of the logic.
        ### This is necessary when determining the activation order
        ### without having to import the module.
        ### For consistency's sake, the module-less method does not cache the requirements.
        if self.__dict__.get('_module', None) is None:
            file_path = self.__file__
            if file_path is None:
                return []
            with open(file_path, 'r', encoding='utf-8') as f:
                text = f.read()

            if 'required' not in text:
                return []

            ### This has some limitations:
            ### It relies on `required` being manually declared.
            ### We lose the ability to dynamically alter the `required` list,
            ### which is why we've kept the module-reliant method below.
            import ast, re
            ### NOTE: This technically would break
            ### if `required` was the very first line of the file.
            req_start_match = re.search(r'\nrequired(:\s*)?.*=', text)
            if not req_start_match:
                return []
            req_start = req_start_match.start()
            equals_sign = req_start + text[req_start:].find('=')

            ### Dependencies may have brackets within the strings, so push back the index.
            first_opening_brace = equals_sign + 1 + text[equals_sign:].find('[')
            if first_opening_brace == -1:
                return []

            next_closing_brace = equals_sign + 1 + text[equals_sign:].find(']')
            if next_closing_brace == -1:
                return []

            start_ix = first_opening_brace + 1
            end_ix = next_closing_brace

            num_braces = 0
            while True:
                if '[' not in text[start_ix:end_ix]:
                    break
                num_braces += 1
                start_ix = end_ix
                end_ix += text[end_ix + 1:].find(']') + 1

            req_end = end_ix + 1
            req_text = (
                text[(first_opening_brace-1):req_end]
                .lstrip()
                .replace('=', '', 1)
                .lstrip()
                .rstrip()
            )
            try:
                required = ast.literal_eval(req_text)
            except Exception as e:
                warn(
                    f"Unable to determine requirements for plugin '{self.name}' "
                    + "without importing the module.\n"
                    + "    This may be due to dynamically setting the global `required` list.\n"
                    + f"    {e}"
                )
                return []
            return required

        import inspect
        self.activate_venv(dependencies=False, debug=debug)
        required = []
        for name, val in inspect.getmembers(self.module):
            if name == 'required':
                required = val
                break
        self._required = required
        self.deactivate_venv(dependencies=False, debug=debug)
        return required

    def get_required_plugins(self, debug: bool = False) -> List[mrsm.plugins.Plugin]:
        """
        Return a list of required Plugin objects.
        """
        from meerschaum.utils.warnings import warn
        from meerschaum.config import get_config
        from meerschaum._internal.static import STATIC_CONFIG
        from meerschaum.connectors.parse import is_valid_connector_keys
        plugins = []
        _deps = self.get_dependencies(debug=debug)
        sep = STATIC_CONFIG['plugins']['repo_separator']
        plugin_names = [
            _d[len('plugin:'):] for _d in _deps
            if _d.startswith('plugin:') and len(_d) > len('plugin:')
        ]
        default_repo_keys = get_config('meerschaum', 'repository')
        skipped_repo_keys = set()

        for _plugin_name in plugin_names:
            if sep in _plugin_name:
                try:
                    _plugin_name, _repo_keys = _plugin_name.split(sep)
                except Exception:
                    _repo_keys = default_repo_keys
                    warn(
                        f"Invalid repo keys for required plugin '{_plugin_name}'.\n    "
                        + f"Will try to use '{_repo_keys}' instead.",
                        stack=False,
                    )
            else:
                _repo_keys = default_repo_keys

            if _repo_keys in skipped_repo_keys:
                continue

            if not is_valid_connector_keys(_repo_keys):
                warn(
                    f"Invalid connector '{_repo_keys}'.\n"
                    f"    Skipping required plugins from repository '{_repo_keys}'",
                    stack=False,
                )
                continue

            plugins.append(Plugin(_plugin_name, repo=_repo_keys))

        return plugins

    def get_required_packages(self, debug: bool = False) -> List[str]:
        """
        Return the required package names (excluding plugins).
        """
        _deps = self.get_dependencies(debug=debug)
        return [_d for _d in _deps if not _d.startswith('plugin:')]

    def activate_venv(
        self,
        dependencies: bool = True,
        init_if_not_exists: bool = True,
        debug: bool = False,
        **kw
    ) -> bool:
        """
        Activate the virtual environments for the plugin and its dependencies.

        Parameters
        ----------
        dependencies: bool, default True
            If `True`, activate the virtual environments for required plugins.

        Returns
        -------
        A bool indicating success.
        """
        from meerschaum.utils.venv import venv_target_path
        from meerschaum.utils.packages import activate_venv
        from meerschaum.utils.misc import make_symlink, is_symlink
        from meerschaum.config._paths import PACKAGE_ROOT_PATH

        if dependencies:
            for plugin in self.get_required_plugins(debug=debug):
                plugin.activate_venv(debug=debug, init_if_not_exists=init_if_not_exists, **kw)

        vtp = venv_target_path(self.name, debug=debug, allow_nonexistent=True)
        venv_meerschaum_path = vtp / 'meerschaum'

        try:
            success, msg = True, "Success"
            if is_symlink(venv_meerschaum_path):
                if pathlib.Path(os.path.realpath(venv_meerschaum_path)) != PACKAGE_ROOT_PATH:
                    venv_meerschaum_path.unlink()
                    success, msg = make_symlink(venv_meerschaum_path, PACKAGE_ROOT_PATH)
        except Exception as e:
            success, msg = False, str(e)
        if not success:
            warn(f"Unable to create symlink {venv_meerschaum_path} to {PACKAGE_ROOT_PATH}:\n{msg}")

        return activate_venv(self.name, init_if_not_exists=init_if_not_exists, debug=debug, **kw)

    def deactivate_venv(self, dependencies: bool = True, debug: bool = False, **kw) -> bool:
        """
        Deactivate the virtual environments for the plugin and its dependencies.

        Parameters
        ----------
        dependencies: bool, default True
            If `True`, deactivate the virtual environments for required plugins.

        Returns
        -------
        A bool indicating success.
        """
        from meerschaum.utils.packages import deactivate_venv
        success = deactivate_venv(self.name, debug=debug, **kw)
        if dependencies:
            for plugin in self.get_required_plugins(debug=debug):
                plugin.deactivate_venv(debug=debug, **kw)
        return success

    def install_dependencies(
        self,
        force: bool = False,
        debug: bool = False,
    ) -> bool:
        """
        If specified, install dependencies.

        **NOTE:** Dependencies that start with `'plugin:'` will be installed as
        Meerschaum plugins from the same repository as this Plugin.
        To install from a different repository, add the repo keys after `'@'`
        (e.g. `'plugin:foo@api:bar'`).

        Parameters
        ----------
        force: bool, default False
            If `True`, continue with the installation, even if some
            required packages fail to install.

        debug: bool, default False
            Verbosity toggle.

        Returns
        -------
        A bool indicating success.
        """
        from meerschaum.utils.packages import pip_install, venv_contains_package
        from meerschaum.utils.warnings import warn, info
        _deps = self.get_dependencies(debug=debug)
        if not _deps and self.requirements_file_path is None:
            return True

        plugins = self.get_required_plugins(debug=debug)
        for _plugin in plugins:
            if _plugin.name == self.name:
                warn(f"Plugin '{self.name}' cannot depend on itself! Skipping...", stack=False)
                continue
            _success, _msg = _plugin.repo_connector.install_plugin(
                _plugin.name, debug=debug, force=force
            )
            if not _success:
                warn(
                    f"Failed to install required plugin '{_plugin}' from '{_plugin.repo_connector}'"
                    + f" for plugin '{self.name}':\n" + _msg,
                    stack=False,
                )
                if not force:
                    warn(
                        "Try installing with the `--force` flag to continue anyway.",
                        stack=False,
                    )
                    return False
                info(
                    "Continuing with installation despite the failure "
                    + "(careful, things might be broken!)...",
                    icon=False
                )

        ### First step: parse `requirements.txt` if it exists.
        if self.requirements_file_path is not None:
            if not pip_install(
                requirements_file_path=self.requirements_file_path,
                venv=self.name, debug=debug
            ):
                warn(
                    f"Failed to resolve 'requirements.txt' for plugin '{self.name}'.",
                    stack=False,
                )
                if not force:
                    warn(
                        "Try installing with `--force` to continue anyway.",
                        stack=False,
                    )
                    return False
                info(
                    "Continuing with installation despite the failure "
                    + "(careful, things might be broken!)...",
                    icon=False
                )

        ### Don't reinstall packages that are already included in required plugins.
        packages = []
        _packages = self.get_required_packages(debug=debug)
        accounted_for_packages = set()
        for package_name in _packages:
            for plugin in plugins:
                if venv_contains_package(package_name, plugin.name):
                    accounted_for_packages.add(package_name)
                    break
        packages = [pkg for pkg in _packages if pkg not in accounted_for_packages]

        ### Attempt pip packages installation.
        if packages:
            for package in packages:
                if not pip_install(package, venv=self.name, debug=debug):
                    warn(
                        f"Failed to install required package '{package}'"
                        + f" for plugin '{self.name}'.",
                        stack=False,
                    )
                    if not force:
                        warn(
                            "Try installing with `--force` to continue anyway.",
                            stack=False,
                        )
                        return False
                    info(
                        "Continuing with installation despite the failure "
                        + "(careful, things might be broken!)...",
                        icon=False
                    )
        return True

    @property
    def full_name(self) -> str:
        """
        Include the repo keys with the plugin's name.
        """
        from meerschaum._internal.static import STATIC_CONFIG
        sep = STATIC_CONFIG['plugins']['repo_separator']
        return self.name + sep + str(self.repo_connector)

    def __str__(self):
        return self.name

    def __repr__(self):
        return f"Plugin('{self.name}', repo='{self.repo_connector}')"

    def __del__(self):
        pass
Handle packaging of Meerschaum plugins.
33 def __init__( 34 self, 35 name: str, 36 version: Optional[str] = None, 37 user_id: Optional[int] = None, 38 required: Optional[List[str]] = None, 39 attributes: Optional[Dict[str, Any]] = None, 40 archive_path: Optional[pathlib.Path] = None, 41 venv_path: Optional[pathlib.Path] = None, 42 repo_connector: Optional['mrsm.connectors.api.APIConnector'] = None, 43 repo: Union['mrsm.connectors.api.APIConnector', str, None] = None, 44 ): 45 from meerschaum._internal.static import STATIC_CONFIG 46 from meerschaum.config.paths import PLUGINS_ARCHIVES_RESOURCES_PATH, VIRTENV_RESOURCES_PATH 47 sep = STATIC_CONFIG['plugins']['repo_separator'] 48 _repo = None 49 if sep in name: 50 try: 51 name, _repo = name.split(sep) 52 except Exception as e: 53 error(f"Invalid plugin name: '{name}'") 54 self._repo_in_name = _repo 55 56 if attributes is None: 57 attributes = {} 58 self.name = name 59 self.attributes = attributes 60 self.user_id = user_id 61 self._version = version 62 if required: 63 self._required = required 64 self.archive_path = ( 65 archive_path if archive_path is not None 66 else PLUGINS_ARCHIVES_RESOURCES_PATH / f"{self.name}.tar.gz" 67 ) 68 self.venv_path = ( 69 venv_path if venv_path is not None 70 else VIRTENV_RESOURCES_PATH / self.name 71 ) 72 self._repo_connector = repo_connector 73 self._repo_keys = repo
76 @property 77 def repo_connector(self): 78 """ 79 Return the repository connector for this plugin. 80 NOTE: This imports the `connectors` module, which imports certain plugin modules. 81 """ 82 if self._repo_connector is None: 83 from meerschaum.connectors.parse import parse_repo_keys 84 85 repo_keys = self._repo_keys or self._repo_in_name 86 if self._repo_in_name and self._repo_keys and self._repo_keys != self._repo_in_name: 87 error( 88 f"Received inconsistent repos: '{self._repo_in_name}' and '{self._repo_keys}'." 89 ) 90 repo_connector = parse_repo_keys(repo_keys) 91 self._repo_connector = repo_connector 92 return self._repo_connector
Return the repository connector for this plugin.
NOTE: This imports the `connectors` module, which imports certain plugin modules.
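The name-splitting behavior in `__init__` can be sketched as follows. This is a minimal stand-in, assuming `'@'` is the configured `repo_separator` (as in `'plugin:foo@api:bar'`); `split_plugin_name` is a hypothetical helper, not part of the meerschaum API:

```python
# Hypothetical sketch: split a plugin name with optional repo keys,
# assuming '@' is the configured repo separator.
def split_plugin_name(name: str, sep: str = '@') -> tuple:
    """Return (plugin_name, repo_keys), where repo_keys may be None."""
    if sep not in name:
        return name, None
    plugin_name, repo_keys = name.split(sep, maxsplit=1)
    return plugin_name, repo_keys

print(split_plugin_name('noaa'))           # ('noaa', None)
print(split_plugin_name('noaa@api:mrsm'))  # ('noaa', 'api:mrsm')
```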
95 @property 96 def version(self): 97 """ 98 Return the plugin's module version (`__version__`) if it is defined. 99 """ 100 if self._version is None: 101 try: 102 self._version = self.module.__version__ 103 except Exception as e: 104 self._version = None 105 return self._version
Return the plugin's module version (`__version__`) if it is defined.
108 @property 109 def module(self): 110 """ 111 Return the Python module of the underlying plugin. 112 """ 113 if '_module' not in self.__dict__ or self.__dict__.get('_module', None) is None: 114 if self.__file__ is None: 115 return None 116 117 from meerschaum.plugins import import_plugins 118 self._module = import_plugins(str(self), warn=False) 119 120 return self._module
Return the Python module of the underlying plugin.
148 @property 149 def requirements_file_path(self) -> Union[pathlib.Path, None]: 150 """ 151 If a file named `requirements.txt` exists, return its path. 152 """ 153 if self.__file__ is None: 154 return None 155 path = pathlib.Path(self.__file__).parent / 'requirements.txt' 156 if not path.exists(): 157 return None 158 return path
If a file named `requirements.txt` exists, return its path.
161 def is_installed(self, **kw) -> bool: 162 """ 163 Check whether a plugin is correctly installed. 164 165 Returns 166 ------- 167 A `bool` indicating whether a plugin exists and is successfully imported. 168 """ 169 return self.__file__ is not None
Check whether a plugin is correctly installed.
Returns
- A `bool` indicating whether a plugin exists and is successfully imported.
172 def make_tar(self, debug: bool = False) -> pathlib.Path: 173 """ 174 Compress the plugin's source files into a `.tar.gz` archive and return the archive's path. 175 176 Parameters 177 ---------- 178 debug: bool, default False 179 Verbosity toggle. 180 181 Returns 182 ------- 183 A `pathlib.Path` to the archive file's path. 184 185 """ 186 import tarfile, pathlib, subprocess, fnmatch 187 from meerschaum.utils.debug import dprint 188 from meerschaum.utils.packages import attempt_import 189 pathspec = attempt_import('pathspec', debug=debug) 190 191 if not self.__file__: 192 from meerschaum.utils.warnings import error 193 error(f"Could not find file for plugin '{self}'.") 194 if '__init__.py' in self.__file__ or os.path.isdir(self.__file__): 195 path = self.__file__.replace('__init__.py', '') 196 is_dir = True 197 else: 198 path = self.__file__ 199 is_dir = False 200 201 old_cwd = os.getcwd() 202 real_parent_path = pathlib.Path(os.path.realpath(path)).parent 203 os.chdir(real_parent_path) 204 205 default_patterns_to_ignore = [ 206 '.pyc', 207 '__pycache__/', 208 'eggs/', 209 '__pypackages__/', 210 '.git', 211 ] 212 213 def parse_gitignore() -> 'Set[str]': 214 gitignore_path = pathlib.Path(path) / '.gitignore' 215 if not gitignore_path.exists(): 216 return set(default_patterns_to_ignore) 217 with open(gitignore_path, 'r', encoding='utf-8') as f: 218 gitignore_text = f.read() 219 return set(pathspec.PathSpec.from_lines( 220 pathspec.patterns.GitWildMatchPattern, 221 default_patterns_to_ignore + gitignore_text.splitlines() 222 ).match_tree(path)) 223 224 patterns_to_ignore = parse_gitignore() if is_dir else set() 225 226 if debug: 227 dprint(f"Patterns to ignore:\n{patterns_to_ignore}") 228 229 with tarfile.open(self.archive_path, 'w:gz') as tarf: 230 if not is_dir: 231 tarf.add(f"{self.name}.py") 232 else: 233 for root, dirs, files in os.walk(self.name): 234 for f in files: 235 good_file = True 236 fp = os.path.join(root, f) 237 for pattern in patterns_to_ignore: 238 
if pattern in str(fp) or f.startswith('.'): 239 good_file = False 240 break 241 if good_file: 242 if debug: 243 dprint(f"Adding '{fp}'...") 244 tarf.add(fp) 245 246 ### clean up and change back to old directory 247 os.chdir(old_cwd) 248 249 ### change to 775 to avoid permissions issues with the API in a Docker container 250 self.archive_path.chmod(0o775) 251 252 if debug: 253 dprint(f"Created archive '{self.archive_path}'.") 254 return self.archive_path
Compress the plugin's source files into a .tar.gz archive and return the archive's path.
Parameters
- debug (bool, default False): Verbosity toggle.
Returns
- A `pathlib.Path` to the archive file.
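A minimal, self-contained sketch of the archiving step using only the standard library; the real `make_tar` also honors `.gitignore` patterns via `pathspec`, and `make_plugin_tar` here is a hypothetical helper:

```python
import os
import tarfile
import tempfile

# Hypothetical sketch: walk a plugin directory and add files to a .tar.gz,
# skipping ignored patterns and hidden files.
def make_plugin_tar(plugin_dir: str, archive_path: str, ignore=('__pycache__', '.pyc', '.git')) -> str:
    with tarfile.open(archive_path, 'w:gz') as tarf:
        for root, dirs, files in os.walk(plugin_dir):
            for fname in files:
                fpath = os.path.join(root, fname)
                if any(pattern in fpath for pattern in ignore) or fname.startswith('.'):
                    continue
                # Store paths relative to the plugins directory.
                tarf.add(fpath, arcname=os.path.relpath(fpath, os.path.dirname(plugin_dir)))
    return archive_path

# Usage: build a throwaway plugin directory and archive it.
tmp = tempfile.mkdtemp()
plugin_dir = os.path.join(tmp, 'example')
os.makedirs(os.path.join(plugin_dir, '__pycache__'))
with open(os.path.join(plugin_dir, '__init__.py'), 'w') as f:
    f.write("__version__ = '0.0.1'\n")
with open(os.path.join(plugin_dir, '__pycache__', 'junk.pyc'), 'w') as f:
    f.write('x')
archive = make_plugin_tar(plugin_dir, os.path.join(tmp, 'example.tar.gz'))
with tarfile.open(archive) as tarf:
    print(tarf.getnames())  # ['example/__init__.py']
```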
257 def install( 258 self, 259 skip_deps: bool = False, 260 force: bool = False, 261 debug: bool = False, 262 ) -> SuccessTuple: 263 """ 264 Extract a plugin's tar archive to the plugins directory. 265 266 This function checks if the plugin is already installed and if the version is equal or 267 greater than the existing installation. 268 269 Parameters 270 ---------- 271 skip_deps: bool, default False 272 If `True`, do not install dependencies. 273 274 force: bool, default False 275 If `True`, continue with installation, even if required packages fail to install. 276 277 debug: bool, default False 278 Verbosity toggle. 279 280 Returns 281 ------- 282 A `SuccessTuple` of success (bool) and a message (str). 283 284 """ 285 if self.full_name in _ongoing_installations: 286 return True, f"Already installing plugin '{self}'." 287 _ongoing_installations.add(self.full_name) 288 from meerschaum.utils.warnings import warn, error 289 if debug: 290 from meerschaum.utils.debug import dprint 291 import tarfile 292 import re 293 import ast 294 from meerschaum.plugins import sync_plugins_symlinks 295 from meerschaum.utils.packages import attempt_import, determine_version, reload_meerschaum 296 from meerschaum.utils.venv import init_venv 297 from meerschaum.utils.misc import safely_extract_tar 298 from meerschaum.config.paths import PLUGINS_TEMP_RESOURCES_PATH, PLUGINS_DIR_PATHS 299 old_cwd = os.getcwd() 300 old_version = '' 301 new_version = '' 302 temp_dir = PLUGINS_TEMP_RESOURCES_PATH / self.name 303 temp_dir.mkdir(exist_ok=True) 304 305 if not self.archive_path.exists(): 306 return False, f"Missing archive file for plugin '{self}'." 
307 if self.version is not None: 308 old_version = self.version 309 if debug: 310 dprint(f"Found existing version '{old_version}' for plugin '{self}'.") 311 312 if debug: 313 dprint(f"Extracting '{self.archive_path}' to '{temp_dir}'...") 314 315 try: 316 with tarfile.open(self.archive_path, 'r:gz') as tarf: 317 safely_extract_tar(tarf, temp_dir) 318 except Exception as e: 319 warn(e) 320 return False, f"Failed to extract plugin '{self.name}'." 321 322 ### search for version information 323 files = os.listdir(temp_dir) 324 325 if str(files[0]) == self.name: 326 is_dir = True 327 elif str(files[0]) == self.name + '.py': 328 is_dir = False 329 else: 330 error(f"Unknown format encountered for plugin '{self}'.") 331 332 fpath = temp_dir / files[0] 333 if is_dir: 334 fpath = fpath / '__init__.py' 335 336 init_venv(self.name, debug=debug) 337 with open(fpath, 'r', encoding='utf-8') as f: 338 init_lines = f.readlines() 339 new_version = None 340 for line in init_lines: 341 if '__version__' not in line: 342 continue 343 version_match = re.search(r'__version__(\s?)=', line.lstrip().rstrip()) 344 if not version_match: 345 continue 346 new_version = ast.literal_eval(line.split('=')[1].lstrip().rstrip()) 347 break 348 if not new_version: 349 warn( 350 f"No `__version__` defined for plugin '{self}'. " 351 + "Assuming new version...", 352 stack = False, 353 ) 354 355 packaging_version = attempt_import('packaging.version') 356 try: 357 is_new_version = (not new_version and not old_version) or ( 358 packaging_version.parse(old_version) < packaging_version.parse(new_version) 359 ) 360 is_same_version = new_version and old_version and ( 361 packaging_version.parse(old_version) == packaging_version.parse(new_version) 362 ) 363 except Exception: 364 is_new_version, is_same_version = True, False 365 366 ### Determine where to permanently store the new plugin. 
367 plugin_installation_dir_path = PLUGINS_DIR_PATHS[0] 368 for path in PLUGINS_DIR_PATHS: 369 if not path.exists(): 370 warn(f"Plugins path does not exist: {path}", stack=False) 371 continue 372 373 files_in_plugins_dir = os.listdir(path) 374 if ( 375 self.name in files_in_plugins_dir 376 or 377 (self.name + '.py') in files_in_plugins_dir 378 ): 379 plugin_installation_dir_path = path 380 break 381 382 success_msg = ( 383 f"Successfully installed plugin '{self}'" 384 + ("\n (skipped dependencies)" if skip_deps else "") 385 + "." 386 ) 387 success, abort = None, None 388 389 if is_same_version and not force: 390 success, msg = True, ( 391 f"Plugin '{self}' is up-to-date (version {old_version}).\n" + 392 " Install again with `-f` or `--force` to reinstall." 393 ) 394 abort = True 395 elif is_new_version or force: 396 for src_dir, dirs, files in os.walk(temp_dir): 397 if success is not None: 398 break 399 dst_dir = str(src_dir).replace(str(temp_dir), str(plugin_installation_dir_path)) 400 if not os.path.exists(dst_dir): 401 os.mkdir(dst_dir) 402 for f in files: 403 src_file = os.path.join(src_dir, f) 404 dst_file = os.path.join(dst_dir, f) 405 if os.path.exists(dst_file): 406 os.remove(dst_file) 407 408 if debug: 409 dprint(f"Moving '{src_file}' to '{dst_dir}'...") 410 try: 411 shutil.move(src_file, dst_dir) 412 except Exception: 413 success, msg = False, ( 414 f"Failed to install plugin '{self}': " + 415 f"Could not move file '{src_file}' to '{dst_dir}'" 416 ) 417 print(msg) 418 break 419 if success is None: 420 success, msg = True, success_msg 421 else: 422 success, msg = False, ( 423 f"Your installed version of plugin '{self}' ({old_version}) is higher than " 424 + f"attempted version {new_version}." 425 ) 426 427 shutil.rmtree(temp_dir) 428 os.chdir(old_cwd) 429 430 ### Reload the plugin's module. 
431 sync_plugins_symlinks(debug=debug) 432 if '_module' in self.__dict__: 433 del self.__dict__['_module'] 434 init_venv(venv=self.name, force=True, debug=debug) 435 reload_meerschaum(debug=debug) 436 437 ### if we've already failed, return here 438 if not success or abort: 439 _ongoing_installations.remove(self.full_name) 440 return success, msg 441 442 ### attempt to install dependencies 443 dependencies_installed = skip_deps or self.install_dependencies(force=force, debug=debug) 444 if not dependencies_installed: 445 _ongoing_installations.remove(self.full_name) 446 return False, f"Failed to install dependencies for plugin '{self}'." 447 448 ### handling success tuple, bool, or other (typically None) 449 setup_tuple = self.setup(debug=debug) 450 if isinstance(setup_tuple, tuple): 451 if not setup_tuple[0]: 452 success, msg = setup_tuple 453 elif isinstance(setup_tuple, bool): 454 if not setup_tuple: 455 success, msg = False, ( 456 f"Failed to run post-install setup for plugin '{self}'." + '\n' + 457 f"Check `setup()` in '{self.__file__}' for more information " + 458 "(no error message provided)." 459 ) 460 else: 461 success, msg = True, success_msg 462 elif setup_tuple is None: 463 success = True 464 msg = ( 465 f"Post-install for plugin '{self}' returned None. " + 466 "Assuming plugin successfully installed." 467 ) 468 warn(msg) 469 else: 470 success = False 471 msg = ( 472 f"Post-install for plugin '{self}' returned unexpected value " + 473 f"of type '{type(setup_tuple)}': {setup_tuple}" 474 ) 475 476 _ongoing_installations.remove(self.full_name) 477 _ = self.module 478 return success, msg
Extract a plugin's tar archive to the plugins directory.
This function checks whether the plugin is already installed and whether the new version is equal to or greater than the existing installation's.
Parameters
- skip_deps (bool, default False): If `True`, do not install dependencies.
- force (bool, default False): If `True`, continue with installation, even if required packages fail to install.
- debug (bool, default False): Verbosity toggle.
Returns
- A `SuccessTuple` of success (bool) and a message (str).
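The version-detection step during install can be illustrated with a standalone sketch: scan the extracted `__init__.py` for a `__version__` assignment, then compare old against new. The real implementation compares versions with `packaging.version`; a naive tuple comparison stands in here:

```python
import ast
import re

# Sketch: find a `__version__` assignment in source text and evaluate its value.
def parse_version_line(source: str):
    for line in source.splitlines():
        if re.search(r'__version__\s*=', line.strip()):
            return ast.literal_eval(line.split('=', 1)[1].strip())
    return None

# Naive stand-in for packaging.version comparison (assumes dotted integers).
def as_tuple(version: str):
    return tuple(int(part) for part in version.split('.'))

new_version = parse_version_line("__version__ = '1.2.0'\n")
old_version = '1.1.9'
print(new_version)                                    # 1.2.0
print(as_tuple(old_version) < as_tuple(new_version))  # True
```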
481 def remove_archive( 482 self, 483 debug: bool = False 484 ) -> SuccessTuple: 485 """Remove a plugin's archive file.""" 486 if not self.archive_path.exists(): 487 return True, f"Archive file for plugin '{self}' does not exist." 488 try: 489 self.archive_path.unlink() 490 except Exception as e: 491 return False, f"Failed to remove archive for plugin '{self}':\n{e}" 492 return True, "Success"
Remove a plugin's archive file.
495 def remove_venv( 496 self, 497 debug: bool = False 498 ) -> SuccessTuple: 499 """Remove a plugin's virtual environment.""" 500 if not self.venv_path.exists(): 501 return True, f"Virtual environment for plugin '{self}' does not exist." 502 try: 503 shutil.rmtree(self.venv_path) 504 except Exception as e: 505 return False, f"Failed to remove virtual environment for plugin '{self}':\n{e}" 506 return True, "Success"
Remove a plugin's virtual environment.
509 def uninstall(self, debug: bool = False) -> SuccessTuple: 510 """ 511 Remove a plugin, its virtual environment, and archive file. 512 """ 513 from meerschaum.utils.packages import reload_meerschaum 514 from meerschaum.plugins import sync_plugins_symlinks 515 from meerschaum.utils.warnings import warn, info 516 warnings_thrown_count: int = 0 517 max_warnings: int = 3 518 519 if not self.is_installed(): 520 info( 521 f"Plugin '{self.name}' doesn't seem to be installed.\n " 522 + "Checking for artifacts...", 523 stack = False, 524 ) 525 else: 526 real_path = pathlib.Path(os.path.realpath(self.__file__)) 527 try: 528 if real_path.name == '__init__.py': 529 shutil.rmtree(real_path.parent) 530 else: 531 real_path.unlink() 532 except Exception as e: 533 warn(f"Could not remove source files for plugin '{self.name}':\n{e}", stack=False) 534 warnings_thrown_count += 1 535 else: 536 info(f"Removed source files for plugin '{self.name}'.") 537 538 if self.venv_path.exists(): 539 success, msg = self.remove_venv(debug=debug) 540 if not success: 541 warn(msg, stack=False) 542 warnings_thrown_count += 1 543 else: 544 info(f"Removed virtual environment from plugin '{self.name}'.") 545 546 success = warnings_thrown_count < max_warnings 547 sync_plugins_symlinks(debug=debug) 548 self.deactivate_venv(force=True, debug=debug) 549 reload_meerschaum(debug=debug) 550 return success, ( 551 f"Successfully uninstalled plugin '{self}'." if success 552 else f"Failed to uninstall plugin '{self}'." 553 )
Remove a plugin, its virtual environment, and archive file.
556 def setup(self, *args: str, debug: bool = False, **kw: Any) -> Union[SuccessTuple, bool]: 557 """ 558 If exists, run the plugin's `setup()` function. 559 560 Parameters 561 ---------- 562 *args: str 563 The positional arguments passed to the `setup()` function. 564 565 debug: bool, default False 566 Verbosity toggle. 567 568 **kw: Any 569 The keyword arguments passed to the `setup()` function. 570 571 Returns 572 ------- 573 A `SuccessTuple` or `bool` indicating success. 574 575 """ 576 from meerschaum.utils.debug import dprint 577 import inspect 578 _setup = None 579 for name, fp in inspect.getmembers(self.module): 580 if name == 'setup' and inspect.isfunction(fp): 581 _setup = fp 582 break 583 584 ### assume success if no setup() is found (not necessary) 585 if _setup is None: 586 return True 587 588 sig = inspect.signature(_setup) 589 has_debug, has_kw = ('debug' in sig.parameters), False 590 for k, v in sig.parameters.items(): 591 if '**' in str(v): 592 has_kw = True 593 break 594 595 _kw = {} 596 if has_kw: 597 _kw.update(kw) 598 if has_debug: 599 _kw['debug'] = debug 600 601 if debug: 602 dprint(f"Running setup for plugin '{self}'...") 603 try: 604 self.activate_venv(debug=debug) 605 return_tuple = _setup(*args, **_kw) 606 self.deactivate_venv(debug=debug) 607 except Exception as e: 608 return False, str(e) 609 610 if isinstance(return_tuple, tuple): 611 return return_tuple 612 if isinstance(return_tuple, bool): 613 return return_tuple, f"Setup for Plugin '{self.name}' did not return a message." 614 if return_tuple is None: 615 return False, f"Setup for Plugin '{self.name}' returned None." 616 return False, f"Unknown return value from setup for Plugin '{self.name}': {return_tuple}"
If it exists, run the plugin's `setup()` function.
Parameters
- *args (str): The positional arguments passed to the `setup()` function.
- debug (bool, default False): Verbosity toggle.
- **kw (Any): The keyword arguments passed to the `setup()` function.
Returns
- A `SuccessTuple` or `bool` indicating success.
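The signature inspection used before calling a plugin's `setup()` can be sketched as follows; `call_setup` is a hypothetical wrapper, not part of the meerschaum API:

```python
import inspect

# Sketch: only pass `debug` and extra keyword arguments
# if the target setup function actually accepts them.
def call_setup(setup_func, *args, debug: bool = False, **kw):
    sig = inspect.signature(setup_func)
    has_debug = 'debug' in sig.parameters
    has_kwargs = any(
        param.kind == inspect.Parameter.VAR_KEYWORD
        for param in sig.parameters.values()
    )
    call_kw = dict(kw) if has_kwargs else {}
    if has_debug:
        call_kw['debug'] = debug
    return setup_func(*args, **call_kw)

# A plugin's setup() that accepts `debug` but no **kwargs:
def setup(debug: bool = False):
    return True, f"Success (debug={debug})"

print(call_setup(setup, debug=True))  # (True, 'Success (debug=True)')
```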
619 def get_dependencies( 620 self, 621 debug: bool = False, 622 ) -> List[str]: 623 """ 624 If the Plugin has specified dependencies in a list called `required`, return the list. 625 626 **NOTE:** Dependecies which start with `'plugin:'` are Meerschaum plugins, not pip packages. 627 Meerschaum plugins may also specify connector keys for a repo after `'@'`. 628 629 Parameters 630 ---------- 631 debug: bool, default False 632 Verbosity toggle. 633 634 Returns 635 ------- 636 A list of required packages and plugins (str). 637 638 """ 639 if '_required' in self.__dict__: 640 return self._required 641 642 ### If the plugin has not yet been imported, 643 ### infer the dependencies from the source text. 644 ### This is not super robust, and it doesn't feel right 645 ### having multiple versions of the logic. 646 ### This is necessary when determining the activation order 647 ### without having import the module. 648 ### For consistency's sake, the module-less method does not cache the requirements. 649 if self.__dict__.get('_module', None) is None: 650 file_path = self.__file__ 651 if file_path is None: 652 return [] 653 with open(file_path, 'r', encoding='utf-8') as f: 654 text = f.read() 655 656 if 'required' not in text: 657 return [] 658 659 ### This has some limitations: 660 ### It relies on `required` being manually declared. 661 ### We lose the ability to dynamically alter the `required` list, 662 ### which is why we've kept the module-reliant method below. 663 import ast, re 664 ### NOTE: This technically would break 665 ### if `required` was the very first line of the file. 666 req_start_match = re.search(r'\nrequired(:\s*)?.*=', text) 667 if not req_start_match: 668 return [] 669 req_start = req_start_match.start() 670 equals_sign = req_start + text[req_start:].find('=') 671 672 ### Dependencies may have brackets within the strings, so push back the index. 
673 first_opening_brace = equals_sign + 1 + text[equals_sign:].find('[') 674 if first_opening_brace == -1: 675 return [] 676 677 next_closing_brace = equals_sign + 1 + text[equals_sign:].find(']') 678 if next_closing_brace == -1: 679 return [] 680 681 start_ix = first_opening_brace + 1 682 end_ix = next_closing_brace 683 684 num_braces = 0 685 while True: 686 if '[' not in text[start_ix:end_ix]: 687 break 688 num_braces += 1 689 start_ix = end_ix 690 end_ix += text[end_ix + 1:].find(']') + 1 691 692 req_end = end_ix + 1 693 req_text = ( 694 text[(first_opening_brace-1):req_end] 695 .lstrip() 696 .replace('=', '', 1) 697 .lstrip() 698 .rstrip() 699 ) 700 try: 701 required = ast.literal_eval(req_text) 702 except Exception as e: 703 warn( 704 f"Unable to determine requirements for plugin '{self.name}' " 705 + "without importing the module.\n" 706 + " This may be due to dynamically setting the global `required` list.\n" 707 + f" {e}" 708 ) 709 return [] 710 return required 711 712 import inspect 713 self.activate_venv(dependencies=False, debug=debug) 714 required = [] 715 for name, val in inspect.getmembers(self.module): 716 if name == 'required': 717 required = val 718 break 719 self._required = required 720 self.deactivate_venv(dependencies=False, debug=debug) 721 return required
If the Plugin has specified dependencies in a list called `required`, return the list.
NOTE: Dependencies which start with `'plugin:'` are Meerschaum plugins, not pip packages.
Meerschaum plugins may also specify connector keys for a repo after `'@'`.
Parameters
- debug (bool, default False): Verbosity toggle.
Returns
- A list of required packages and plugins (str).
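The module-less inference (reading `required` out of the source text without importing it) can be approximated with a short standalone sketch; `parse_required` is hypothetical and, unlike the real parser, does not handle nested brackets inside the list:

```python
import ast
import re

# Sketch: find a top-level `required = [...]` assignment in plugin source text
# and evaluate the list literal without importing the module.
def parse_required(source: str):
    match = re.search(r'\nrequired(:\s*\S+)?\s*=\s*(\[.*?\])', source, flags=re.DOTALL)
    if not match:
        return []
    try:
        return ast.literal_eval(match.group(2))
    except (ValueError, SyntaxError):
        # e.g. the list is built dynamically and is not a literal.
        return []

source = '''
__version__ = '0.1.0'
required = ['pandas', 'plugin:noaa@api:mrsm']

def fetch(pipe, **kw):
    return []
'''
print(parse_required(source))  # ['pandas', 'plugin:noaa@api:mrsm']
```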
724 def get_required_plugins(self, debug: bool=False) -> List[mrsm.plugins.Plugin]: 725 """ 726 Return a list of required Plugin objects. 727 """ 728 from meerschaum.utils.warnings import warn 729 from meerschaum.config import get_config 730 from meerschaum._internal.static import STATIC_CONFIG 731 from meerschaum.connectors.parse import is_valid_connector_keys 732 plugins = [] 733 _deps = self.get_dependencies(debug=debug) 734 sep = STATIC_CONFIG['plugins']['repo_separator'] 735 plugin_names = [ 736 _d[len('plugin:'):] for _d in _deps 737 if _d.startswith('plugin:') and len(_d) > len('plugin:') 738 ] 739 default_repo_keys = get_config('meerschaum', 'repository') 740 skipped_repo_keys = set() 741 742 for _plugin_name in plugin_names: 743 if sep in _plugin_name: 744 try: 745 _plugin_name, _repo_keys = _plugin_name.split(sep) 746 except Exception: 747 _repo_keys = default_repo_keys 748 warn( 749 f"Invalid repo keys for required plugin '{_plugin_name}'.\n " 750 + f"Will try to use '{_repo_keys}' instead.", 751 stack = False, 752 ) 753 else: 754 _repo_keys = default_repo_keys 755 756 if _repo_keys in skipped_repo_keys: 757 continue 758 759 if not is_valid_connector_keys(_repo_keys): 760 warn( 761 f"Invalid connector '{_repo_keys}'.\n" 762 f" Skipping required plugins from repository '{_repo_keys}'", 763 stack=False, 764 ) 765 continue 766 767 plugins.append(Plugin(_plugin_name, repo=_repo_keys)) 768 769 return plugins
Return a list of required Plugin objects.
772 def get_required_packages(self, debug: bool=False) -> List[str]: 773 """ 774 Return the required package names (excluding plugins). 775 """ 776 _deps = self.get_dependencies(debug=debug) 777 return [_d for _d in _deps if not _d.startswith('plugin:')]
Return the required package names (excluding plugins).
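Together, `get_required_plugins()` and `get_required_packages()` partition the `required` list. The split can be sketched with a hypothetical helper, assuming `'@'` as the repo separator (as in `'plugin:foo@api:bar'`):

```python
# Sketch: split a `required` list into pip packages and Meerschaum plugins,
# parsing optional repo keys after '@'.
def split_dependencies(required):
    packages, plugins = [], []
    for dep in required:
        if not dep.startswith('plugin:'):
            packages.append(dep)
            continue
        spec = dep[len('plugin:'):]
        name, _, repo_keys = spec.partition('@')
        plugins.append((name, repo_keys or None))
    return packages, plugins

packages, plugins = split_dependencies(['pandas', 'plugin:noaa', 'plugin:foo@api:bar'])
print(packages)  # ['pandas']
print(plugins)   # [('noaa', None), ('foo', 'api:bar')]
```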
780 def activate_venv( 781 self, 782 dependencies: bool = True, 783 init_if_not_exists: bool = True, 784 debug: bool = False, 785 **kw 786 ) -> bool: 787 """ 788 Activate the virtual environments for the plugin and its dependencies. 789 790 Parameters 791 ---------- 792 dependencies: bool, default True 793 If `True`, activate the virtual environments for required plugins. 794 795 Returns 796 ------- 797 A bool indicating success. 798 """ 799 from meerschaum.utils.venv import venv_target_path 800 from meerschaum.utils.packages import activate_venv 801 from meerschaum.utils.misc import make_symlink, is_symlink 802 from meerschaum.config._paths import PACKAGE_ROOT_PATH 803 804 if dependencies: 805 for plugin in self.get_required_plugins(debug=debug): 806 plugin.activate_venv(debug=debug, init_if_not_exists=init_if_not_exists, **kw) 807 808 vtp = venv_target_path(self.name, debug=debug, allow_nonexistent=True) 809 venv_meerschaum_path = vtp / 'meerschaum' 810 811 try: 812 success, msg = True, "Success" 813 if is_symlink(venv_meerschaum_path): 814 if pathlib.Path(os.path.realpath(venv_meerschaum_path)) != PACKAGE_ROOT_PATH: 815 venv_meerschaum_path.unlink() 816 success, msg = make_symlink(venv_meerschaum_path, PACKAGE_ROOT_PATH) 817 except Exception as e: 818 success, msg = False, str(e) 819 if not success: 820 warn(f"Unable to create symlink {venv_meerschaum_path} to {PACKAGE_ROOT_PATH}:\n{msg}") 821 822 return activate_venv(self.name, init_if_not_exists=init_if_not_exists, debug=debug, **kw)
Activate the virtual environments for the plugin and its dependencies.
Parameters
- dependencies (bool, default True): If `True`, activate the virtual environments for required plugins.
Returns
- A bool indicating success.
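The symlink step in `activate_venv` (pointing the venv's `meerschaum` entry at the installed package root, replacing a stale link) can be illustrated with a POSIX-only sketch; `ensure_package_symlink` is hypothetical, and the real method wraps this in `activate_venv()` and warns instead of raising:

```python
import os
import pathlib
import tempfile

# Sketch: ensure `<venv site-packages>/meerschaum` is a symlink to the
# package root, unlinking a stale symlink that points elsewhere.
def ensure_package_symlink(venv_site_packages: pathlib.Path, package_root: pathlib.Path) -> pathlib.Path:
    link_path = venv_site_packages / 'meerschaum'
    if link_path.is_symlink() and pathlib.Path(os.path.realpath(link_path)) != package_root:
        link_path.unlink()
    if not link_path.exists():
        os.symlink(package_root, link_path)
    return link_path

# Usage with throwaway directories:
tmp = pathlib.Path(tempfile.mkdtemp())
site_packages = tmp / 'venv' / 'lib' / 'site-packages'
site_packages.mkdir(parents=True)
package_root = tmp / 'meerschaum_pkg'
package_root.mkdir()
link = ensure_package_symlink(site_packages, package_root)
print(os.path.realpath(link) == os.path.realpath(package_root))  # True
```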
825 def deactivate_venv(self, dependencies: bool=True, debug: bool = False, **kw) -> bool: 826 """ 827 Deactivate the virtual environments for the plugin and its dependencies. 828 829 Parameters 830 ---------- 831 dependencies: bool, default True 832 If `True`, deactivate the virtual environments for required plugins. 833 834 Returns 835 ------- 836 A bool indicating success. 837 """ 838 from meerschaum.utils.packages import deactivate_venv 839 success = deactivate_venv(self.name, debug=debug, **kw) 840 if dependencies: 841 for plugin in self.get_required_plugins(debug=debug): 842 plugin.deactivate_venv(debug=debug, **kw) 843 return success
Deactivate the virtual environments for the plugin and its dependencies.
Parameters
- dependencies (bool, default True): If `True`, deactivate the virtual environments for required plugins.
Returns
- A bool indicating success.
846 def install_dependencies( 847 self, 848 force: bool = False, 849 debug: bool = False, 850 ) -> bool: 851 """ 852 If specified, install dependencies. 853 854 **NOTE:** Dependencies that start with `'plugin:'` will be installed as 855 Meerschaum plugins from the same repository as this Plugin. 856 To install from a different repository, add the repo keys after `'@'` 857 (e.g. `'plugin:foo@api:bar'`). 858 859 Parameters 860 ---------- 861 force: bool, default False 862 If `True`, continue with the installation, even if some 863 required packages fail to install. 864 865 debug: bool, default False 866 Verbosity toggle. 867 868 Returns 869 ------- 870 A bool indicating success. 871 """ 872 from meerschaum.utils.packages import pip_install, venv_contains_package 873 from meerschaum.utils.warnings import warn, info 874 _deps = self.get_dependencies(debug=debug) 875 if not _deps and self.requirements_file_path is None: 876 return True 877 878 plugins = self.get_required_plugins(debug=debug) 879 for _plugin in plugins: 880 if _plugin.name == self.name: 881 warn(f"Plugin '{self.name}' cannot depend on itself! Skipping...", stack=False) 882 continue 883 _success, _msg = _plugin.repo_connector.install_plugin( 884 _plugin.name, debug=debug, force=force 885 ) 886 if not _success: 887 warn( 888 f"Failed to install required plugin '{_plugin}' from '{_plugin.repo_connector}'" 889 + f" for plugin '{self.name}':\n" + _msg, 890 stack = False, 891 ) 892 if not force: 893 warn( 894 "Try installing with the `--force` flag to continue anyway.", 895 stack = False, 896 ) 897 return False 898 info( 899 "Continuing with installation despite the failure " 900 + "(careful, things might be broken!)...", 901 icon = False 902 ) 903 904 905 ### First step: parse `requirements.txt` if it exists. 
906 if self.requirements_file_path is not None: 907 if not pip_install( 908 requirements_file_path=self.requirements_file_path, 909 venv=self.name, debug=debug 910 ): 911 warn( 912 f"Failed to resolve 'requirements.txt' for plugin '{self.name}'.", 913 stack = False, 914 ) 915 if not force: 916 warn( 917 "Try installing with `--force` to continue anyway.", 918 stack = False, 919 ) 920 return False 921 info( 922 "Continuing with installation despite the failure " 923 + "(careful, things might be broken!)...", 924 icon = False 925 ) 926 927 928 ### Don't reinstall packages that are already included in required plugins. 929 packages = [] 930 _packages = self.get_required_packages(debug=debug) 931 accounted_for_packages = set() 932 for package_name in _packages: 933 for plugin in plugins: 934 if venv_contains_package(package_name, plugin.name): 935 accounted_for_packages.add(package_name) 936 break 937 packages = [pkg for pkg in _packages if pkg not in accounted_for_packages] 938 939 ### Attempt pip packages installation. 940 if packages: 941 for package in packages: 942 if not pip_install(package, venv=self.name, debug=debug): 943 warn( 944 f"Failed to install required package '{package}'" 945 + f" for plugin '{self.name}'.", 946 stack = False, 947 ) 948 if not force: 949 warn( 950 "Try installing with `--force` to continue anyway.", 951 stack = False, 952 ) 953 return False 954 info( 955 "Continuing with installation despite the failure " 956 + "(careful, things might be broken!)...", 957 icon = False 958 ) 959 return True
If specified, install dependencies.
NOTE: Dependencies that start with `'plugin:'` will be installed as
Meerschaum plugins from the same repository as this Plugin.
To install from a different repository, add the repo keys after `'@'`
(e.g. `'plugin:foo@api:bar'`).
Parameters
- force (bool, default False): If `True`, continue with the installation, even if some required packages fail to install.
- debug (bool, default False): Verbosity toggle.
Returns
- A bool indicating success.
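The 'plugin:foo@api:bar' syntax above packs two pieces of information into one dependency string: the plugin's name and the optional repository keys after '@'. A minimal sketch of how such a spec can be split; `parse_plugin_dependency` is a hypothetical helper for illustration, not part of the Meerschaum API:

```python
def parse_plugin_dependency(spec: str):
    """Split a 'plugin:' dependency into (plugin_name, repo_keys)."""
    if not spec.startswith('plugin:'):
        raise ValueError(f"Not a plugin dependency: {spec!r}")
    rest = spec[len('plugin:'):]
    ### Repo keys, if present, follow the '@' separator (e.g. 'api:bar').
    if '@' in rest:
        name, repo_keys = rest.split('@', 1)
        return name, repo_keys
    return rest, None

print(parse_plugin_dependency('plugin:foo@api:bar'))  # ('foo', 'api:bar')
print(parse_plugin_dependency('plugin:noaa'))         # ('noaa', None)
```

A spec without '@' falls back to `None` for the repo keys, i.e. "install from the same repository as this Plugin."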
@property
def full_name(self) -> str:
    """
    Include the repo keys with the plugin's name.
    """
    from meerschaum._internal.static import STATIC_CONFIG
    sep = STATIC_CONFIG['plugins']['repo_separator']
    return self.name + sep + str(self.repo_connector)
Include the repo keys with the plugin's name.
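The separator comes from `STATIC_CONFIG['plugins']['repo_separator']`; judging from the 'plugin:foo@api:bar' syntax above it is '@', though that exact value is an assumption in this sketch:

```python
def full_plugin_name(name: str, repo_keys: str, sep: str = '@') -> str:
    ### sep stands in for STATIC_CONFIG['plugins']['repo_separator'];
    ### '@' is assumed from the 'plugin:foo@api:bar' syntax above.
    return name + sep + repo_keys

print(full_plugin_name('noaa', 'api:mrsm'))  # noaa@api:mrsm
```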
class Venv:
    """
    Manage a virtual environment's activation status.

    Examples
    --------
    >>> from meerschaum.plugins import Plugin
    >>> with Venv('mrsm') as venv:
    ...     import pandas
    >>> with Venv(Plugin('noaa')) as venv:
    ...     import requests
    >>> venv = Venv('mrsm')
    >>> venv.activate()
    True
    >>> venv.deactivate()
    True
    >>>
    """

    def __init__(
        self,
        venv: Union[str, 'mrsm.core.Plugin', None] = 'mrsm',
        init_if_not_exists: bool = True,
        debug: bool = False,
    ) -> None:
        from meerschaum.utils.venv import activate_venv, deactivate_venv, active_venvs
        ### Due to a weird threading issue,
        ### we can't use `isinstance` here.
        if '_Plugin' in str(type(venv)):
            self._venv = venv.name
            self._activate = venv.activate_venv
            self._deactivate = venv.deactivate_venv
            self._kwargs = {}
        else:
            self._venv = venv
            self._activate = activate_venv
            self._deactivate = deactivate_venv
            self._kwargs = {'venv': venv}
        self._debug = debug
        self._init_if_not_exists = init_if_not_exists
        ### In case someone calls `deactivate()` before `activate()`.
        self._kwargs['previously_active_venvs'] = copy.deepcopy(active_venvs)

    def activate(self, debug: bool = False) -> bool:
        """
        Activate this virtual environment.
        If a `meerschaum.plugins.Plugin` was provided, its dependent virtual environments
        will also be activated.
        """
        from meerschaum.utils.venv import active_venvs, init_venv
        self._kwargs['previously_active_venvs'] = copy.deepcopy(active_venvs)
        try:
            return self._activate(
                debug=(debug or self._debug),
                init_if_not_exists=self._init_if_not_exists,
                **self._kwargs
            )
        except OSError as e:
            if self._init_if_not_exists:
                if not init_venv(self._venv, force=True):
                    raise e
            return self._activate(
                debug=(debug or self._debug),
                init_if_not_exists=self._init_if_not_exists,
                **self._kwargs
            )

    def deactivate(self, debug: bool = False) -> bool:
        """
        Deactivate this virtual environment.
        If a `meerschaum.plugins.Plugin` was provided, its dependent virtual environments
        will also be deactivated.
        """
        return self._deactivate(debug=(debug or self._debug), **self._kwargs)

    @property
    def target_path(self) -> pathlib.Path:
        """
        Return the target site-packages path for this virtual environment.
        A `meerschaum.utils.venv.Venv` may have one virtual environment per minor Python version
        (e.g. Python 3.10 and Python 3.7).
        """
        from meerschaum.utils.venv import venv_target_path
        return venv_target_path(venv=self._venv, allow_nonexistent=True, debug=self._debug)

    @property
    def root_path(self) -> pathlib.Path:
        """
        Return the top-level path for this virtual environment.
        """
        from meerschaum.config._paths import VIRTENV_RESOURCES_PATH
        if self._venv is None:
            return self.target_path.parent
        return VIRTENV_RESOURCES_PATH / self._venv

    def __enter__(self) -> None:
        self.activate(debug=self._debug)

    def __exit__(self, exc_type, exc_value, exc_traceback) -> None:
        self.deactivate(debug=self._debug)

    def __str__(self) -> str:
        quote = "'" if self._venv is not None else ""
        return "Venv(" + quote + str(self._venv) + quote + ")"

    def __repr__(self) -> str:
        return self.__str__()
Manage a virtual environment's activation status.
Examples
>>> from meerschaum.plugins import Plugin
>>> with Venv('mrsm') as venv:
... import pandas
>>> with Venv(Plugin('noaa')) as venv:
... import requests
>>> venv = Venv('mrsm')
>>> venv.activate()
True
>>> venv.deactivate()
True
>>>
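The context-manager behavior shown above (activate on `__enter__`, deactivate on `__exit__`, with a snapshot of previously active venvs so `deactivate()` is safe before `activate()`) can be mimicked with a small stand-in class. `ToyVenv` below is a simplified sketch of that bookkeeping, not the real implementation:

```python
import copy

class ToyVenv:
    """Minimal stand-in for Venv's activate/deactivate bookkeeping."""
    active = set()  # shared registry of currently active venv names

    def __init__(self, name: str = 'mrsm'):
        self.name = name
        ### Snapshot so deactivate() can be called before activate().
        self.previously_active = copy.deepcopy(ToyVenv.active)

    def activate(self) -> bool:
        self.previously_active = copy.deepcopy(ToyVenv.active)
        ToyVenv.active.add(self.name)
        return True

    def deactivate(self) -> bool:
        ### Only deactivate venvs this instance activated itself.
        if self.name not in self.previously_active:
            ToyVenv.active.discard(self.name)
        return True

    def __enter__(self):
        self.activate()
        return self

    def __exit__(self, exc_type, exc_value, exc_traceback):
        self.deactivate()

with ToyVenv('mrsm') as venv:
    assert 'mrsm' in ToyVenv.active
assert 'mrsm' not in ToyVenv.active
```

Leaving the `with` block always restores the previous activation state, even if the body raises.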
Activate this virtual environment.
If a meerschaum.plugins.Plugin was provided, its dependent virtual environments
will also be activated.
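Internally, `activate()` retries once after (re)initializing the venv when activation raises an `OSError`. That pattern generalizes to a small helper; `activate_with_retry` and the callbacks below are hypothetical stand-ins sketching the idea, not Meerschaum functions:

```python
def activate_with_retry(activate, init, venv: str) -> bool:
    """Try to activate; on OSError, (re)initialize the venv and retry once."""
    try:
        return activate(venv)
    except OSError as e:
        if not init(venv, force=True):
            raise e
        return activate(venv)

calls = []

def flaky_activate(venv):
    ### Fails the first time, as if the venv directory were missing.
    calls.append('activate')
    if calls.count('activate') == 1:
        raise OSError("missing venv")
    return True

def init(venv, force=False):
    calls.append('init')
    return True

assert activate_with_retry(flaky_activate, init, 'mrsm') is True
assert calls == ['activate', 'init', 'activate']
```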
Deactivate this virtual environment.
If a meerschaum.plugins.Plugin was provided, its dependent virtual environments
will also be deactivated.
Return the target site-packages path for this virtual environment.
A meerschaum.utils.venv.Venv may have one virtual environment per minor Python version
(e.g. Python 3.10 and Python 3.7).
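Because a Venv may hold one environment per minor Python version, its site-packages path embeds the running interpreter's version. A sketch of how such a path can be derived, assuming the standard POSIX venv layout (the venvs root here is hypothetical):

```python
import pathlib
import sys

def venv_site_packages(root: pathlib.Path, venv: str) -> pathlib.Path:
    ### One site-packages per minor Python version (e.g. 3.10 vs 3.7),
    ### following the standard venv layout on POSIX systems.
    minor = f"python{sys.version_info.major}.{sys.version_info.minor}"
    return root / venv / 'lib' / minor / 'site-packages'

path = venv_site_packages(pathlib.Path('/tmp/venvs'), 'mrsm')
print(path)  # e.g. /tmp/venvs/mrsm/lib/python3.11/site-packages
```

Two interpreters of different minor versions therefore resolve to different site-packages directories under the same venv root.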
Return the top-level path for this virtual environment.
70class Job: 71 """ 72 Manage a `meerschaum.utils.daemon.Daemon`, locally or remotely via the API. 73 """ 74 75 def __init__( 76 self, 77 name: str, 78 sysargs: Union[List[str], str, None] = None, 79 env: Optional[Dict[str, str]] = None, 80 executor_keys: Optional[str] = None, 81 delete_after_completion: bool = False, 82 refresh_seconds: Union[int, float, None] = None, 83 _properties: Optional[Dict[str, Any]] = None, 84 _rotating_log=None, 85 _stdin_file=None, 86 _status_hook: Optional[Callable[[], str]] = None, 87 _result_hook: Optional[Callable[[], SuccessTuple]] = None, 88 _externally_managed: bool = False, 89 ): 90 """ 91 Create a new job to manage a `meerschaum.utils.daemon.Daemon`. 92 93 Parameters 94 ---------- 95 name: str 96 The name of the job to be created. 97 This will also be used as the Daemon ID. 98 99 sysargs: Union[List[str], str, None], default None 100 The sysargs of the command to be executed, e.g. 'start api'. 101 102 env: Optional[Dict[str, str]], default None 103 If provided, set these environment variables in the job's process. 104 105 executor_keys: Optional[str], default None 106 If provided, execute the job remotely on an API instance, e.g. 'api:main'. 107 108 delete_after_completion: bool, default False 109 If `True`, delete this job when it has finished executing. 110 111 refresh_seconds: Union[int, float, None], default None 112 The number of seconds to sleep between refreshes. 113 Defaults to the configured value `system.cli.refresh_seconds`. 114 115 _properties: Optional[Dict[str, Any]], default None 116 If provided, use this to patch the daemon's properties. 
117 """ 118 from meerschaum.utils.daemon import Daemon 119 for char in BANNED_CHARS: 120 if char in name: 121 raise ValueError(f"Invalid name: ({char}) is not allowed.") 122 123 if isinstance(sysargs, str): 124 sysargs = shlex.split(sysargs) 125 126 and_key = STATIC_CONFIG['system']['arguments']['and_key'] 127 escaped_and_key = STATIC_CONFIG['system']['arguments']['escaped_and_key'] 128 if sysargs: 129 sysargs = [ 130 (arg if arg != escaped_and_key else and_key) 131 for arg in sysargs 132 ] 133 134 ### NOTE: 'local' and 'systemd' executors are being coalesced. 135 if executor_keys is None: 136 from meerschaum.jobs import get_executor_keys_from_context 137 executor_keys = get_executor_keys_from_context() 138 139 self.executor_keys = executor_keys 140 self.name = name 141 self.refresh_seconds = ( 142 refresh_seconds 143 if refresh_seconds is not None 144 else mrsm.get_config('system', 'cli', 'refresh_seconds') 145 ) 146 try: 147 self._daemon = ( 148 Daemon(daemon_id=name) 149 if executor_keys == 'local' 150 else None 151 ) 152 except Exception: 153 self._daemon = None 154 155 ### Handle any injected dependencies. 
156 if _rotating_log is not None: 157 self._rotating_log = _rotating_log 158 if self._daemon is not None: 159 self._daemon._rotating_log = _rotating_log 160 161 if _stdin_file is not None: 162 self._stdin_file = _stdin_file 163 if self._daemon is not None: 164 self._daemon._stdin_file = _stdin_file 165 self._daemon._blocking_stdin_file_path = _stdin_file.blocking_file_path 166 167 if _status_hook is not None: 168 self._status_hook = _status_hook 169 170 if _result_hook is not None: 171 self._result_hook = _result_hook 172 173 self._externally_managed = _externally_managed 174 self._properties_patch = _properties or {} 175 if _externally_managed: 176 self._properties_patch.update({'externally_managed': _externally_managed}) 177 178 if env: 179 self._properties_patch.update({'env': env}) 180 181 if delete_after_completion: 182 self._properties_patch.update({'delete_after_completion': delete_after_completion}) 183 184 daemon_sysargs = ( 185 self._daemon.properties.get('target', {}).get('args', [None])[0] 186 if self._daemon is not None 187 else None 188 ) 189 190 if daemon_sysargs and sysargs and daemon_sysargs != sysargs: 191 warn("Given sysargs differ from existing sysargs.") 192 193 self._sysargs = [ 194 arg 195 for arg in (daemon_sysargs or sysargs or []) 196 if arg not in ('-d', '--daemon') 197 ] 198 for restart_flag in RESTART_FLAGS: 199 if restart_flag in self._sysargs: 200 self._properties_patch.update({'restart': True}) 201 break 202 203 @staticmethod 204 def from_pid(pid: int, executor_keys: Optional[str] = None) -> Job: 205 """ 206 Build a `Job` from the PID of a running Meerschaum process. 207 208 Parameters 209 ---------- 210 pid: int 211 The PID of the process. 212 213 executor_keys: Optional[str], default None 214 The executor keys to assign to the job. 
215 """ 216 from meerschaum.config.paths import DAEMON_RESOURCES_PATH 217 218 psutil = mrsm.attempt_import('psutil') 219 try: 220 process = psutil.Process(pid) 221 except psutil.NoSuchProcess as e: 222 warn(f"Process with PID {pid} does not exist.", stack=False) 223 raise e 224 225 command_args = process.cmdline() 226 is_daemon = command_args[1] == '-c' 227 228 if is_daemon: 229 daemon_id = command_args[-1].split('daemon_id=')[-1].split(')')[0].replace("'", '') 230 root_dir = process.environ().get(STATIC_CONFIG['environment']['root'], None) 231 if root_dir is None: 232 from meerschaum.config.paths import ROOT_DIR_PATH 233 root_dir = ROOT_DIR_PATH 234 else: 235 root_dir = pathlib.Path(root_dir) 236 jobs_dir = root_dir / DAEMON_RESOURCES_PATH.name 237 daemon_dir = jobs_dir / daemon_id 238 pid_file = daemon_dir / 'process.pid' 239 240 if pid_file.exists(): 241 with open(pid_file, 'r', encoding='utf-8') as f: 242 daemon_pid = int(f.read()) 243 244 if pid != daemon_pid: 245 raise EnvironmentError(f"Differing PIDs: {pid=}, {daemon_pid=}") 246 else: 247 raise EnvironmentError(f"Is job '{daemon_id}' running?") 248 249 return Job(daemon_id, executor_keys=executor_keys) 250 251 from meerschaum._internal.arguments._parse_arguments import parse_arguments 252 from meerschaum.utils.daemon import get_new_daemon_name 253 254 mrsm_ix = 0 255 for i, arg in enumerate(command_args): 256 if 'mrsm' in arg or 'meerschaum' in arg.lower(): 257 mrsm_ix = i 258 break 259 260 sysargs = command_args[mrsm_ix+1:] 261 kwargs = parse_arguments(sysargs) 262 name = kwargs.get('name', get_new_daemon_name()) 263 return Job(name, sysargs, executor_keys=executor_keys) 264 265 def start(self, debug: bool = False) -> SuccessTuple: 266 """ 267 Start the job's daemon. 
268 """ 269 if self.executor is not None: 270 if not self.exists(debug=debug): 271 return self.executor.create_job( 272 self.name, 273 self.sysargs, 274 properties=self.daemon.properties, 275 debug=debug, 276 ) 277 return self.executor.start_job(self.name, debug=debug) 278 279 if self.is_running(): 280 return True, f"{self} is already running." 281 282 success, msg = self.daemon.run( 283 keep_daemon_output=(not self.delete_after_completion), 284 allow_dirty_run=True, 285 ) 286 if not success: 287 return success, msg 288 289 return success, f"Started {self}." 290 291 def stop( 292 self, 293 timeout_seconds: Union[int, float, None] = None, 294 debug: bool = False, 295 ) -> SuccessTuple: 296 """ 297 Stop the job's daemon. 298 """ 299 if self.executor is not None: 300 return self.executor.stop_job(self.name, debug=debug) 301 302 if self.daemon.status == 'stopped': 303 if not self.restart: 304 return True, f"{self} is not running." 305 elif self.stop_time is not None: 306 return True, f"{self} will not restart until manually started." 307 308 quit_success, quit_msg = self.daemon.quit(timeout=timeout_seconds) 309 if quit_success: 310 return quit_success, f"Stopped {self}." 311 312 warn( 313 f"Failed to gracefully quit {self}.", 314 stack=False, 315 ) 316 kill_success, kill_msg = self.daemon.kill(timeout=timeout_seconds) 317 if not kill_success: 318 return kill_success, kill_msg 319 320 return kill_success, f"Killed {self}." 321 322 def pause( 323 self, 324 timeout_seconds: Union[int, float, None] = None, 325 debug: bool = False, 326 ) -> SuccessTuple: 327 """ 328 Pause the job's daemon. 329 """ 330 if self.executor is not None: 331 return self.executor.pause_job(self.name, debug=debug) 332 333 pause_success, pause_msg = self.daemon.pause(timeout=timeout_seconds) 334 if not pause_success: 335 return pause_success, pause_msg 336 337 return pause_success, f"Paused {self}." 
338 339 def delete(self, debug: bool = False) -> SuccessTuple: 340 """ 341 Delete the job and its daemon. 342 """ 343 if self.executor is not None: 344 return self.executor.delete_job(self.name, debug=debug) 345 346 if self.is_running(): 347 stop_success, stop_msg = self.stop() 348 if not stop_success: 349 return stop_success, stop_msg 350 351 cleanup_success, cleanup_msg = self.daemon.cleanup() 352 if not cleanup_success: 353 return cleanup_success, cleanup_msg 354 355 _ = self.daemon._properties.pop('result', None) 356 return cleanup_success, f"Deleted {self}." 357 358 def is_running(self) -> bool: 359 """ 360 Determine whether the job's daemon is running. 361 """ 362 return self.status == 'running' 363 364 def exists(self, debug: bool = False) -> bool: 365 """ 366 Determine whether the job exists. 367 """ 368 if self.executor is not None: 369 return self.executor.get_job_exists(self.name, debug=debug) 370 371 return self.daemon.path.exists() 372 373 def get_logs(self) -> Union[str, None]: 374 """ 375 Return the output text of the job's daemon. 376 """ 377 if self.executor is not None: 378 return self.executor.get_logs(self.name) 379 380 return self.daemon.log_text 381 382 def monitor_logs( 383 self, 384 callback_function: Callable[[str], None] = _default_stdout_callback, 385 input_callback_function: Optional[Callable[[], str]] = None, 386 stop_callback_function: Optional[Callable[[SuccessTuple], None]] = None, 387 stop_event: Optional[asyncio.Event] = None, 388 stop_on_exit: bool = False, 389 strip_timestamps: bool = False, 390 accept_input: bool = True, 391 debug: bool = False, 392 _logs_path: Optional[pathlib.Path] = None, 393 _log=None, 394 _stdin_file=None, 395 _wait_if_stopped: bool = True, 396 ): 397 """ 398 Monitor the job's log files and execute a callback on new lines. 399 400 Parameters 401 ---------- 402 callback_function: Callable[[str], None], default partial(print, end='') 403 The callback to execute as new data comes in. 
404 Defaults to printing the output directly to `stdout`. 405 406 input_callback_function: Optional[Callable[[], str]], default None 407 If provided, execute this callback when the daemon is blocking on stdin. 408 Defaults to `sys.stdin.readline()`. 409 410 stop_callback_function: Optional[Callable[[SuccessTuple]], str], default None 411 If provided, execute this callback when the daemon stops. 412 The job's SuccessTuple will be passed to the callback. 413 414 stop_event: Optional[asyncio.Event], default None 415 If provided, stop monitoring when this event is set. 416 You may instead raise `meerschaum.jobs.StopMonitoringLogs` 417 from within `callback_function` to stop monitoring. 418 419 stop_on_exit: bool, default False 420 If `True`, stop monitoring when the job stops. 421 422 strip_timestamps: bool, default False 423 If `True`, remove leading timestamps from lines. 424 425 accept_input: bool, default True 426 If `True`, accept input when the daemon blocks on stdin. 427 """ 428 if self.executor is not None: 429 self.executor.monitor_logs( 430 self.name, 431 callback_function, 432 input_callback_function=input_callback_function, 433 stop_callback_function=stop_callback_function, 434 stop_on_exit=stop_on_exit, 435 accept_input=accept_input, 436 strip_timestamps=strip_timestamps, 437 debug=debug, 438 ) 439 return 440 441 monitor_logs_coroutine = self.monitor_logs_async( 442 callback_function=callback_function, 443 input_callback_function=input_callback_function, 444 stop_callback_function=stop_callback_function, 445 stop_event=stop_event, 446 stop_on_exit=stop_on_exit, 447 strip_timestamps=strip_timestamps, 448 accept_input=accept_input, 449 debug=debug, 450 _logs_path=_logs_path, 451 _log=_log, 452 _stdin_file=_stdin_file, 453 _wait_if_stopped=_wait_if_stopped, 454 ) 455 return asyncio.run(monitor_logs_coroutine) 456 457 async def monitor_logs_async( 458 self, 459 callback_function: Callable[[str], None] = _default_stdout_callback, 460 input_callback_function: 
Optional[Callable[[], str]] = None, 461 stop_callback_function: Optional[Callable[[SuccessTuple], None]] = None, 462 stop_event: Optional[asyncio.Event] = None, 463 stop_on_exit: bool = False, 464 strip_timestamps: bool = False, 465 accept_input: bool = True, 466 debug: bool = False, 467 _logs_path: Optional[pathlib.Path] = None, 468 _log=None, 469 _stdin_file=None, 470 _wait_if_stopped: bool = True, 471 ): 472 """ 473 Monitor the job's log files and await a callback on new lines. 474 475 Parameters 476 ---------- 477 callback_function: Callable[[str], None], default _default_stdout_callback 478 The callback to execute as new data comes in. 479 Defaults to printing the output directly to `stdout`. 480 481 input_callback_function: Optional[Callable[[], str]], default None 482 If provided, execute this callback when the daemon is blocking on stdin. 483 Defaults to `sys.stdin.readline()`. 484 485 stop_callback_function: Optional[Callable[[SuccessTuple]], str], default None 486 If provided, execute this callback when the daemon stops. 487 The job's SuccessTuple will be passed to the callback. 488 489 stop_event: Optional[asyncio.Event], default None 490 If provided, stop monitoring when this event is set. 491 You may instead raise `meerschaum.jobs.StopMonitoringLogs` 492 from within `callback_function` to stop monitoring. 493 494 stop_on_exit: bool, default False 495 If `True`, stop monitoring when the job stops. 496 497 strip_timestamps: bool, default False 498 If `True`, remove leading timestamps from lines. 499 500 accept_input: bool, default True 501 If `True`, accept input when the daemon blocks on stdin. 
502 """ 503 from meerschaum.utils.prompt import prompt 504 505 def default_input_callback_function(): 506 prompt_kwargs = self.get_prompt_kwargs(debug=debug) 507 if prompt_kwargs: 508 answer = prompt(**prompt_kwargs) 509 return answer + '\n' 510 return sys.stdin.readline() 511 512 if input_callback_function is None: 513 input_callback_function = default_input_callback_function 514 515 if self.executor is not None: 516 await self.executor.monitor_logs_async( 517 self.name, 518 callback_function, 519 input_callback_function=input_callback_function, 520 stop_callback_function=stop_callback_function, 521 stop_on_exit=stop_on_exit, 522 strip_timestamps=strip_timestamps, 523 accept_input=accept_input, 524 debug=debug, 525 ) 526 return 527 528 from meerschaum.utils.formatting._jobs import strip_timestamp_from_line 529 530 events = { 531 'user': stop_event, 532 'stopped': asyncio.Event(), 533 'stop_token': asyncio.Event(), 534 'stop_exception': asyncio.Event(), 535 'stopped_timeout': asyncio.Event(), 536 } 537 combined_event = asyncio.Event() 538 emitted_text = False 539 stdin_file = _stdin_file if _stdin_file is not None else self.daemon.stdin_file 540 541 async def check_job_status(): 542 if not stop_on_exit: 543 return 544 545 nonlocal emitted_text 546 547 sleep_time = 0.1 548 while sleep_time < 0.2: 549 if self.status == 'stopped': 550 if not emitted_text and _wait_if_stopped: 551 await asyncio.sleep(sleep_time) 552 sleep_time = round(sleep_time * 1.1, 3) 553 continue 554 555 if stop_callback_function is not None: 556 try: 557 if asyncio.iscoroutinefunction(stop_callback_function): 558 await stop_callback_function(self.result) 559 else: 560 stop_callback_function(self.result) 561 except asyncio.exceptions.CancelledError: 562 break 563 except Exception: 564 warn(traceback.format_exc()) 565 566 if stop_on_exit: 567 events['stopped'].set() 568 569 break 570 await asyncio.sleep(0.1) 571 572 events['stopped_timeout'].set() 573 574 async def check_blocking_on_input(): 575 
while True: 576 if not emitted_text or not self.is_blocking_on_stdin(): 577 try: 578 await asyncio.sleep(self.refresh_seconds) 579 except asyncio.exceptions.CancelledError: 580 break 581 continue 582 583 if not self.is_running(): 584 break 585 586 await emit_latest_lines() 587 588 try: 589 print('', end='', flush=True) 590 if asyncio.iscoroutinefunction(input_callback_function): 591 data = await input_callback_function() 592 else: 593 loop = asyncio.get_running_loop() 594 data = await loop.run_in_executor(None, input_callback_function) 595 except KeyboardInterrupt: 596 break 597 # if not data.endswith('\n'): 598 # data += '\n' 599 600 stdin_file.write(data) 601 await asyncio.sleep(self.refresh_seconds) 602 603 async def combine_events(): 604 event_tasks = [ 605 asyncio.create_task(event.wait()) 606 for event in events.values() 607 if event is not None 608 ] 609 if not event_tasks: 610 return 611 612 try: 613 done, pending = await asyncio.wait( 614 event_tasks, 615 return_when=asyncio.FIRST_COMPLETED, 616 ) 617 for task in pending: 618 task.cancel() 619 except asyncio.exceptions.CancelledError: 620 pass 621 finally: 622 combined_event.set() 623 624 check_job_status_task = asyncio.create_task(check_job_status()) 625 check_blocking_on_input_task = asyncio.create_task(check_blocking_on_input()) 626 combine_events_task = asyncio.create_task(combine_events()) 627 628 log = _log if _log is not None else self.daemon.rotating_log 629 lines_to_show = ( 630 self.daemon.properties.get( 631 'logs', {} 632 ).get( 633 'lines_to_show', get_config('jobs', 'logs', 'lines_to_show') 634 ) 635 ) 636 637 async def emit_latest_lines(): 638 nonlocal emitted_text 639 nonlocal stop_event 640 lines = log.readlines() 641 for line in lines[(-1 * lines_to_show):]: 642 if stop_event is not None and stop_event.is_set(): 643 return 644 645 line_stripped_extra = strip_timestamp_from_line(line.strip()) 646 line_stripped = strip_timestamp_from_line(line) 647 648 if line_stripped_extra == STOP_TOKEN: 
649 events['stop_token'].set() 650 return 651 652 if line_stripped_extra == CLEAR_TOKEN: 653 clear_screen(debug=debug) 654 continue 655 656 if line_stripped_extra == FLUSH_TOKEN.strip(): 657 line_stripped = '' 658 line = '' 659 660 if strip_timestamps: 661 line = line_stripped 662 663 try: 664 if asyncio.iscoroutinefunction(callback_function): 665 await callback_function(line) 666 else: 667 callback_function(line) 668 emitted_text = True 669 except StopMonitoringLogs: 670 events['stop_exception'].set() 671 return 672 except Exception: 673 warn(f"Error in logs callback:\n{traceback.format_exc()}") 674 675 await emit_latest_lines() 676 677 tasks = ( 678 [check_job_status_task] 679 + ([check_blocking_on_input_task] if accept_input else []) 680 + [combine_events_task] 681 ) 682 try: 683 _ = asyncio.gather(*tasks, return_exceptions=True) 684 except asyncio.exceptions.CancelledError: 685 raise 686 except Exception: 687 warn(f"Failed to run async checks:\n{traceback.format_exc()}") 688 689 watchfiles = mrsm.attempt_import('watchfiles', lazy=False) 690 dir_path_to_monitor = ( 691 _logs_path 692 or (log.file_path.parent if log else None) 693 or LOGS_RESOURCES_PATH 694 ) 695 async for changes in watchfiles.awatch( 696 dir_path_to_monitor, 697 stop_event=combined_event, 698 ): 699 for change in changes: 700 file_path_str = change[1] 701 file_path = pathlib.Path(file_path_str) 702 latest_subfile_path = log.get_latest_subfile_path() 703 if latest_subfile_path != file_path: 704 continue 705 706 await emit_latest_lines() 707 708 await emit_latest_lines() 709 710 def is_blocking_on_stdin(self, debug: bool = False) -> bool: 711 """ 712 Return whether a job's daemon is blocking on stdin. 
713 """ 714 if self.executor is not None: 715 return self.executor.get_job_is_blocking_on_stdin(self.name, debug=debug) 716 717 return self.is_running() and self.daemon.blocking_stdin_file_path.exists() 718 719 def get_prompt_kwargs(self, debug: bool = False) -> Dict[str, Any]: 720 """ 721 Return the kwargs to the blocking `prompt()`, if available. 722 """ 723 if self.executor is not None: 724 return self.executor.get_job_prompt_kwargs(self.name, debug=debug) 725 726 if not self.daemon.prompt_kwargs_file_path.exists(): 727 return {} 728 729 try: 730 with open(self.daemon.prompt_kwargs_file_path, 'r', encoding='utf-8') as f: 731 prompt_kwargs = json.load(f) 732 733 return prompt_kwargs 734 735 except Exception: 736 import traceback 737 traceback.print_exc() 738 return {} 739 740 def write_stdin(self, data): 741 """ 742 Write to a job's daemon's `stdin`. 743 """ 744 self.daemon.stdin_file.write(data) 745 746 @property 747 def executor(self) -> Union[Executor, None]: 748 """ 749 If the job is remote, return the connector to the remote API instance. 750 """ 751 return ( 752 mrsm.get_connector(self.executor_keys) 753 if self.executor_keys != 'local' 754 else None 755 ) 756 757 @property 758 def status(self) -> str: 759 """ 760 Return the running status of the job's daemon. 761 """ 762 if '_status_hook' in self.__dict__: 763 return self._status_hook() 764 765 if self.executor is not None: 766 return self.executor.get_job_status(self.name) 767 768 return self.daemon.status 769 770 @property 771 def pid(self) -> Union[int, None]: 772 """ 773 Return the PID of the job's dameon. 774 """ 775 if self.executor is not None: 776 return self.executor.get_job_metadata(self.name).get('daemon', {}).get('pid', None) 777 778 return self.daemon.pid 779 780 @property 781 def restart(self) -> bool: 782 """ 783 Return whether to restart a stopped job. 
784 """ 785 if self.executor is not None: 786 return self.executor.get_job_metadata(self.name).get('restart', False) 787 788 return self.daemon.properties.get('restart', False) 789 790 @property 791 def result(self) -> SuccessTuple: 792 """ 793 Return the `SuccessTuple` when the job has terminated. 794 """ 795 if self.is_running(): 796 return True, f"{self} is running." 797 798 if '_result_hook' in self.__dict__: 799 return self._result_hook() 800 801 if self.executor is not None: 802 return ( 803 self.executor.get_job_metadata(self.name) 804 .get('result', (False, "No result available.")) 805 ) 806 807 _result = self.daemon.properties.get('result', None) 808 if _result is None: 809 from meerschaum.utils.daemon.Daemon import _results 810 return _results.get(self.daemon.daemon_id, (False, "No result available.")) 811 812 return tuple(_result) 813 814 @property 815 def sysargs(self) -> List[str]: 816 """ 817 Return the sysargs to use for the Daemon. 818 """ 819 if self._sysargs: 820 return self._sysargs 821 822 if self.executor is not None: 823 return self.executor.get_job_metadata(self.name).get('sysargs', []) 824 825 target_args = self.daemon.target_args 826 if target_args is None: 827 return [] 828 self._sysargs = target_args[0] if len(target_args) > 0 else [] 829 return self._sysargs 830 831 def get_daemon_properties(self) -> Dict[str, Any]: 832 """ 833 Return the `properties` dictionary for the job's daemon. 834 """ 835 remote_properties = ( 836 {} 837 if self.executor is None 838 else self.executor.get_job_properties(self.name) 839 ) 840 return { 841 **remote_properties, 842 **self._properties_patch 843 } 844 845 @property 846 def daemon(self) -> 'Daemon': 847 """ 848 Return the daemon which this job manages. 
849 """ 850 from meerschaum.utils.daemon import Daemon 851 if self._daemon is not None and self.executor is None and self._sysargs: 852 return self._daemon 853 854 self._daemon = Daemon( 855 target=entry, 856 target_args=[self._sysargs], 857 target_kw={}, 858 daemon_id=self.name, 859 label=shlex.join(self._sysargs), 860 properties=self.get_daemon_properties(), 861 ) 862 if '_rotating_log' in self.__dict__: 863 self._daemon._rotating_log = self._rotating_log 864 865 if '_stdin_file' in self.__dict__: 866 self._daemon._stdin_file = self._stdin_file 867 self._daemon._blocking_stdin_file_path = self._stdin_file.blocking_file_path 868 869 return self._daemon 870 871 @property 872 def began(self) -> Union[datetime, None]: 873 """ 874 The datetime when the job began running. 875 """ 876 if self.executor is not None: 877 began_str = self.executor.get_job_began(self.name) 878 if began_str is None: 879 return None 880 return ( 881 datetime.fromisoformat(began_str) 882 .astimezone(timezone.utc) 883 .replace(tzinfo=None) 884 ) 885 886 began_str = self.daemon.properties.get('process', {}).get('began', None) 887 if began_str is None: 888 return None 889 890 return datetime.fromisoformat(began_str) 891 892 @property 893 def ended(self) -> Union[datetime, None]: 894 """ 895 The datetime when the job stopped running. 896 """ 897 if self.executor is not None: 898 ended_str = self.executor.get_job_ended(self.name) 899 if ended_str is None: 900 return None 901 return ( 902 datetime.fromisoformat(ended_str) 903 .astimezone(timezone.utc) 904 .replace(tzinfo=None) 905 ) 906 907 ended_str = self.daemon.properties.get('process', {}).get('ended', None) 908 if ended_str is None: 909 return None 910 911 return datetime.fromisoformat(ended_str) 912 913 @property 914 def paused(self) -> Union[datetime, None]: 915 """ 916 The datetime when the job was suspended while running. 
917 """ 918 if self.executor is not None: 919 paused_str = self.executor.get_job_paused(self.name) 920 if paused_str is None: 921 return None 922 return ( 923 datetime.fromisoformat(paused_str) 924 .astimezone(timezone.utc) 925 .replace(tzinfo=None) 926 ) 927 928 paused_str = self.daemon.properties.get('process', {}).get('paused', None) 929 if paused_str is None: 930 return None 931 932 return datetime.fromisoformat(paused_str) 933 934 @property 935 def stop_time(self) -> Union[datetime, None]: 936 """ 937 Return the timestamp when the job was manually stopped. 938 """ 939 if self.executor is not None: 940 return self.executor.get_job_stop_time(self.name) 941 942 if not self.daemon.stop_path.exists(): 943 return None 944 945 stop_data = self.daemon._read_stop_file() 946 if not stop_data: 947 return None 948 949 stop_time_str = stop_data.get('stop_time', None) 950 if not stop_time_str: 951 warn(f"Could not read stop time for {self}.") 952 return None 953 954 return datetime.fromisoformat(stop_time_str) 955 956 @property 957 def hidden(self) -> bool: 958 """ 959 Return a bool indicating whether this job should be displayed. 960 """ 961 return ( 962 self.name.startswith('_') 963 or self.name.startswith('.') 964 or self._is_externally_managed 965 ) 966 967 def check_restart(self) -> SuccessTuple: 968 """ 969 If `restart` is `True` and the daemon is not running, 970 restart the job. 971 Do not restart if the job was manually stopped. 972 """ 973 if self.is_running(): 974 return True, f"{self} is running." 975 976 if not self.restart: 977 return True, f"{self} does not need to be restarted." 978 979 if self.stop_time is not None: 980 return True, f"{self} was manually stopped." 981 982 return self.start() 983 984 @property 985 def label(self) -> str: 986 """ 987 Return the job's Daemon label (joined sysargs). 
988 """ 989 from meerschaum._internal.arguments import compress_pipeline_sysargs 990 sysargs = compress_pipeline_sysargs(self.sysargs) 991 return shlex.join(sysargs).replace(' + ', '\n+ ').replace(' : ', '\n: ').lstrip().rstrip() 992 993 @property 994 def _externally_managed_file(self) -> pathlib.Path: 995 """ 996 Return the path to the externally managed file. 997 """ 998 return self.daemon.path / '.externally-managed' 999 1000 def _set_externally_managed(self): 1001 """ 1002 Set this job as externally managed. 1003 """ 1004 self._externally_managed = True 1005 try: 1006 self._externally_managed_file.parent.mkdir(exist_ok=True, parents=True) 1007 self._externally_managed_file.touch() 1008 except Exception as e: 1009 warn(e) 1010 1011 @property 1012 def _is_externally_managed(self) -> bool: 1013 """ 1014 Return whether this job is externally managed. 1015 """ 1016 return self.executor_keys in (None, 'local') and ( 1017 self._externally_managed or self._externally_managed_file.exists() 1018 ) 1019 1020 @property 1021 def env(self) -> Dict[str, str]: 1022 """ 1023 Return the environment variables to set for the job's process. 1024 """ 1025 if '_env' in self.__dict__: 1026 return self.__dict__['_env'] 1027 1028 _env = self.daemon.properties.get('env', {}) 1029 default_env = { 1030 'PYTHONUNBUFFERED': '1', 1031 'LINES': str(get_config('jobs', 'terminal', 'lines')), 1032 'COLUMNS': str(get_config('jobs', 'terminal', 'columns')), 1033 STATIC_CONFIG['environment']['noninteractive']: 'true', 1034 } 1035 self._env = {**default_env, **_env} 1036 return self._env 1037 1038 @property 1039 def delete_after_completion(self) -> bool: 1040 """ 1041 Return whether this job is configured to delete itself after completion. 
1042 """ 1043 if '_delete_after_completion' in self.__dict__: 1044 return self.__dict__.get('_delete_after_completion', False) 1045 1046 self._delete_after_completion = self.daemon.properties.get('delete_after_completion', False) 1047 return self._delete_after_completion 1048 1049 def __str__(self) -> str: 1050 sysargs = self.sysargs 1051 sysargs_str = shlex.join(sysargs) if sysargs else '' 1052 job_str = f'Job("{self.name}"' 1053 if sysargs_str: 1054 job_str += f', "{sysargs_str}"' 1055 1056 job_str += ')' 1057 return job_str 1058 1059 def __repr__(self) -> str: 1060 return str(self) 1061 1062 def __hash__(self) -> int: 1063 return hash(self.name)
Manage a meerschaum.utils.daemon.Daemon, locally or remotely via the API.
75 def __init__( 76 self, 77 name: str, 78 sysargs: Union[List[str], str, None] = None, 79 env: Optional[Dict[str, str]] = None, 80 executor_keys: Optional[str] = None, 81 delete_after_completion: bool = False, 82 refresh_seconds: Union[int, float, None] = None, 83 _properties: Optional[Dict[str, Any]] = None, 84 _rotating_log=None, 85 _stdin_file=None, 86 _status_hook: Optional[Callable[[], str]] = None, 87 _result_hook: Optional[Callable[[], SuccessTuple]] = None, 88 _externally_managed: bool = False, 89 ): 90 """ 91 Create a new job to manage a `meerschaum.utils.daemon.Daemon`. 92 93 Parameters 94 ---------- 95 name: str 96 The name of the job to be created. 97 This will also be used as the Daemon ID. 98 99 sysargs: Union[List[str], str, None], default None 100 The sysargs of the command to be executed, e.g. 'start api'. 101 102 env: Optional[Dict[str, str]], default None 103 If provided, set these environment variables in the job's process. 104 105 executor_keys: Optional[str], default None 106 If provided, execute the job remotely on an API instance, e.g. 'api:main'. 107 108 delete_after_completion: bool, default False 109 If `True`, delete this job when it has finished executing. 110 111 refresh_seconds: Union[int, float, None], default None 112 The number of seconds to sleep between refreshes. 113 Defaults to the configured value `system.cli.refresh_seconds`. 114 115 _properties: Optional[Dict[str, Any]], default None 116 If provided, use this to patch the daemon's properties. 
117 """ 118 from meerschaum.utils.daemon import Daemon 119 for char in BANNED_CHARS: 120 if char in name: 121 raise ValueError(f"Invalid name: ({char}) is not allowed.") 122 123 if isinstance(sysargs, str): 124 sysargs = shlex.split(sysargs) 125 126 and_key = STATIC_CONFIG['system']['arguments']['and_key'] 127 escaped_and_key = STATIC_CONFIG['system']['arguments']['escaped_and_key'] 128 if sysargs: 129 sysargs = [ 130 (arg if arg != escaped_and_key else and_key) 131 for arg in sysargs 132 ] 133 134 ### NOTE: 'local' and 'systemd' executors are being coalesced. 135 if executor_keys is None: 136 from meerschaum.jobs import get_executor_keys_from_context 137 executor_keys = get_executor_keys_from_context() 138 139 self.executor_keys = executor_keys 140 self.name = name 141 self.refresh_seconds = ( 142 refresh_seconds 143 if refresh_seconds is not None 144 else mrsm.get_config('system', 'cli', 'refresh_seconds') 145 ) 146 try: 147 self._daemon = ( 148 Daemon(daemon_id=name) 149 if executor_keys == 'local' 150 else None 151 ) 152 except Exception: 153 self._daemon = None 154 155 ### Handle any injected dependencies. 
156 if _rotating_log is not None: 157 self._rotating_log = _rotating_log 158 if self._daemon is not None: 159 self._daemon._rotating_log = _rotating_log 160 161 if _stdin_file is not None: 162 self._stdin_file = _stdin_file 163 if self._daemon is not None: 164 self._daemon._stdin_file = _stdin_file 165 self._daemon._blocking_stdin_file_path = _stdin_file.blocking_file_path 166 167 if _status_hook is not None: 168 self._status_hook = _status_hook 169 170 if _result_hook is not None: 171 self._result_hook = _result_hook 172 173 self._externally_managed = _externally_managed 174 self._properties_patch = _properties or {} 175 if _externally_managed: 176 self._properties_patch.update({'externally_managed': _externally_managed}) 177 178 if env: 179 self._properties_patch.update({'env': env}) 180 181 if delete_after_completion: 182 self._properties_patch.update({'delete_after_completion': delete_after_completion}) 183 184 daemon_sysargs = ( 185 self._daemon.properties.get('target', {}).get('args', [None])[0] 186 if self._daemon is not None 187 else None 188 ) 189 190 if daemon_sysargs and sysargs and daemon_sysargs != sysargs: 191 warn("Given sysargs differ from existing sysargs.") 192 193 self._sysargs = [ 194 arg 195 for arg in (daemon_sysargs or sysargs or []) 196 if arg not in ('-d', '--daemon') 197 ] 198 for restart_flag in RESTART_FLAGS: 199 if restart_flag in self._sysargs: 200 self._properties_patch.update({'restart': True}) 201 break
Create a new job to manage a meerschaum.utils.daemon.Daemon.
Parameters
- name (str): The name of the job to be created. This will also be used as the Daemon ID.
- sysargs (Union[List[str], str, None], default None): The sysargs of the command to be executed, e.g. 'start api'.
- env (Optional[Dict[str, str]], default None): If provided, set these environment variables in the job's process.
- executor_keys (Optional[str], default None): If provided, execute the job remotely on an API instance, e.g. 'api:main'.
- delete_after_completion (bool, default False): If `True`, delete this job when it has finished executing.
- refresh_seconds (Union[int, float, None], default None): The number of seconds to sleep between refreshes. Defaults to the configured value `system.cli.refresh_seconds`.
- _properties (Optional[Dict[str, Any]], default None): If provided, use this to patch the daemon's properties.
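The constructor's sysargs handling can be sketched with the standard library alone (a standalone illustration of the logic above, not the meerschaum implementation itself; the restart flag set here is abbreviated):

```python
import shlex

# Illustrative subset; the real flag set lives in meerschaum.jobs.RESTART_FLAGS.
RESTART_FLAGS = ['--restart', '--loop']

def normalize_sysargs(sysargs):
    """Split a command string, drop daemon flags, and detect restart flags."""
    if isinstance(sysargs, str):
        sysargs = shlex.split(sysargs)
    # '-d' / '--daemon' are stripped because the Job itself daemonizes the command.
    cleaned = [arg for arg in (sysargs or []) if arg not in ('-d', '--daemon')]
    wants_restart = any(flag in cleaned for flag in RESTART_FLAGS)
    return cleaned, wants_restart

print(normalize_sysargs('sync pipes --loop -d'))
# (['sync', 'pipes', '--loop'], True)
```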
203 @staticmethod 204 def from_pid(pid: int, executor_keys: Optional[str] = None) -> Job: 205 """ 206 Build a `Job` from the PID of a running Meerschaum process. 207 208 Parameters 209 ---------- 210 pid: int 211 The PID of the process. 212 213 executor_keys: Optional[str], default None 214 The executor keys to assign to the job. 215 """ 216 from meerschaum.config.paths import DAEMON_RESOURCES_PATH 217 218 psutil = mrsm.attempt_import('psutil') 219 try: 220 process = psutil.Process(pid) 221 except psutil.NoSuchProcess as e: 222 warn(f"Process with PID {pid} does not exist.", stack=False) 223 raise e 224 225 command_args = process.cmdline() 226 is_daemon = command_args[1] == '-c' 227 228 if is_daemon: 229 daemon_id = command_args[-1].split('daemon_id=')[-1].split(')')[0].replace("'", '') 230 root_dir = process.environ().get(STATIC_CONFIG['environment']['root'], None) 231 if root_dir is None: 232 from meerschaum.config.paths import ROOT_DIR_PATH 233 root_dir = ROOT_DIR_PATH 234 else: 235 root_dir = pathlib.Path(root_dir) 236 jobs_dir = root_dir / DAEMON_RESOURCES_PATH.name 237 daemon_dir = jobs_dir / daemon_id 238 pid_file = daemon_dir / 'process.pid' 239 240 if pid_file.exists(): 241 with open(pid_file, 'r', encoding='utf-8') as f: 242 daemon_pid = int(f.read()) 243 244 if pid != daemon_pid: 245 raise EnvironmentError(f"Differing PIDs: {pid=}, {daemon_pid=}") 246 else: 247 raise EnvironmentError(f"Is job '{daemon_id}' running?") 248 249 return Job(daemon_id, executor_keys=executor_keys) 250 251 from meerschaum._internal.arguments._parse_arguments import parse_arguments 252 from meerschaum.utils.daemon import get_new_daemon_name 253 254 mrsm_ix = 0 255 for i, arg in enumerate(command_args): 256 if 'mrsm' in arg or 'meerschaum' in arg.lower(): 257 mrsm_ix = i 258 break 259 260 sysargs = command_args[mrsm_ix+1:] 261 kwargs = parse_arguments(sysargs) 262 name = kwargs.get('name', get_new_daemon_name()) 263 return Job(name, sysargs, executor_keys=executor_keys)
Build a Job from the PID of a running Meerschaum process.
Parameters
- pid (int): The PID of the process.
- executor_keys (Optional[str], default None): The executor keys to assign to the job.
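The daemon ID recovery above parses the child process's `-c` command string; a minimal standalone sketch of that string parsing (the command string below is a fabricated example of the format, not a captured process line):

```python
# A daemonized Meerschaum process is launched with a '-c' command whose source
# references its daemon_id; from_pid() recovers the ID by string parsing.
command_arg = "from meerschaum.utils.daemon import Daemon; Daemon(daemon_id='syncing-engine').run()"

daemon_id = command_arg.split('daemon_id=')[-1].split(')')[0].replace("'", '')
print(daemon_id)  # syncing-engine
```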
265 def start(self, debug: bool = False) -> SuccessTuple: 266 """ 267 Start the job's daemon. 268 """ 269 if self.executor is not None: 270 if not self.exists(debug=debug): 271 return self.executor.create_job( 272 self.name, 273 self.sysargs, 274 properties=self.daemon.properties, 275 debug=debug, 276 ) 277 return self.executor.start_job(self.name, debug=debug) 278 279 if self.is_running(): 280 return True, f"{self} is already running." 281 282 success, msg = self.daemon.run( 283 keep_daemon_output=(not self.delete_after_completion), 284 allow_dirty_run=True, 285 ) 286 if not success: 287 return success, msg 288 289 return success, f"Started {self}."
Start the job's daemon.
291 def stop( 292 self, 293 timeout_seconds: Union[int, float, None] = None, 294 debug: bool = False, 295 ) -> SuccessTuple: 296 """ 297 Stop the job's daemon. 298 """ 299 if self.executor is not None: 300 return self.executor.stop_job(self.name, debug=debug) 301 302 if self.daemon.status == 'stopped': 303 if not self.restart: 304 return True, f"{self} is not running." 305 elif self.stop_time is not None: 306 return True, f"{self} will not restart until manually started." 307 308 quit_success, quit_msg = self.daemon.quit(timeout=timeout_seconds) 309 if quit_success: 310 return quit_success, f"Stopped {self}." 311 312 warn( 313 f"Failed to gracefully quit {self}.", 314 stack=False, 315 ) 316 kill_success, kill_msg = self.daemon.kill(timeout=timeout_seconds) 317 if not kill_success: 318 return kill_success, kill_msg 319 320 return kill_success, f"Killed {self}."
Stop the job's daemon.
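The stop sequence tries a graceful quit before escalating to a kill; the fallback pattern can be sketched with a stub daemon (the stub class and messages are illustrative, not the meerschaum internals):

```python
class StubDaemon:
    """Stands in for meerschaum.utils.daemon.Daemon to illustrate stop()."""
    def __init__(self, quits_gracefully):
        self.quits_gracefully = quits_gracefully

    def quit(self, timeout=None):
        return (True, "Quit.") if self.quits_gracefully else (False, "Timed out.")

    def kill(self, timeout=None):
        return True, "Killed."

def stop(daemon, timeout_seconds=None):
    """Mirror Job.stop(): quit gracefully, then kill if the quit fails."""
    quit_success, quit_msg = daemon.quit(timeout=timeout_seconds)
    if quit_success:
        return quit_success, "Stopped job."
    # Graceful quit failed (e.g. the signal was ignored); escalate to a kill.
    kill_success, kill_msg = daemon.kill(timeout=timeout_seconds)
    if not kill_success:
        return kill_success, kill_msg
    return kill_success, "Killed job."

print(stop(StubDaemon(quits_gracefully=False)))  # (True, 'Killed job.')
```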
322 def pause( 323 self, 324 timeout_seconds: Union[int, float, None] = None, 325 debug: bool = False, 326 ) -> SuccessTuple: 327 """ 328 Pause the job's daemon. 329 """ 330 if self.executor is not None: 331 return self.executor.pause_job(self.name, debug=debug) 332 333 pause_success, pause_msg = self.daemon.pause(timeout=timeout_seconds) 334 if not pause_success: 335 return pause_success, pause_msg 336 337 return pause_success, f"Paused {self}."
Pause the job's daemon.
339 def delete(self, debug: bool = False) -> SuccessTuple: 340 """ 341 Delete the job and its daemon. 342 """ 343 if self.executor is not None: 344 return self.executor.delete_job(self.name, debug=debug) 345 346 if self.is_running(): 347 stop_success, stop_msg = self.stop() 348 if not stop_success: 349 return stop_success, stop_msg 350 351 cleanup_success, cleanup_msg = self.daemon.cleanup() 352 if not cleanup_success: 353 return cleanup_success, cleanup_msg 354 355 _ = self.daemon._properties.pop('result', None) 356 return cleanup_success, f"Deleted {self}."
Delete the job and its daemon.
358 def is_running(self) -> bool: 359 """ 360 Determine whether the job's daemon is running. 361 """ 362 return self.status == 'running'
Determine whether the job's daemon is running.
364 def exists(self, debug: bool = False) -> bool: 365 """ 366 Determine whether the job exists. 367 """ 368 if self.executor is not None: 369 return self.executor.get_job_exists(self.name, debug=debug) 370 371 return self.daemon.path.exists()
Determine whether the job exists.
373 def get_logs(self) -> Union[str, None]: 374 """ 375 Return the output text of the job's daemon. 376 """ 377 if self.executor is not None: 378 return self.executor.get_logs(self.name) 379 380 return self.daemon.log_text
Return the output text of the job's daemon.
382 def monitor_logs( 383 self, 384 callback_function: Callable[[str], None] = _default_stdout_callback, 385 input_callback_function: Optional[Callable[[], str]] = None, 386 stop_callback_function: Optional[Callable[[SuccessTuple], None]] = None, 387 stop_event: Optional[asyncio.Event] = None, 388 stop_on_exit: bool = False, 389 strip_timestamps: bool = False, 390 accept_input: bool = True, 391 debug: bool = False, 392 _logs_path: Optional[pathlib.Path] = None, 393 _log=None, 394 _stdin_file=None, 395 _wait_if_stopped: bool = True, 396 ): 397 """ 398 Monitor the job's log files and execute a callback on new lines. 399 400 Parameters 401 ---------- 402 callback_function: Callable[[str], None], default partial(print, end='') 403 The callback to execute as new data comes in. 404 Defaults to printing the output directly to `stdout`. 405 406 input_callback_function: Optional[Callable[[], str]], default None 407 If provided, execute this callback when the daemon is blocking on stdin. 408 Defaults to `sys.stdin.readline()`. 409 410 stop_callback_function: Optional[Callable[[SuccessTuple]], str], default None 411 If provided, execute this callback when the daemon stops. 412 The job's SuccessTuple will be passed to the callback. 413 414 stop_event: Optional[asyncio.Event], default None 415 If provided, stop monitoring when this event is set. 416 You may instead raise `meerschaum.jobs.StopMonitoringLogs` 417 from within `callback_function` to stop monitoring. 418 419 stop_on_exit: bool, default False 420 If `True`, stop monitoring when the job stops. 421 422 strip_timestamps: bool, default False 423 If `True`, remove leading timestamps from lines. 424 425 accept_input: bool, default True 426 If `True`, accept input when the daemon blocks on stdin. 
427 """ 428 if self.executor is not None: 429 self.executor.monitor_logs( 430 self.name, 431 callback_function, 432 input_callback_function=input_callback_function, 433 stop_callback_function=stop_callback_function, 434 stop_on_exit=stop_on_exit, 435 accept_input=accept_input, 436 strip_timestamps=strip_timestamps, 437 debug=debug, 438 ) 439 return 440 441 monitor_logs_coroutine = self.monitor_logs_async( 442 callback_function=callback_function, 443 input_callback_function=input_callback_function, 444 stop_callback_function=stop_callback_function, 445 stop_event=stop_event, 446 stop_on_exit=stop_on_exit, 447 strip_timestamps=strip_timestamps, 448 accept_input=accept_input, 449 debug=debug, 450 _logs_path=_logs_path, 451 _log=_log, 452 _stdin_file=_stdin_file, 453 _wait_if_stopped=_wait_if_stopped, 454 ) 455 return asyncio.run(monitor_logs_coroutine)
Monitor the job's log files and execute a callback on new lines.
Parameters
- callback_function (Callable[[str], None], default partial(print, end='')): The callback to execute as new data comes in. Defaults to printing the output directly to `stdout`.
- input_callback_function (Optional[Callable[[], str]], default None): If provided, execute this callback when the daemon is blocking on stdin. Defaults to `sys.stdin.readline()`.
- stop_callback_function (Optional[Callable[[SuccessTuple], None]], default None): If provided, execute this callback when the daemon stops. The job's `SuccessTuple` will be passed to the callback.
- stop_event (Optional[asyncio.Event], default None): If provided, stop monitoring when this event is set. You may instead raise `meerschaum.jobs.StopMonitoringLogs` from within `callback_function` to stop monitoring.
- stop_on_exit (bool, default False): If `True`, stop monitoring when the job stops.
- strip_timestamps (bool, default False): If `True`, remove leading timestamps from lines.
- accept_input (bool, default True): If `True`, accept input when the daemon blocks on stdin.
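A callback that stops monitoring once a condition is met raises `StopMonitoringLogs`; the control flow can be sketched standalone with a stand-in exception (the exception class, log lines, and helper below are local stubs, not the meerschaum implementation):

```python
class StopMonitoringLogs(Exception):
    """Stand-in for meerschaum.jobs.StopMonitoringLogs."""

def monitor(lines, callback_function):
    """Feed lines to the callback until it raises StopMonitoringLogs."""
    for line in lines:
        try:
            callback_function(line)
        except StopMonitoringLogs:
            break

seen = []

def collect_until_done(line):
    seen.append(line)
    if 'finished' in line:
        raise StopMonitoringLogs

monitor(['syncing...\n', 'finished in 3s\n', 'never delivered\n'], collect_until_done)
print(seen)  # ['syncing...\n', 'finished in 3s\n']
```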
457 async def monitor_logs_async( 458 self, 459 callback_function: Callable[[str], None] = _default_stdout_callback, 460 input_callback_function: Optional[Callable[[], str]] = None, 461 stop_callback_function: Optional[Callable[[SuccessTuple], None]] = None, 462 stop_event: Optional[asyncio.Event] = None, 463 stop_on_exit: bool = False, 464 strip_timestamps: bool = False, 465 accept_input: bool = True, 466 debug: bool = False, 467 _logs_path: Optional[pathlib.Path] = None, 468 _log=None, 469 _stdin_file=None, 470 _wait_if_stopped: bool = True, 471 ): 472 """ 473 Monitor the job's log files and await a callback on new lines. 474 475 Parameters 476 ---------- 477 callback_function: Callable[[str], None], default _default_stdout_callback 478 The callback to execute as new data comes in. 479 Defaults to printing the output directly to `stdout`. 480 481 input_callback_function: Optional[Callable[[], str]], default None 482 If provided, execute this callback when the daemon is blocking on stdin. 483 Defaults to `sys.stdin.readline()`. 484 485 stop_callback_function: Optional[Callable[[SuccessTuple]], str], default None 486 If provided, execute this callback when the daemon stops. 487 The job's SuccessTuple will be passed to the callback. 488 489 stop_event: Optional[asyncio.Event], default None 490 If provided, stop monitoring when this event is set. 491 You may instead raise `meerschaum.jobs.StopMonitoringLogs` 492 from within `callback_function` to stop monitoring. 493 494 stop_on_exit: bool, default False 495 If `True`, stop monitoring when the job stops. 496 497 strip_timestamps: bool, default False 498 If `True`, remove leading timestamps from lines. 499 500 accept_input: bool, default True 501 If `True`, accept input when the daemon blocks on stdin. 
502 """ 503 from meerschaum.utils.prompt import prompt 504 505 def default_input_callback_function(): 506 prompt_kwargs = self.get_prompt_kwargs(debug=debug) 507 if prompt_kwargs: 508 answer = prompt(**prompt_kwargs) 509 return answer + '\n' 510 return sys.stdin.readline() 511 512 if input_callback_function is None: 513 input_callback_function = default_input_callback_function 514 515 if self.executor is not None: 516 await self.executor.monitor_logs_async( 517 self.name, 518 callback_function, 519 input_callback_function=input_callback_function, 520 stop_callback_function=stop_callback_function, 521 stop_on_exit=stop_on_exit, 522 strip_timestamps=strip_timestamps, 523 accept_input=accept_input, 524 debug=debug, 525 ) 526 return 527 528 from meerschaum.utils.formatting._jobs import strip_timestamp_from_line 529 530 events = { 531 'user': stop_event, 532 'stopped': asyncio.Event(), 533 'stop_token': asyncio.Event(), 534 'stop_exception': asyncio.Event(), 535 'stopped_timeout': asyncio.Event(), 536 } 537 combined_event = asyncio.Event() 538 emitted_text = False 539 stdin_file = _stdin_file if _stdin_file is not None else self.daemon.stdin_file 540 541 async def check_job_status(): 542 if not stop_on_exit: 543 return 544 545 nonlocal emitted_text 546 547 sleep_time = 0.1 548 while sleep_time < 0.2: 549 if self.status == 'stopped': 550 if not emitted_text and _wait_if_stopped: 551 await asyncio.sleep(sleep_time) 552 sleep_time = round(sleep_time * 1.1, 3) 553 continue 554 555 if stop_callback_function is not None: 556 try: 557 if asyncio.iscoroutinefunction(stop_callback_function): 558 await stop_callback_function(self.result) 559 else: 560 stop_callback_function(self.result) 561 except asyncio.exceptions.CancelledError: 562 break 563 except Exception: 564 warn(traceback.format_exc()) 565 566 if stop_on_exit: 567 events['stopped'].set() 568 569 break 570 await asyncio.sleep(0.1) 571 572 events['stopped_timeout'].set() 573 574 async def check_blocking_on_input(): 575 
while True: 576 if not emitted_text or not self.is_blocking_on_stdin(): 577 try: 578 await asyncio.sleep(self.refresh_seconds) 579 except asyncio.exceptions.CancelledError: 580 break 581 continue 582 583 if not self.is_running(): 584 break 585 586 await emit_latest_lines() 587 588 try: 589 print('', end='', flush=True) 590 if asyncio.iscoroutinefunction(input_callback_function): 591 data = await input_callback_function() 592 else: 593 loop = asyncio.get_running_loop() 594 data = await loop.run_in_executor(None, input_callback_function) 595 except KeyboardInterrupt: 596 break 597 # if not data.endswith('\n'): 598 # data += '\n' 599 600 stdin_file.write(data) 601 await asyncio.sleep(self.refresh_seconds) 602 603 async def combine_events(): 604 event_tasks = [ 605 asyncio.create_task(event.wait()) 606 for event in events.values() 607 if event is not None 608 ] 609 if not event_tasks: 610 return 611 612 try: 613 done, pending = await asyncio.wait( 614 event_tasks, 615 return_when=asyncio.FIRST_COMPLETED, 616 ) 617 for task in pending: 618 task.cancel() 619 except asyncio.exceptions.CancelledError: 620 pass 621 finally: 622 combined_event.set() 623 624 check_job_status_task = asyncio.create_task(check_job_status()) 625 check_blocking_on_input_task = asyncio.create_task(check_blocking_on_input()) 626 combine_events_task = asyncio.create_task(combine_events()) 627 628 log = _log if _log is not None else self.daemon.rotating_log 629 lines_to_show = ( 630 self.daemon.properties.get( 631 'logs', {} 632 ).get( 633 'lines_to_show', get_config('jobs', 'logs', 'lines_to_show') 634 ) 635 ) 636 637 async def emit_latest_lines(): 638 nonlocal emitted_text 639 nonlocal stop_event 640 lines = log.readlines() 641 for line in lines[(-1 * lines_to_show):]: 642 if stop_event is not None and stop_event.is_set(): 643 return 644 645 line_stripped_extra = strip_timestamp_from_line(line.strip()) 646 line_stripped = strip_timestamp_from_line(line) 647 648 if line_stripped_extra == STOP_TOKEN: 
649 events['stop_token'].set() 650 return 651 652 if line_stripped_extra == CLEAR_TOKEN: 653 clear_screen(debug=debug) 654 continue 655 656 if line_stripped_extra == FLUSH_TOKEN.strip(): 657 line_stripped = '' 658 line = '' 659 660 if strip_timestamps: 661 line = line_stripped 662 663 try: 664 if asyncio.iscoroutinefunction(callback_function): 665 await callback_function(line) 666 else: 667 callback_function(line) 668 emitted_text = True 669 except StopMonitoringLogs: 670 events['stop_exception'].set() 671 return 672 except Exception: 673 warn(f"Error in logs callback:\n{traceback.format_exc()}") 674 675 await emit_latest_lines() 676 677 tasks = ( 678 [check_job_status_task] 679 + ([check_blocking_on_input_task] if accept_input else []) 680 + [combine_events_task] 681 ) 682 try: 683 _ = asyncio.gather(*tasks, return_exceptions=True) 684 except asyncio.exceptions.CancelledError: 685 raise 686 except Exception: 687 warn(f"Failed to run async checks:\n{traceback.format_exc()}") 688 689 watchfiles = mrsm.attempt_import('watchfiles', lazy=False) 690 dir_path_to_monitor = ( 691 _logs_path 692 or (log.file_path.parent if log else None) 693 or LOGS_RESOURCES_PATH 694 ) 695 async for changes in watchfiles.awatch( 696 dir_path_to_monitor, 697 stop_event=combined_event, 698 ): 699 for change in changes: 700 file_path_str = change[1] 701 file_path = pathlib.Path(file_path_str) 702 latest_subfile_path = log.get_latest_subfile_path() 703 if latest_subfile_path != file_path: 704 continue 705 706 await emit_latest_lines() 707 708 await emit_latest_lines()
Monitor the job's log files and await a callback on new lines.
Parameters
- callback_function (Callable[[str], None], default _default_stdout_callback): The callback to execute as new data comes in. Defaults to printing the output directly to `stdout`.
- input_callback_function (Optional[Callable[[], str]], default None): If provided, execute this callback when the daemon is blocking on stdin. Defaults to `sys.stdin.readline()`.
- stop_callback_function (Optional[Callable[[SuccessTuple], None]], default None): If provided, execute this callback when the daemon stops. The job's `SuccessTuple` will be passed to the callback.
- stop_event (Optional[asyncio.Event], default None): If provided, stop monitoring when this event is set. You may instead raise `meerschaum.jobs.StopMonitoringLogs` from within `callback_function` to stop monitoring.
- stop_on_exit (bool, default False): If `True`, stop monitoring when the job stops.
- strip_timestamps (bool, default False): If `True`, remove leading timestamps from lines.
- accept_input (bool, default True): If `True`, accept input when the daemon blocks on stdin.
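The `stop_event` parameter is a plain `asyncio.Event`; the cooperative shutdown it enables can be sketched without meerschaum (the polling monitor below is an illustrative stand-in for `monitor_logs_async`):

```python
import asyncio

async def fake_monitor(stop_event, out):
    """Poll for 'new lines' until the caller sets stop_event."""
    i = 0
    while not stop_event.is_set():
        out.append(f"line {i}")
        i += 1
        await asyncio.sleep(0.01)

async def main():
    stop_event = asyncio.Event()
    out = []
    task = asyncio.create_task(fake_monitor(stop_event, out))
    await asyncio.sleep(0.05)
    stop_event.set()  # equivalent to setting the stop_event passed to monitor_logs_async()
    await task
    return out

lines = asyncio.run(main())
print(lines[0])  # line 0
```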
710 def is_blocking_on_stdin(self, debug: bool = False) -> bool: 711 """ 712 Return whether a job's daemon is blocking on stdin. 713 """ 714 if self.executor is not None: 715 return self.executor.get_job_is_blocking_on_stdin(self.name, debug=debug) 716 717 return self.is_running() and self.daemon.blocking_stdin_file_path.exists()
Return whether a job's daemon is blocking on stdin.
719 def get_prompt_kwargs(self, debug: bool = False) -> Dict[str, Any]: 720 """ 721 Return the kwargs to the blocking `prompt()`, if available. 722 """ 723 if self.executor is not None: 724 return self.executor.get_job_prompt_kwargs(self.name, debug=debug) 725 726 if not self.daemon.prompt_kwargs_file_path.exists(): 727 return {} 728 729 try: 730 with open(self.daemon.prompt_kwargs_file_path, 'r', encoding='utf-8') as f: 731 prompt_kwargs = json.load(f) 732 733 return prompt_kwargs 734 735 except Exception: 736 import traceback 737 traceback.print_exc() 738 return {}
Return the kwargs to the blocking prompt(), if available.
740 def write_stdin(self, data): 741 """ 742 Write to a job's daemon's `stdin`. 743 """ 744 self.daemon.stdin_file.write(data)
Write to a job's daemon's stdin.
746 @property 747 def executor(self) -> Union[Executor, None]: 748 """ 749 If the job is remote, return the connector to the remote API instance. 750 """ 751 return ( 752 mrsm.get_connector(self.executor_keys) 753 if self.executor_keys != 'local' 754 else None 755 )
If the job is remote, return the connector to the remote API instance.
757 @property 758 def status(self) -> str: 759 """ 760 Return the running status of the job's daemon. 761 """ 762 if '_status_hook' in self.__dict__: 763 return self._status_hook() 764 765 if self.executor is not None: 766 return self.executor.get_job_status(self.name) 767 768 return self.daemon.status
Return the running status of the job's daemon.
770 @property 771 def pid(self) -> Union[int, None]: 772 """ 773 Return the PID of the job's daemon. 774 """ 775 if self.executor is not None: 776 return self.executor.get_job_metadata(self.name).get('daemon', {}).get('pid', None) 777 778 return self.daemon.pid
Return the PID of the job's daemon.
780 @property 781 def restart(self) -> bool: 782 """ 783 Return whether to restart a stopped job. 784 """ 785 if self.executor is not None: 786 return self.executor.get_job_metadata(self.name).get('restart', False) 787 788 return self.daemon.properties.get('restart', False)
Return whether to restart a stopped job.
790 @property 791 def result(self) -> SuccessTuple: 792 """ 793 Return the `SuccessTuple` when the job has terminated. 794 """ 795 if self.is_running(): 796 return True, f"{self} is running." 797 798 if '_result_hook' in self.__dict__: 799 return self._result_hook() 800 801 if self.executor is not None: 802 return ( 803 self.executor.get_job_metadata(self.name) 804 .get('result', (False, "No result available.")) 805 ) 806 807 _result = self.daemon.properties.get('result', None) 808 if _result is None: 809 from meerschaum.utils.daemon.Daemon import _results 810 return _results.get(self.daemon.daemon_id, (False, "No result available.")) 811 812 return tuple(_result)
Return the SuccessTuple when the job has terminated.
814 @property 815 def sysargs(self) -> List[str]: 816 """ 817 Return the sysargs to use for the Daemon. 818 """ 819 if self._sysargs: 820 return self._sysargs 821 822 if self.executor is not None: 823 return self.executor.get_job_metadata(self.name).get('sysargs', []) 824 825 target_args = self.daemon.target_args 826 if target_args is None: 827 return [] 828 self._sysargs = target_args[0] if len(target_args) > 0 else [] 829 return self._sysargs
Return the sysargs to use for the Daemon.
831 def get_daemon_properties(self) -> Dict[str, Any]: 832 """ 833 Return the `properties` dictionary for the job's daemon. 834 """ 835 remote_properties = ( 836 {} 837 if self.executor is None 838 else self.executor.get_job_properties(self.name) 839 ) 840 return { 841 **remote_properties, 842 **self._properties_patch 843 }
Return the properties dictionary for the job's daemon.
845 @property 846 def daemon(self) -> 'Daemon': 847 """ 848 Return the daemon which this job manages. 849 """ 850 from meerschaum.utils.daemon import Daemon 851 if self._daemon is not None and self.executor is None and self._sysargs: 852 return self._daemon 853 854 self._daemon = Daemon( 855 target=entry, 856 target_args=[self._sysargs], 857 target_kw={}, 858 daemon_id=self.name, 859 label=shlex.join(self._sysargs), 860 properties=self.get_daemon_properties(), 861 ) 862 if '_rotating_log' in self.__dict__: 863 self._daemon._rotating_log = self._rotating_log 864 865 if '_stdin_file' in self.__dict__: 866 self._daemon._stdin_file = self._stdin_file 867 self._daemon._blocking_stdin_file_path = self._stdin_file.blocking_file_path 868 869 return self._daemon
Return the daemon which this job manages.
871 @property 872 def began(self) -> Union[datetime, None]: 873 """ 874 The datetime when the job began running. 875 """ 876 if self.executor is not None: 877 began_str = self.executor.get_job_began(self.name) 878 if began_str is None: 879 return None 880 return ( 881 datetime.fromisoformat(began_str) 882 .astimezone(timezone.utc) 883 .replace(tzinfo=None) 884 ) 885 886 began_str = self.daemon.properties.get('process', {}).get('began', None) 887 if began_str is None: 888 return None 889 890 return datetime.fromisoformat(began_str)
The datetime when the job began running.
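When a remote executor reports the timestamp, the property normalizes the ISO-8601 string to a naive UTC datetime. The conversion step can be sketched standalone (a hypothetical helper, not part of Meerschaum; `began`, `ended`, and `paused` all share this pattern):

```python
from datetime import datetime, timezone

def to_naive_utc(began_str: str) -> datetime:
    # Parse an ISO-8601 string (with offset), convert to UTC,
    # then drop the tzinfo to get a naive UTC datetime.
    return (
        datetime.fromisoformat(began_str)
        .astimezone(timezone.utc)
        .replace(tzinfo=None)
    )

print(to_naive_utc('2024-01-01T07:00:00-05:00'))
# 2024-01-01 12:00:00
```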
@property
def ended(self) -> Union[datetime, None]:
    """
    The datetime when the job stopped running.
    """
    if self.executor is not None:
        ended_str = self.executor.get_job_ended(self.name)
        if ended_str is None:
            return None
        return (
            datetime.fromisoformat(ended_str)
            .astimezone(timezone.utc)
            .replace(tzinfo=None)
        )

    ended_str = self.daemon.properties.get('process', {}).get('ended', None)
    if ended_str is None:
        return None

    return datetime.fromisoformat(ended_str)
The datetime when the job stopped running.
@property
def paused(self) -> Union[datetime, None]:
    """
    The datetime when the job was suspended while running.
    """
    if self.executor is not None:
        paused_str = self.executor.get_job_paused(self.name)
        if paused_str is None:
            return None
        return (
            datetime.fromisoformat(paused_str)
            .astimezone(timezone.utc)
            .replace(tzinfo=None)
        )

    paused_str = self.daemon.properties.get('process', {}).get('paused', None)
    if paused_str is None:
        return None

    return datetime.fromisoformat(paused_str)
The datetime when the job was suspended while running.
@property
def stop_time(self) -> Union[datetime, None]:
    """
    Return the timestamp when the job was manually stopped.
    """
    if self.executor is not None:
        return self.executor.get_job_stop_time(self.name)

    if not self.daemon.stop_path.exists():
        return None

    stop_data = self.daemon._read_stop_file()
    if not stop_data:
        return None

    stop_time_str = stop_data.get('stop_time', None)
    if not stop_time_str:
        warn(f"Could not read stop time for {self}.")
        return None

    return datetime.fromisoformat(stop_time_str)
Return the timestamp when the job was manually stopped.
def check_restart(self) -> SuccessTuple:
    """
    If `restart` is `True` and the daemon is not running,
    restart the job.
    Do not restart if the job was manually stopped.
    """
    if self.is_running():
        return True, f"{self} is running."

    if not self.restart:
        return True, f"{self} does not need to be restarted."

    if self.stop_time is not None:
        return True, f"{self} was manually stopped."

    return self.start()
If restart is True and the daemon is not running,
restart the job.
Do not restart if the job was manually stopped.
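The decision logic above can be sketched as a standalone predicate (a hypothetical helper, not part of Meerschaum): restart only when restarts are enabled, the daemon is not running, and the job was not manually stopped.

```python
from datetime import datetime
from typing import Optional

def should_restart(
    is_running: bool,
    restart: bool,
    stop_time: Optional[datetime],
) -> bool:
    if is_running:
        return False        # Already running: nothing to do.
    if not restart:
        return False        # Restarts are not enabled for this job.
    if stop_time is not None:
        return False        # Manually stopped: respect the operator.
    return True
```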
@property
def label(self) -> str:
    """
    Return the job's Daemon label (joined sysargs).
    """
    from meerschaum._internal.arguments import compress_pipeline_sysargs
    sysargs = compress_pipeline_sysargs(self.sysargs)
    return shlex.join(sysargs).replace(' + ', '\n+ ').replace(' : ', '\n: ').lstrip().rstrip()
Return the job's Daemon label (joined sysargs).
@property
def env(self) -> Dict[str, str]:
    """
    Return the environment variables to set for the job's process.
    """
    if '_env' in self.__dict__:
        return self.__dict__['_env']

    _env = self.daemon.properties.get('env', {})
    default_env = {
        'PYTHONUNBUFFERED': '1',
        'LINES': str(get_config('jobs', 'terminal', 'lines')),
        'COLUMNS': str(get_config('jobs', 'terminal', 'columns')),
        STATIC_CONFIG['environment']['noninteractive']: 'true',
    }
    self._env = {**default_env, **_env}
    return self._env
Return the environment variables to set for the job's process.
@property
def delete_after_completion(self) -> bool:
    """
    Return whether this job is configured to delete itself after completion.
    """
    if '_delete_after_completion' in self.__dict__:
        return self.__dict__.get('_delete_after_completion', False)

    self._delete_after_completion = self.daemon.properties.get('delete_after_completion', False)
    return self._delete_after_completion
Return whether this job is configured to delete itself after completion.
def pprint(
    *args,
    detect_password: bool = True,
    nopretty: bool = False,
    **kw
) -> None:
    """Pretty print an object according to the configured ANSI and UNICODE settings.
    If detect_password is True (default), search and replace passwords with '*' characters.
    Does not mutate objects.
    """
    import copy
    import json
    from meerschaum.utils.packages import attempt_import, import_rich
    from meerschaum.utils.formatting import ANSI, get_console, print_tuple
    from meerschaum.utils.warnings import error
    from meerschaum.utils.misc import replace_password, dict_from_od, filter_keywords
    from collections import OrderedDict

    if (
        len(args) == 1
        and isinstance(args[0], tuple)
        and len(args[0]) == 2
        and isinstance(args[0][0], bool)
        and isinstance(args[0][1], str)
    ):
        return print_tuple(args[0], **filter_keywords(print_tuple, **kw))

    modify = True
    rich_pprint = None
    if ANSI and not nopretty:
        rich = import_rich()
        if rich is not None:
            rich_pretty = attempt_import('rich.pretty')
            if rich_pretty is not None:
                def _rich_pprint(*args, **kw):
                    _console = get_console()
                    _kw = filter_keywords(_console.print, **kw)
                    _console.print(*args, **_kw)
                rich_pprint = _rich_pprint
    elif not nopretty:
        pprintpp = attempt_import('pprintpp', warn=False)
        try:
            _pprint = pprintpp.pprint
        except Exception:
            import pprint as _pprint_module
            _pprint = _pprint_module.pprint

    func = (
        _pprint if rich_pprint is None else rich_pprint
    ) if not nopretty else print

    try:
        args_copy = copy.deepcopy(args)
    except Exception:
        args_copy = args
        modify = False

    _args = []
    for a in args:
        c = a
        ### convert OrderedDict into dict
        if isinstance(a, OrderedDict) or issubclass(type(a), OrderedDict):
            c = dict_from_od(copy.deepcopy(c))
        _args.append(c)
    args = _args

    _args = list(args)
    if detect_password and modify:
        _args = []
        for a in args:
            c = a
            if isinstance(c, dict):
                c = replace_password(copy.deepcopy(c))
            if nopretty:
                try:
                    c = json.dumps(c)
                    is_json = True
                except Exception:
                    is_json = False
                if not is_json:
                    try:
                        c = str(c)
                    except Exception:
                        pass
            _args.append(c)

    ### filter out unsupported keywords
    func_kw = filter_keywords(func, **kw) if not nopretty else {}
    error_msg = None
    try:
        func(*_args, **func_kw)
    except Exception as e:
        error_msg = e
    if error_msg is not None:
        error(error_msg)
Pretty print an object according to the configured ANSI and UNICODE settings. If detect_password is True (default), search and replace passwords with '*' characters. Does not mutate objects.
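The first branch of `pprint` routes a lone `(bool, str)` tuple (a SuccessTuple) to `print_tuple` rather than pretty-printing it. The detection check can be sketched standalone (a hypothetical helper, not part of Meerschaum):

```python
def is_success_tuple(args: tuple) -> bool:
    # True only for a single argument that is a (bool, str) pair,
    # mirroring the check at the top of pprint().
    return (
        len(args) == 1
        and isinstance(args[0], tuple)
        and len(args[0]) == 2
        and isinstance(args[0][0], bool)
        and isinstance(args[0][1], str)
    )
```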
def attempt_import(
    *names: str,
    lazy: bool = True,
    warn: bool = True,
    install: bool = True,
    venv: Optional[str] = 'mrsm',
    precheck: bool = True,
    split: bool = True,
    check_update: bool = False,
    check_pypi: bool = False,
    check_is_installed: bool = True,
    allow_outside_venv: bool = True,
    color: bool = True,
    debug: bool = False
) -> Any:
    """
    Raise a warning if packages are not installed; otherwise import and return modules.
    If `lazy` is `True`, return lazy-imported modules.

    Returns tuple of modules if multiple names are provided, else returns one module.

    Parameters
    ----------
    names: List[str]
        The packages to be imported.

    lazy: bool, default True
        If `True`, lazily load packages.

    warn: bool, default True
        If `True`, raise a warning if a package cannot be imported.

    install: bool, default True
        If `True`, attempt to install a missing package into the designated virtual environment.
        If `check_update` is `True`, install updates if available.

    venv: Optional[str], default 'mrsm'
        The virtual environment in which to search for packages and to install packages into.

    precheck: bool, default True
        If `True`, attempt to find module before importing (necessary for checking if modules exist
        and retaining lazy imports), otherwise assume lazy is `False`.

    split: bool, default True
        If `True`, split packages' names on `'.'`.

    check_update: bool, default False
        If `True` and `install` is `True`, install updates if the required minimum version
        does not match.

    check_pypi: bool, default False
        If `True` and `check_update` is `True`, check PyPI when determining whether
        an update is required.

    check_is_installed: bool, default True
        If `True`, check if the package is contained in the virtual environment.

    allow_outside_venv: bool, default True
        If `True`, search outside of the specified virtual environment
        if the package cannot be found.
        Setting to `False` will reinstall the package into a virtual environment, even if it
        is installed outside.

    color: bool, default True
        If `False`, do not print ANSI colors.

    Returns
    -------
    The specified modules. If they're not available and `install` is `True`, it will first
    download them into a virtual environment and return the modules.

    Examples
    --------
    >>> pandas, sqlalchemy = attempt_import('pandas', 'sqlalchemy')
    >>> pandas = attempt_import('pandas')

    """
    import importlib.util

    ### to prevent recursion, check if parent Meerschaum package is being imported
    if names == ('meerschaum',):
        return _import_module('meerschaum')

    if venv == 'mrsm' and _import_hook_venv is not None:
        if debug:
            print(f"Import hook for virtual environment '{_import_hook_venv}' is active.")
        venv = _import_hook_venv

    _warnings = _import_module('meerschaum.utils.warnings')
    warn_function = _warnings.warn

    def do_import(_name: str, **kw) -> Union['ModuleType', None]:
        with Venv(venv=venv, debug=debug):
            ### determine the import method (lazy vs normal)
            from meerschaum.utils.misc import filter_keywords
            import_method = (
                _import_module if not lazy
                else lazy_import
            )
            try:
                mod = import_method(_name, **(filter_keywords(import_method, **kw)))
            except Exception as e:
                if warn:
                    import traceback
                    traceback.print_exception(type(e), e, e.__traceback__)
                    warn_function(
                        f"Failed to import module '{_name}'.\nException:\n{e}",
                        ImportWarning,
                        stacklevel=(5 if lazy else 4),
                        color=False,
                    )
                mod = None
            return mod

    modules = []
    for name in names:
        ### Check if package is a declared dependency.
        root_name = name.split('.')[0] if split else name
        install_name = _import_to_install_name(root_name)

        if install_name is None:
            install_name = root_name
            if warn and root_name != 'plugins':
                warn_function(
                    f"Package '{root_name}' is not declared in meerschaum.utils.packages.",
                    ImportWarning,
                    stacklevel=3,
                    color=False
                )

        ### Determine if the package exists.
        if precheck is False:
            found_module = (
                do_import(
                    name, debug=debug, warn=False, venv=venv, color=color,
                    check_update=False, check_pypi=False, split=split,
                ) is not None
            )
        else:
            if check_is_installed:
                with _locks['_is_installed_first_check']:
                    if not _is_installed_first_check.get(name, False):
                        package_is_installed = is_installed(
                            name,
                            venv=venv,
                            split=split,
                            allow_outside_venv=allow_outside_venv,
                            debug=debug,
                        )
                        _is_installed_first_check[name] = package_is_installed
                    else:
                        package_is_installed = _is_installed_first_check[name]
            else:
                package_is_installed = _is_installed_first_check.get(
                    name,
                    venv_contains_package(name, venv=venv, split=split, debug=debug)
                )
            found_module = package_is_installed

        if not found_module:
            if install:
                if not pip_install(
                    install_name,
                    venv=venv,
                    split=False,
                    check_update=check_update,
                    color=color,
                    debug=debug
                ) and warn:
                    warn_function(
                        f"Failed to install '{install_name}'.",
                        ImportWarning,
                        stacklevel=3,
                        color=False,
                    )
            elif warn:
                ### Raise a warning if we can't find the package and install = False.
                warn_function(
                    (f"\n\nMissing package '{name}' from virtual environment '{venv}'; "
                     + "some features will not work correctly."
                     + "\n\nSet install=True when calling attempt_import.\n"),
                    ImportWarning,
                    stacklevel=3,
                    color=False,
                )

        ### Do the import. Will be lazy if lazy=True.
        m = do_import(
            name, debug=debug, warn=warn, venv=venv, color=color,
            check_update=check_update, check_pypi=check_pypi, install=install, split=split,
        )
        modules.append(m)

    modules = tuple(modules)
    if len(modules) == 1:
        return modules[0]
    return modules
Raise a warning if packages are not installed; otherwise import and return modules.
If lazy is True, return lazy-imported modules.
Returns tuple of modules if multiple names are provided, else returns one module.
Parameters
- names (List[str]): The packages to be imported.
- lazy (bool, default True): If True, lazily load packages.
- warn (bool, default True): If True, raise a warning if a package cannot be imported.
- install (bool, default True): If True, attempt to install a missing package into the designated virtual environment. If check_update is True, install updates if available.
- venv (Optional[str], default 'mrsm'): The virtual environment in which to search for packages and to install packages into.
- precheck (bool, default True): If True, attempt to find the module before importing (necessary for checking if modules exist and retaining lazy imports); otherwise assume lazy is False.
- split (bool, default True): If True, split packages' names on '.'.
- check_update (bool, default False): If True and install is True, install updates if the required minimum version does not match.
- check_pypi (bool, default False): If True and check_update is True, check PyPI when determining whether an update is required.
- check_is_installed (bool, default True): If True, check if the package is contained in the virtual environment.
- allow_outside_venv (bool, default True): If True, search outside of the specified virtual environment if the package cannot be found. Setting to False will reinstall the package into a virtual environment, even if it is installed outside.
- color (bool, default True): If False, do not print ANSI colors.
Returns
- The specified modules. If they're not available and install is True, it will first download them into a virtual environment and return the modules.
Examples
>>> pandas, sqlalchemy = attempt_import('pandas', 'sqlalchemy')
>>> pandas = attempt_import('pandas')
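As the examples show, `attempt_import` returns a single module for one name and a tuple for several. That return convention can be mimicked with a standalone stand-in (the real function needs the Meerschaum environment, so this sketch uses placeholder strings instead of modules):

```python
def unpack_modules(*names: str):
    # Placeholder "modules": one name yields one value,
    # several names yield a tuple in the same order.
    modules = tuple(f"<module {name}>" for name in names)
    if len(modules) == 1:
        return modules[0]
    return modules
```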
class Connector(metaclass=abc.ABCMeta):
    """
    The base connector class to hold connection attributes.
    """

    IS_INSTANCE: bool = False

    def __init__(
        self,
        type: Optional[str] = None,
        label: Optional[str] = None,
        **kw: Any
    ):
        """
        Set the given keyword arguments as attributes.

        Parameters
        ----------
        type: str
            The `type` of the connector (e.g. `sql`, `api`, `plugin`).

        label: str
            The `label` for the connector.

        Examples
        --------
        Run `mrsm edit config` to edit connectors in the YAML file:

        ```yaml
        meerschaum:
            connections:
                {type}:
                    {label}:
                        ### attributes go here
        ```
        """
        self._original_dict = copy.deepcopy(self.__dict__)
        self._set_attributes(type=type, label=label, **kw)

        ### NOTE: Override `REQUIRED_ATTRIBUTES` if `uri` is set.
        self.verify_attributes(
            ['uri']
            if 'uri' in self.__dict__
            else getattr(self, 'REQUIRED_ATTRIBUTES', None)
        )

    def _reset_attributes(self):
        self.__dict__ = self._original_dict

    def _set_attributes(
        self,
        *args,
        inherit_default: bool = True,
        **kw: Any
    ):
        from meerschaum._internal.static import STATIC_CONFIG
        from meerschaum.utils.warnings import error

        self._attributes = {}

        default_label = STATIC_CONFIG['connectors']['default_label']

        ### NOTE: Support the legacy method of explicitly passing the type.
        label = kw.get('label', None)
        if label is None:
            if len(args) == 2:
                label = args[1]
            elif len(args) == 0:
                label = None
            else:
                label = args[0]

        if label == 'default':
            error(
                f"Label cannot be 'default'. Did you mean '{default_label}'?",
                InvalidAttributesError,
            )
        self.__dict__['label'] = label

        from meerschaum.config import get_config
        conn_configs = copy.deepcopy(get_config('meerschaum', 'connectors'))
        connector_config = copy.deepcopy(get_config('system', 'connectors'))

        ### inherit attributes from 'default' if exists
        if inherit_default:
            inherit_from = 'default'
            if self.type in conn_configs and inherit_from in conn_configs[self.type]:
                _inherit_dict = copy.deepcopy(conn_configs[self.type][inherit_from])
                self._attributes.update(_inherit_dict)

        ### load user config into self._attributes
        if self.type in conn_configs and self.label in conn_configs[self.type]:
            self._attributes.update(conn_configs[self.type][self.label] or {})

        ### load system config into self._sys_config
        ### (deep copy so future Connectors don't inherit changes)
        if self.type in connector_config:
            self._sys_config = copy.deepcopy(connector_config[self.type])

        ### add additional arguments or override configuration
        self._attributes.update(kw)

        ### finally, update __dict__ with _attributes.
        self.__dict__.update(self._attributes)

    def verify_attributes(
        self,
        required_attributes: Optional[List[str]] = None,
        debug: bool = False,
    ) -> None:
        """
        Ensure that the required attributes have been met.

        The Connector base class checks the minimum requirements.
        Child classes may enforce additional requirements.

        Parameters
        ----------
        required_attributes: Optional[List[str]], default None
            Attributes to be verified. If `None`, default to `['type', 'label']`.

        debug: bool, default False
            Verbosity toggle.

        Returns
        -------
        Don't return anything.

        Raises
        ------
        An error if any of the required attributes are missing.
        """
        from meerschaum.utils.warnings import error
        from meerschaum.utils.misc import items_str
        if required_attributes is None:
            required_attributes = ['type', 'label']

        missing_attributes = set()
        for a in required_attributes:
            if a not in self.__dict__:
                missing_attributes.add(a)
        if len(missing_attributes) > 0:
            error(
                (
                    f"Missing {items_str(list(missing_attributes))} "
                    + f"for connector '{self.type}:{self.label}'."
                ),
                InvalidAttributesError,
                silent=True,
                stack=False
            )

    def __str__(self):
        """
        When cast to a string, return type:label.
        """
        return f"{self.type}:{self.label}"

    def __repr__(self):
        """
        Represent the connector as type:label.
        """
        return str(self)

    @property
    def meta(self) -> Dict[str, Any]:
        """
        Return the keys needed to reconstruct this Connector.
        """
        _meta = {
            key: value
            for key, value in self.__dict__.items()
            if not str(key).startswith('_')
        }
        _meta.update({
            'type': self.type,
            'label': self.label,
        })
        return _meta

    @property
    def type(self) -> str:
        """
        Return the type for this connector.
        """
        _type = self.__dict__.get('type', None)
        if _type is None:
            import re
            is_executor = self.__class__.__name__.lower().endswith('executor')
            suffix_regex = (
                r'connector$'
                if not is_executor
                else r'executor$'
            )
            _type = re.sub(suffix_regex, '', self.__class__.__name__.lower())
            if not _type or _type.lower() == 'instance':
                raise ValueError("No type could be determined for this connector.")
            self.__dict__['type'] = _type
        return _type

    @property
    def label(self) -> str:
        """
        Return the label for this connector.
        """
        _label = self.__dict__.get('label', None)
        if _label is None:
            from meerschaum._internal.static import STATIC_CONFIG
            _label = STATIC_CONFIG['connectors']['default_label']
            self.__dict__['label'] = _label
        return _label
The base connector class to hold connection attributes.
def __init__(
    self,
    type: Optional[str] = None,
    label: Optional[str] = None,
    **kw: Any
):
    """
    Set the given keyword arguments as attributes.

    Parameters
    ----------
    type: str
        The `type` of the connector (e.g. `sql`, `api`, `plugin`).

    label: str
        The `label` for the connector.

    Examples
    --------
    Run `mrsm edit config` to edit connectors in the YAML file:

    ```yaml
    meerschaum:
        connections:
            {type}:
                {label}:
                    ### attributes go here
    ```
    """
    self._original_dict = copy.deepcopy(self.__dict__)
    self._set_attributes(type=type, label=label, **kw)

    ### NOTE: Override `REQUIRED_ATTRIBUTES` if `uri` is set.
    self.verify_attributes(
        ['uri']
        if 'uri' in self.__dict__
        else getattr(self, 'REQUIRED_ATTRIBUTES', None)
    )
def verify_attributes(
    self,
    required_attributes: Optional[List[str]] = None,
    debug: bool = False,
) -> None:
    """
    Ensure that the required attributes have been met.

    The Connector base class checks the minimum requirements.
    Child classes may enforce additional requirements.

    Parameters
    ----------
    required_attributes: Optional[List[str]], default None
        Attributes to be verified. If `None`, default to `['type', 'label']`.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    Don't return anything.

    Raises
    ------
    An error if any of the required attributes are missing.
    """
    from meerschaum.utils.warnings import error
    from meerschaum.utils.misc import items_str
    if required_attributes is None:
        required_attributes = ['type', 'label']

    missing_attributes = set()
    for a in required_attributes:
        if a not in self.__dict__:
            missing_attributes.add(a)
    if len(missing_attributes) > 0:
        error(
            (
                f"Missing {items_str(list(missing_attributes))} "
                + f"for connector '{self.type}:{self.label}'."
            ),
            InvalidAttributesError,
            silent=True,
            stack=False
        )
Ensure that the required attributes have been met.
The Connector base class checks the minimum requirements. Child classes may enforce additional requirements.
Parameters
- required_attributes (Optional[List[str]], default None): Attributes to be verified. If None, default to ['type', 'label'].
- debug (bool, default False): Verbosity toggle.
Returns
- Don't return anything.
Raises
- An error if any of the required attributes are missing.
@property
def meta(self) -> Dict[str, Any]:
    """
    Return the keys needed to reconstruct this Connector.
    """
    _meta = {
        key: value
        for key, value in self.__dict__.items()
        if not str(key).startswith('_')
    }
    _meta.update({
        'type': self.type,
        'label': self.label,
    })
    return _meta
Return the keys needed to reconstruct this Connector.
@property
def type(self) -> str:
    """
    Return the type for this connector.
    """
    _type = self.__dict__.get('type', None)
    if _type is None:
        import re
        is_executor = self.__class__.__name__.lower().endswith('executor')
        suffix_regex = (
            r'connector$'
            if not is_executor
            else r'executor$'
        )
        _type = re.sub(suffix_regex, '', self.__class__.__name__.lower())
        if not _type or _type.lower() == 'instance':
            raise ValueError("No type could be determined for this connector.")
        self.__dict__['type'] = _type
    return _type
Return the type for this connector.
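When no explicit `type` is set, the property derives one by stripping a trailing `connector` (or `executor`) from the lowercased class name. The derivation rule can be sketched standalone (a hypothetical helper, not part of Meerschaum):

```python
import re

def derive_type(class_name: str) -> str:
    # Strip the trailing 'executor' or 'connector' suffix from
    # the lowercased class name, as Connector.type does.
    is_executor = class_name.lower().endswith('executor')
    suffix_regex = r'executor$' if is_executor else r'connector$'
    return re.sub(suffix_regex, '', class_name.lower())

print(derive_type('FooConnector'))
# foo
```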
@property
def label(self) -> str:
    """
    Return the label for this connector.
    """
    _label = self.__dict__.get('label', None)
    if _label is None:
        from meerschaum._internal.static import STATIC_CONFIG
        _label = STATIC_CONFIG['connectors']['default_label']
        self.__dict__['label'] = _label
    return _label
Return the label for this connector.
class InstanceConnector(Connector):
    """
    Instance connectors define the interface for managing pipes and provide methods
    for management of users, plugins, tokens, and other metadata built atop pipes.
    """

    IS_INSTANCE: bool = True
    IS_THREAD_SAFE: bool = False

    from ._users import (
        get_users_pipe,
        register_user,
        get_user_id,
        get_username,
        get_users,
        edit_user,
        delete_user,
        get_user_password_hash,
        get_user_type,
        get_user_attributes,
    )

    from ._plugins import (
        get_plugins_pipe,
        register_plugin,
        get_plugin_user_id,
        delete_plugin,
        get_plugin_id,
        get_plugin_version,
        get_plugins,
        get_plugin_username,
        get_plugin_attributes,
    )

    from ._tokens import (
        get_tokens_pipe,
        register_token,
        edit_token,
        invalidate_token,
        delete_token,
        get_token,
        get_tokens,
        get_token_model,
        get_token_secret_hash,
        token_exists,
        get_token_scopes,
    )

    from ._pipes import (
        register_pipe,
        get_pipe_attributes,
        get_pipe_id,
        edit_pipe,
        delete_pipe,
        fetch_pipes_keys,
        pipe_exists,
        drop_pipe,
        drop_pipe_indices,
        sync_pipe,
        create_pipe_indices,
        clear_pipe,
        get_pipe_data,
        get_sync_time,
        get_pipe_columns_types,
        get_pipe_columns_indices,
    )
Instance connectors define the interface for managing pipes and provide methods for management of users, plugins, tokens, and other metadata built atop pipes.
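The class body above pulls functions from sibling modules with `from ._users import ...`, binding them as methods. The same mixin-by-import pattern can be sketched with plain functions (hypothetical names, for illustration only):

```python
# A function defined outside the class; its first parameter plays
# the role of `self`.
def describe(self) -> str:
    return f"instance: {self.name}"

class FakeInstance:
    # Binding the function as a class attribute makes it a method,
    # analogous to `from ._users import get_users, ...` in a class body.
    describe = describe

    def __init__(self, name: str):
        self.name = name
```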
def get_users_pipe(self) -> 'mrsm.Pipe':
    """
    Return the pipe used for users registration.
    """
    if '_users_pipe' in self.__dict__:
        return self._users_pipe

    cache_connector = self.__dict__.get('_cache_connector', None)
    self._users_pipe = mrsm.Pipe(
        'mrsm', 'users',
        instance=self,
        target='mrsm_users',
        temporary=True,
        cache=True,
        cache_connector_keys=cache_connector,
        static=True,
        null_indices=False,
        columns={
            'primary': 'user_id',
        },
        dtypes={
            'user_id': 'uuid',
            'username': 'string',
            'password_hash': 'string',
            'email': 'string',
            'user_type': 'string',
            'attributes': 'json',
        },
        indices={
            'unique': 'username',
        },
    )
    return self._users_pipe
Return the pipe used for users registration.
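Note the caching idiom: the pipe is built once and stashed in the instance's `__dict__`, so later calls return the same object. A minimal sketch of the idiom (hypothetical class, with `object()` standing in for the `mrsm.Pipe` construction):

```python
class PipeHolder:
    def get_pipe(self):
        # Return the cached object if it was already built.
        if '_pipe' in self.__dict__:
            return self._pipe
        # Otherwise build it once and cache it on the instance.
        self._pipe = object()
        return self._pipe
```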
def register_user(
    self,
    user: User,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Register a new user to the users pipe.
    """
    users_pipe = self.get_users_pipe()
    user.user_id = uuid.uuid4()
    sync_success, sync_msg = users_pipe.sync(
        [{
            'user_id': user.user_id,
            'username': user.username,
            'email': user.email,
            'password_hash': user.password_hash,
            'user_type': user.type,
            'attributes': user.attributes,
        }],
        check_existing=False,
        debug=debug,
    )
    if not sync_success:
        return False, f"Failed to register user '{user.username}':\n{sync_msg}"

    return True, "Success"
Register a new user to the users pipe.
def get_user_id(self, user: User, debug: bool = False) -> Union[uuid.UUID, None]:
    """
    Return a user's ID from the username.
    """
    users_pipe = self.get_users_pipe()
    result_df = users_pipe.get_data(['user_id'], params={'username': user.username}, limit=1)
    if result_df is None or len(result_df) == 0:
        return None
    return result_df['user_id'][0]
Return a user's ID from the username.
def get_username(self, user_id: Any, debug: bool = False) -> Any:
    """
    Return the username from the given ID.
    """
    users_pipe = self.get_users_pipe()
    return users_pipe.get_value('username', {'user_id': user_id}, debug=debug)
Return the username from the given ID.
def get_users(
    self,
    debug: bool = False,
    **kw: Any
) -> List[str]:
    """
    Get the registered usernames.
    """
    users_pipe = self.get_users_pipe()
    df = users_pipe.get_data()
    if df is None:
        return []

    return list(df['username'])
Get the registered usernames.
def edit_user(self, user: User, debug: bool = False) -> mrsm.SuccessTuple:
    """
    Edit the attributes for an existing user.
    """
    users_pipe = self.get_users_pipe()
    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)

    doc = {'user_id': user_id}
    if user.email != '':
        doc['email'] = user.email
    if user.password_hash != '':
        doc['password_hash'] = user.password_hash
    if user.type != '':
        doc['user_type'] = user.type
    if user.attributes:
        doc['attributes'] = user.attributes

    sync_success, sync_msg = users_pipe.sync([doc], debug=debug)
    if not sync_success:
        return False, f"Failed to edit user '{user.username}':\n{sync_msg}"

    return True, "Success"
Edit the attributes for an existing user.
def delete_user(self, user: User, debug: bool = False) -> mrsm.SuccessTuple:
    """
    Delete a user from the users table.
    """
    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)
    users_pipe = self.get_users_pipe()
    clear_success, clear_msg = users_pipe.clear(params={'user_id': user_id}, debug=debug)
    if not clear_success:
        return False, f"Failed to delete user '{user}':\n{clear_msg}"
    return True, "Success"
Delete a user from the users table.
def get_user_password_hash(self, user: User, debug: bool = False) -> Union[str, None]:
    """
    Get a user's password hash from the users table.
    """
    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)
    users_pipe = self.get_users_pipe()
    result_df = users_pipe.get_data(['password_hash'], params={'user_id': user_id}, debug=debug)
    if result_df is None or len(result_df) == 0:
        return None

    return result_df['password_hash'][0]
Get a user's password hash from the users table.
def get_user_type(self, user: User, debug: bool = False) -> Union[str, None]:
    """
    Get a user's type from the users table.
    """
    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)
    users_pipe = self.get_users_pipe()
    result_df = users_pipe.get_data(['user_type'], params={'user_id': user_id}, debug=debug)
    if result_df is None or len(result_df) == 0:
        return None

    return result_df['user_type'][0]
Get a user's type from the users table.
def get_user_attributes(self, user: User, debug: bool = False) -> Union[Dict[str, Any], None]:
    """
    Get a user's attributes from the users table.
    """
    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)
    users_pipe = self.get_users_pipe()
    result_df = users_pipe.get_data(['attributes'], params={'user_id': user_id}, debug=debug)
    if result_df is None or len(result_df) == 0:
        return None

    return result_df['attributes'][0]
Get a user's attributes from the users table.
def get_plugins_pipe(self) -> 'mrsm.Pipe':
    """
    Return the internal pipe for syncing plugins metadata.
    """
    if '_plugins_pipe' in self.__dict__:
        return self._plugins_pipe

    cache_connector = self.__dict__.get('_cache_connector', None)
    users_pipe = self.get_users_pipe()
    user_id_dtype = users_pipe.dtypes.get('user_id', 'uuid')

    self._plugins_pipe = mrsm.Pipe(
        'mrsm', 'plugins',
        instance=self,
        target='mrsm_plugins',
        temporary=True,
        cache=True,
        cache_connector_keys=cache_connector,
        static=True,
        null_indices=False,
        columns={
            'primary': 'plugin_name',
            'user_id': 'user_id',
        },
        dtypes={
            'plugin_name': 'string',
            'user_id': user_id_dtype,
            'attributes': 'json',
            'version': 'string',
        },
    )
    return self._plugins_pipe
Return the internal pipe for syncing plugins metadata.
```python
def register_plugin(self, plugin: Plugin, debug: bool = False) -> mrsm.SuccessTuple:
    """
    Register a new plugin to the plugins table.
    """
    plugins_pipe = self.get_plugins_pipe()
    user_id = self.get_plugin_user_id(plugin)
    if user_id is not None:
        username = self.get_username(user_id, debug=debug)
        return False, f"{plugin} is already registered to '{username}'."

    doc = {
        'plugin_name': plugin.name,
        'version': plugin.version,
        'attributes': plugin.attributes,
        'user_id': plugin.user_id,
    }

    sync_success, sync_msg = plugins_pipe.sync(
        [doc],
        check_existing=False,
        debug=debug,
    )
    if not sync_success:
        return False, f"Failed to register {plugin}:\n{sync_msg}"

    return True, "Success"
```
Register a new plugin to the plugins table.
```python
def get_plugin_user_id(self, plugin: Plugin, debug: bool = False) -> Union[uuid.UUID, None]:
    """
    Return the user ID for a plugin's owner.
    """
    plugins_pipe = self.get_plugins_pipe()
    return plugins_pipe.get_value('user_id', {'plugin_name': plugin.name}, debug=debug)
```
Return the user ID for a plugin's owner.
```python
def delete_plugin(self, plugin: Plugin, debug: bool = False) -> mrsm.SuccessTuple:
    """
    Delete a plugin's registration.
    """
    plugin_id = self.get_plugin_id(plugin, debug=debug)
    if plugin_id is None:
        return False, f"{plugin} is not registered."

    plugins_pipe = self.get_plugins_pipe()
    clear_success, clear_msg = plugins_pipe.clear(params={'plugin_name': plugin.name}, debug=debug)
    if not clear_success:
        return False, f"Failed to delete {plugin}:\n{clear_msg}"
    return True, "Success"
```
Delete a plugin's registration.
```python
def get_plugin_id(self, plugin: Plugin, debug: bool = False) -> Union[str, None]:
    """
    Return a plugin's ID.
    """
    user_id = self.get_plugin_user_id(plugin, debug=debug)
    return plugin.name if user_id is not None else None
```
Return a plugin's ID.
```python
def get_plugin_version(self, plugin: Plugin, debug: bool = False) -> Union[str, None]:
    """
    Return the version for a plugin.
    """
    plugins_pipe = self.get_plugins_pipe()
    return plugins_pipe.get_value('version', {'plugin_name': plugin.name}, debug=debug)
```
Return the version for a plugin.
```python
def get_plugins(
    self,
    user_id: Optional[int] = None,
    search_term: Optional[str] = None,
    debug: bool = False,
    **kw: Any
) -> List[str]:
    """
    Return a list of plugin names.
    """
    plugins_pipe = self.get_plugins_pipe()
    params = {}
    if user_id:
        params['user_id'] = user_id

    df = plugins_pipe.get_data(['plugin_name'], params=params, debug=debug)
    if df is None:
        return []

    docs = df.to_dict(orient='records')
    return [
        plugin_name
        for doc in docs
        if (plugin_name := doc['plugin_name']).startswith(search_term or '')
    ]
```
Return a list of plugin names.
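The `search_term` above acts as a simple prefix filter over the registered plugin names. The behavior can be sketched in plain Python (the plugin names below are illustrative):

```python
# Sketch of the search_term prefix filter used by get_plugins():
# names are kept only when they start with the term, and an empty
# or None term matches every plugin.
def filter_plugin_names(names, search_term=None):
    return [name for name in names if name.startswith(search_term or '')]

print(filter_plugin_names(['noaa', 'noaa-extra', 'clone'], 'noaa'))
# ['noaa', 'noaa-extra']
```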
```python
def get_plugin_username(self, plugin: Plugin, debug: bool = False) -> Union[str, None]:
    """
    Return the username for a plugin's owner.
    """
    user_id = self.get_plugin_user_id(plugin, debug=debug)
    if user_id is None:
        return None
    return self.get_username(user_id, debug=debug)
```
Return the username for a plugin's owner.
```python
def get_plugin_attributes(self, plugin: Plugin, debug: bool = False) -> Dict[str, Any]:
    """
    Return the attributes for a plugin.
    """
    plugins_pipe = self.get_plugins_pipe()
    return plugins_pipe.get_value('attributes', {'plugin_name': plugin.name}, debug=debug) or {}
```
Return the attributes for a plugin.
```python
def get_tokens_pipe(self) -> mrsm.Pipe:
    """
    Return the internal pipe for tokens management.
    """
    if '_tokens_pipe' in self.__dict__:
        return self._tokens_pipe

    users_pipe = self.get_users_pipe()
    user_id_dtype = (
        users_pipe._attributes.get('parameters', {}).get('dtypes', {}).get('user_id', 'uuid')
    )

    cache_connector = self.__dict__.get('_cache_connector', None)

    self._tokens_pipe = mrsm.Pipe(
        'mrsm', 'tokens',
        instance=self,
        target='mrsm_tokens',
        temporary=True,
        cache=True,
        cache_connector_keys=cache_connector,
        static=True,
        autotime=True,
        null_indices=False,
        columns={
            'datetime': 'creation',
            'primary': 'id',
        },
        indices={
            'unique': 'label',
            'user_id': 'user_id',
        },
        dtypes={
            'id': 'uuid',
            'creation': 'datetime',
            'expiration': 'datetime',
            'is_valid': 'bool',
            'label': 'string',
            'user_id': user_id_dtype,
            'scopes': 'json',
            'secret_hash': 'string',
        },
    )
    return self._tokens_pipe
```
Return the internal pipe for tokens management.
```python
def register_token(
    self,
    token: Token,
    debug: bool = False,
) -> mrsm.SuccessTuple:
    """
    Register the new token to the tokens table.
    """
    token_id, token_secret = token.generate_credentials()
    tokens_pipe = self.get_tokens_pipe()
    user_id = self.get_user_id(token.user) if token.user is not None else None
    if user_id is None:
        return False, "Cannot register a token without a user."

    doc = {
        'id': token_id,
        'user_id': user_id,
        'creation': datetime.now(timezone.utc),
        'expiration': token.expiration,
        'label': token.label,
        'is_valid': token.is_valid,
        'scopes': list(token.scopes) if token.scopes else [],
        'secret_hash': hash_password(
            str(token_secret),
            rounds=STATIC_CONFIG['tokens']['hash_rounds'],
        ),
    }
    sync_success, sync_msg = tokens_pipe.sync([doc], check_existing=False, debug=debug)
    if not sync_success:
        return False, f"Failed to register token:\n{sync_msg}"
    return True, "Success"
```
Register the new token to the tokens table.
```python
def edit_token(self, token: Token, debug: bool = False) -> mrsm.SuccessTuple:
    """
    Persist the token's in-memory state to the tokens pipe.
    """
    if not token.id:
        return False, "Token ID is not set."

    if not token.exists(debug=debug):
        return False, f"Token {token.id} does not exist."

    if not token.creation:
        token_model = self.get_token_model(token.id)
        token.creation = token_model.creation

    tokens_pipe = self.get_tokens_pipe()
    doc = {
        'id': token.id,
        'creation': token.creation,
        'expiration': token.expiration,
        'label': token.label,
        'is_valid': token.is_valid,
        'scopes': list(token.scopes) if token.scopes else [],
    }
    sync_success, sync_msg = tokens_pipe.sync([doc], debug=debug)
    if not sync_success:
        return False, f"Failed to edit token '{token.id}':\n{sync_msg}"

    return True, "Success"
```
Persist the token's in-memory state to the tokens pipe.
```python
def invalidate_token(self, token: Token, debug: bool = False) -> mrsm.SuccessTuple:
    """
    Set `is_valid` to `False` for the given token.
    """
    if not token.id:
        return False, "Token ID is not set."

    if not token.exists(debug=debug):
        return False, f"Token {token.id} does not exist."

    if not token.creation:
        token_model = self.get_token_model(token.id)
        token.creation = token_model.creation

    token.is_valid = False
    tokens_pipe = self.get_tokens_pipe()
    doc = {
        'id': token.id,
        'creation': token.creation,
        'is_valid': False,
    }
    sync_success, sync_msg = tokens_pipe.sync([doc], debug=debug)
    if not sync_success:
        return False, f"Failed to invalidate token '{token.id}':\n{sync_msg}"

    return True, "Success"
```
Set `is_valid` to `False` for the given token.
```python
def delete_token(self, token: Token, debug: bool = False) -> mrsm.SuccessTuple:
    """
    Delete the given token from the tokens table.
    """
    if not token.id:
        return False, "Token ID is not set."

    if not token.exists(debug=debug):
        return False, f"Token {token.id} does not exist."

    tokens_pipe = self.get_tokens_pipe()
    clear_success, clear_msg = tokens_pipe.clear(params={'id': token.id}, debug=debug)
    if not clear_success:
        return False, f"Failed to delete token '{token.id}':\n{clear_msg}"

    return True, "Success"
```
Delete the given token from the tokens table.
```python
def get_token(self, token_id: Union[uuid.UUID, str], debug: bool = False) -> Union[Token, None]:
    """
    Return the `Token` from its ID.
    """
    from meerschaum.utils.misc import is_uuid
    if isinstance(token_id, str):
        if is_uuid(token_id):
            token_id = uuid.UUID(token_id)
        else:
            raise ValueError("Invalid token ID.")
    token_model = self.get_token_model(token_id)
    if token_model is None:
        return None
    return Token(**dict(token_model))
```
Return the `Token` from its ID.
```python
def get_tokens(
    self,
    user: Optional[User] = None,
    labels: Optional[List[str]] = None,
    ids: Optional[List[uuid.UUID]] = None,
    debug: bool = False,
) -> List[Token]:
    """
    Return a list of `Token` objects.
    """
    tokens_pipe = self.get_tokens_pipe()
    user_id = (
        self.get_user_id(user, debug=debug)
        if user is not None
        else None
    )
    user_type = self.get_user_type(user, debug=debug) if user is not None else None
    params = (
        {
            'user_id': (
                user_id
                if user_type != 'admin'
                else [user_id, None]
            )
        }
        if user_id is not None
        else {}
    )
    if labels:
        params['label'] = labels
    if ids:
        params['id'] = ids

    if debug:
        dprint(f"Getting tokens with {user_id=}, {params=}")

    tokens_df = tokens_pipe.get_data(params=params, debug=debug)
    if tokens_df is None:
        return []

    if debug:
        dprint(f"Retrieved tokens dataframe:\n{tokens_df}")

    tokens_docs = tokens_df.to_dict(orient='records')
    return [
        Token(
            instance=self,
            **token_doc
        )
        for token_doc in reversed(tokens_docs)
    ]
```
Return a list of `Token` objects.
```python
def get_token_model(self, token_id: Union[uuid.UUID, Token], debug: bool = False) -> 'Union[TokenModel, None]':
    """
    Return a token's model from the instance.
    """
    from meerschaum.models import TokenModel
    if isinstance(token_id, Token):
        token_id = token_id.id
    if not token_id:
        raise ValueError("Invalid token ID.")
    tokens_pipe = self.get_tokens_pipe()
    doc = tokens_pipe.get_doc(
        params={'id': token_id},
        debug=debug,
    )
    if doc is None:
        return None
    return TokenModel(**doc)
```
Return a token's model from the instance.
```python
def get_token_secret_hash(self, token_id: Union[uuid.UUID, Token], debug: bool = False) -> Union[str, None]:
    """
    Return the secret hash for a given token.
    """
    if isinstance(token_id, Token):
        token_id = token_id.id
    if not token_id:
        raise ValueError("Invalid token ID.")
    tokens_pipe = self.get_tokens_pipe()
    return tokens_pipe.get_value('secret_hash', params={'id': token_id}, debug=debug)
```
Return the secret hash for a given token.
```python
def token_exists(self, token_id: Union[uuid.UUID, Token], debug: bool = False) -> bool:
    """
    Return `True` if a token exists in the tokens pipe.
    """
    if isinstance(token_id, Token):
        token_id = token_id.id
    if not token_id:
        raise ValueError("Invalid token ID.")

    tokens_pipe = self.get_tokens_pipe()
    return tokens_pipe.get_value('creation', params={'id': token_id}, debug=debug) is not None
```
Return `True` if a token exists in the tokens pipe.
```python
def get_token_scopes(self, token_id: Union[uuid.UUID, Token], debug: bool = False) -> List[str]:
    """
    Return the scopes for a token.
    """
    if isinstance(token_id, Token):
        token_id = token_id.id
    if not token_id:
        raise ValueError("Invalid token ID.")

    tokens_pipe = self.get_tokens_pipe()
    return tokens_pipe.get_value('scopes', params={'id': token_id}, debug=debug) or []
```
Return the scopes for a token.
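A consumer of `get_token_scopes()` would typically check scope membership before honoring a request. Here is a minimal sketch; the scope names and the wildcard convention are assumptions for illustration, not part of the Meerschaum API:

```python
# Hypothetical authorization check over a token's scope list.
# The '*' wildcard convention here is an assumption, not Meerschaum's.
def is_authorized(token_scopes, required_scope):
    return '*' in token_scopes or required_scope in token_scopes

print(is_authorized(['pipes:read', 'pipes:write'], 'pipes:read'))
# True
```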
```python
@abc.abstractmethod
def register_pipe(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Insert the pipe's attributes into the internal `pipes` table.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe to be registered.

    Returns
    -------
    A `SuccessTuple` of the result.
    """
```
Insert the pipe's attributes into the internal `pipes` table.
Parameters
- pipe (mrsm.Pipe): The pipe to be registered.
Returns
- A `SuccessTuple` of the result.
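Throughout these instance-connector methods, a `SuccessTuple` is simply a `(bool, str)` pair: a success flag plus a human-readable message. A minimal stub implementation might look like the following (the stub and its message strings are illustrative, not Meerschaum's actual logic):

```python
# A SuccessTuple is a (success_flag, message) pair returned by
# instance-connector methods such as register_pipe().
def register_pipe_stub(connector_keys, metric_key, location_key=None):
    if not connector_keys or not metric_key:
        return False, "Connector and metric keys are required."
    return True, "Success"

success, msg = register_pipe_stub('plugin:noaa', 'weather')
print(success, msg)
# True Success
```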
```python
@abc.abstractmethod
def get_pipe_attributes(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> Dict[str, Any]:
    """
    Return the pipe's document from the internal `pipes` table.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose attributes should be retrieved.

    Returns
    -------
    The document that matches the keys of the pipe.
    """
```
Return the pipe's document from the internal `pipes` table.
Parameters
- pipe (mrsm.Pipe): The pipe whose attributes should be retrieved.
Returns
- The document that matches the keys of the pipe.
```python
@abc.abstractmethod
def get_pipe_id(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> Union[str, int, None]:
    """
    Return the `id` for the pipe if it exists.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose `id` to fetch.

    Returns
    -------
    The `id` for the pipe's document or `None`.
    """
```
Return the `id` for the pipe if it exists.
Parameters
- pipe (mrsm.Pipe): The pipe whose `id` to fetch.
Returns
- The `id` for the pipe's document or `None`.
```python
def edit_pipe(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Edit the attributes of the pipe.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose in-memory parameters must be persisted.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    raise NotImplementedError
```
Edit the attributes of the pipe.
Parameters
- pipe (mrsm.Pipe): The pipe whose in-memory parameters must be persisted.
Returns
- A `SuccessTuple` indicating success.
```python
def delete_pipe(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Delete a pipe's registration from the `pipes` collection.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe to be deleted.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    raise NotImplementedError
```
Delete a pipe's registration from the `pipes` collection.
Parameters
- pipe (mrsm.Pipe): The pipe to be deleted.
Returns
- A `SuccessTuple` indicating success.
```python
@abc.abstractmethod
def fetch_pipes_keys(
    self,
    connector_keys: Optional[List[str]] = None,
    metric_keys: Optional[List[str]] = None,
    location_keys: Optional[List[str]] = None,
    tags: Optional[List[str]] = None,
    debug: bool = False,
    **kwargs: Any
) -> List[Tuple[str, str, str]]:
    """
    Return a list of tuples for the registered pipes' keys according to the provided filters.

    Parameters
    ----------
    connector_keys: list[str] | None, default None
        The keys passed via `-c`.

    metric_keys: list[str] | None, default None
        The keys passed via `-m`.

    location_keys: list[str] | None, default None
        The keys passed via `-l`.

    tags: List[str] | None, default None
        Tags passed via `--tags` which are stored under `parameters:tags`.

    Returns
    -------
    A list of connector, metric, and location keys in tuples.
    You may return the string "None" for location keys in place of nulls.

    Examples
    --------
    >>> import meerschaum as mrsm
    >>> conn = mrsm.get_connector('example:demo')
    >>>
    >>> pipe_a = mrsm.Pipe('a', 'demo', tags=['foo'], instance=conn)
    >>> pipe_b = mrsm.Pipe('b', 'demo', tags=['bar'], instance=conn)
    >>> pipe_a.register()
    >>> pipe_b.register()
    >>>
    >>> conn.fetch_pipes_keys(['a', 'b'])
    [('a', 'demo', 'None'), ('b', 'demo', 'None')]
    >>> conn.fetch_pipes_keys(metric_keys=['demo'])
    [('a', 'demo', 'None'), ('b', 'demo', 'None')]
    >>> conn.fetch_pipes_keys(tags=['foo'])
    [('a', 'demo', 'None')]
    >>> conn.fetch_pipes_keys(location_keys=[None])
    [('a', 'demo', 'None'), ('b', 'demo', 'None')]
    """
```
Return a list of tuples for the registered pipes' keys according to the provided filters.
Parameters
- connector_keys (list[str] | None, default None): The keys passed via `-c`.
- metric_keys (list[str] | None, default None): The keys passed via `-m`.
- location_keys (list[str] | None, default None): The keys passed via `-l`.
- tags (List[str] | None, default None): Tags passed via `--tags` which are stored under `parameters:tags`.
Returns
- A list of connector, metric, and location keys in tuples. You may return the string "None" for location keys in place of nulls.
Examples
>>> import meerschaum as mrsm
>>> conn = mrsm.get_connector('example:demo')
>>>
>>> pipe_a = mrsm.Pipe('a', 'demo', tags=['foo'], instance=conn)
>>> pipe_b = mrsm.Pipe('b', 'demo', tags=['bar'], instance=conn)
>>> pipe_a.register()
>>> pipe_b.register()
>>>
>>> conn.fetch_pipes_keys(['a', 'b'])
[('a', 'demo', 'None'), ('b', 'demo', 'None')]
>>> conn.fetch_pipes_keys(metric_keys=['demo'])
[('a', 'demo', 'None'), ('b', 'demo', 'None')]
>>> conn.fetch_pipes_keys(tags=['foo'])
[('a', 'demo', 'None')]
>>> conn.fetch_pipes_keys(location_keys=[None])
[('a', 'demo', 'None'), ('b', 'demo', 'None')]
```python
@abc.abstractmethod
def pipe_exists(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> bool:
    """
    Check whether a pipe's target table exists.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe to check whether its table exists.

    Returns
    -------
    A `bool` indicating whether the table exists.
    """
```
Check whether a pipe's target table exists.
Parameters
- pipe (mrsm.Pipe): The pipe to check whether its table exists.
Returns
- A `bool` indicating whether the table exists.
```python
@abc.abstractmethod
def drop_pipe(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Drop a pipe's collection if it exists.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe to be dropped.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    raise NotImplementedError
```
Drop a pipe's collection if it exists.
Parameters
- pipe (mrsm.Pipe): The pipe to be dropped.
Returns
- A `SuccessTuple` indicating success.
```python
def drop_pipe_indices(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Drop a pipe's indices.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose indices need to be dropped.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    return False, f"Cannot drop indices for instance connectors of type '{self.type}'."
```
Drop a pipe's indices.
Parameters
- pipe (mrsm.Pipe): The pipe whose indices need to be dropped.
Returns
- A `SuccessTuple` indicating success.
```python
@abc.abstractmethod
def sync_pipe(
    self,
    pipe: mrsm.Pipe,
    df: Optional['pd.DataFrame'] = None,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    chunksize: Optional[int] = -1,
    check_existing: bool = True,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Sync a pipe using a database connection.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The Meerschaum Pipe instance into which to sync the data.

    df: Optional[pd.DataFrame], default None
        An optional DataFrame or equivalent to sync into the pipe.

    begin: Union[datetime, int, None], default None
        Optionally specify the earliest datetime to search for data.

    end: Union[datetime, int, None], default None
        Optionally specify the latest datetime to search for data.

    chunksize: Optional[int], default -1
        Specify the number of rows to sync per chunk.
        If `-1`, resort to system configuration (default is `900`).
        A `chunksize` of `None` will sync all rows in one transaction.

    check_existing: bool, default True
        If `True`, pull and diff with existing data from the pipe.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success (`bool`) and message (`str`).
    """
```
Sync a pipe using a database connection.
Parameters
- pipe (mrsm.Pipe): The Meerschaum Pipe instance into which to sync the data.
- df (Optional[pd.DataFrame], default None): An optional DataFrame or equivalent to sync into the pipe.
- begin (Union[datetime, int, None], default None): Optionally specify the earliest datetime to search for data.
- end (Union[datetime, int, None], default None): Optionally specify the latest datetime to search for data.
- chunksize (Optional[int], default -1): The number of rows to sync per chunk. If `-1`, resort to system configuration (default is `900`). A `chunksize` of `None` will sync all rows in one transaction.
- check_existing (bool, default True): If `True`, pull and diff with existing data from the pipe.
- debug (bool, default False): Verbosity toggle.
Returns
- A `SuccessTuple` of success (`bool`) and message (`str`).
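The `chunksize` behavior can be sketched in plain Python: `-1` falls back to a configured default, and `None` syncs everything in one chunk. The helper below is illustrative (the `900` default matches the docstring; the function name is made up):

```python
DEFAULT_CHUNKSIZE = 900  # matches the documented system default

def iter_chunks(rows, chunksize=-1):
    # -1 means "use the configured default"; None means "one big chunk".
    if chunksize == -1:
        chunksize = DEFAULT_CHUNKSIZE
    if chunksize is None:
        yield rows
        return
    for i in range(0, len(rows), chunksize):
        yield rows[i:i + chunksize]

print(len(list(iter_chunks(list(range(10)), chunksize=3))))
# 4
```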
```python
def create_pipe_indices(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Create a pipe's indices.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose indices need to be created.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    return False, f"Cannot create indices for instance connectors of type '{self.type}'."
```
Create a pipe's indices.
Parameters
- pipe (mrsm.Pipe): The pipe whose indices need to be created.
Returns
- A `SuccessTuple` indicating success.
```python
def clear_pipe(
    self,
    pipe: mrsm.Pipe,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Delete rows within `begin`, `end`, and `params`.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose rows to clear.

    begin: datetime | int | None, default None
        If provided, remove rows >= `begin`.

    end: datetime | int | None, default None
        If provided, remove rows < `end`.

    params: dict[str, Any] | None, default None
        If provided, only remove rows which match the `params` filter.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    raise NotImplementedError
```
Delete rows within `begin`, `end`, and `params`.
Parameters
- pipe (mrsm.Pipe): The pipe whose rows to clear.
- begin (datetime | int | None, default None): If provided, remove rows >= `begin`.
- end (datetime | int | None, default None): If provided, remove rows < `end`.
- params (dict[str, Any] | None, default None): If provided, only remove rows which match the `params` filter.
Returns
- A `SuccessTuple` indicating success.
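The filter semantics (`begin` inclusive, `end` exclusive, `params` as an equality filter) can be sketched over a list of documents. This is an illustrative predicate, not Meerschaum's implementation; an integer datetime axis is used for brevity:

```python
def matches_clear_filter(doc, dt_col, begin=None, end=None, params=None):
    # True means the row falls inside the deletion window:
    # begin is inclusive, end is exclusive, params is an equality filter.
    if begin is not None and doc[dt_col] < begin:
        return False
    if end is not None and doc[dt_col] >= end:
        return False
    return all(doc.get(k) == v for k, v in (params or {}).items())

print(matches_clear_filter({'ts': 1, 'id': 1}, 'ts', begin=1, end=2))
# True
```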
```python
@abc.abstractmethod
def get_pipe_data(
    self,
    pipe: mrsm.Pipe,
    select_columns: Optional[List[str]] = None,
    omit_columns: Optional[List[str]] = None,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    debug: bool = False,
    **kwargs: Any
) -> Union['pd.DataFrame', None]:
    """
    Query a pipe's target table and return the DataFrame.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe with the target table from which to read.

    select_columns: list[str] | None, default None
        If provided, only select these given columns.
        Otherwise select all available columns (i.e. `SELECT *`).

    omit_columns: list[str] | None, default None
        If provided, remove these columns from the selection.

    begin: datetime | int | None, default None
        The earliest `datetime` value to search from (inclusive).

    end: datetime | int | None, default None
        The latest `datetime` value to search from (exclusive).

    params: dict[str, Any] | None, default None
        Additional filters to apply to the query.

    Returns
    -------
    The target table's data as a DataFrame.
    """
```
Query a pipe's target table and return the DataFrame.
Parameters
- pipe (mrsm.Pipe): The pipe with the target table from which to read.
- select_columns (list[str] | None, default None): If provided, only select these given columns. Otherwise select all available columns (i.e. `SELECT *`).
- omit_columns (list[str] | None, default None): If provided, remove these columns from the selection.
- begin (datetime | int | None, default None): The earliest `datetime` value to search from (inclusive).
- end (datetime | int | None, default None): The latest `datetime` value to search from (exclusive).
- params (dict[str, Any] | None, default None): Additional filters to apply to the query.
Returns
- The target table's data as a DataFrame.
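The `select_columns` / `omit_columns` projection can be sketched over a single document (an illustrative helper, not part of the API):

```python
def project_columns(doc, select_columns=None, omit_columns=None):
    # select_columns=None behaves like SELECT *; omit_columns is applied after.
    cols = select_columns or list(doc)
    return {
        col: doc[col]
        for col in cols
        if col not in (omit_columns or [])
    }

doc = {'ts': 1, 'id': 2, 'val': 3}
print(project_columns(doc, omit_columns=['val']))
# {'ts': 1, 'id': 2}
```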
```python
@abc.abstractmethod
def get_sync_time(
    self,
    pipe: mrsm.Pipe,
    params: Optional[Dict[str, Any]] = None,
    newest: bool = True,
    debug: bool = False,
    **kwargs: Any
) -> datetime | int | None:
    """
    Return the most recent value for the `datetime` axis.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose collection contains documents.

    params: dict[str, Any] | None, default None
        Filter certain parameters when determining the sync time.

    newest: bool, default True
        If `True`, return the maximum value for the column.

    Returns
    -------
    The largest `datetime` or `int` value of the `datetime` axis.
    """
```
Return the most recent value for the `datetime` axis.
Parameters
- pipe (mrsm.Pipe): The pipe whose collection contains documents.
- params (dict[str, Any] | None, default None): Filter certain parameters when determining the sync time.
- newest (bool, default True): If `True`, return the maximum value for the column.
Returns
- The largest `datetime` or `int` value of the `datetime` axis.
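Over a list of documents, the `newest` flag simply flips the aggregation between max and min of the datetime axis. A minimal sketch (an integer axis is used for brevity):

```python
def sync_time(docs, dt_col, newest=True):
    # Reduce the datetime axis to its max (newest) or min (oldest),
    # skipping documents with a null axis value.
    values = [doc[dt_col] for doc in docs if doc.get(dt_col) is not None]
    if not values:
        return None
    return max(values) if newest else min(values)

print(sync_time([{'ts': 1}, {'ts': 3}], 'ts'))
# 3
```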
```python
@abc.abstractmethod
def get_pipe_columns_types(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> Dict[str, str]:
    """
    Return the data types for the columns in the target table for data type enforcement.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose target table contains columns and data types.

    Returns
    -------
    A dictionary mapping columns to data types.
    """
```
Return the data types for the columns in the target table for data type enforcement.
Parameters
- pipe (mrsm.Pipe): The pipe whose target table contains columns and data types.
Returns
- A dictionary mapping columns to data types.
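One way an implementation might consume this mapping for enforcement is to coerce incoming values column by column. The coercer table and type names below are purely illustrative, not Meerschaum's actual dtype logic:

```python
# Hypothetical coercion based on a columns -> types mapping
# such as the one returned by get_pipe_columns_types().
COERCERS = {'INT': int, 'FLOAT': float, 'TEXT': str}

def enforce_types(doc, columns_types):
    # Coerce each value to the type reported for its column,
    # defaulting to str for unknown columns or types.
    return {
        col: COERCERS.get(columns_types.get(col, 'TEXT'), str)(val)
        for col, val in doc.items()
    }

print(enforce_types({'a': '1'}, {'a': 'INT'}))
# {'a': 1}
```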
```python
def get_pipe_columns_indices(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
) -> Dict[str, List[Dict[str, str]]]:
    """
    Return a dictionary mapping columns to metadata about related indices.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose target table has related indices.

    Returns
    -------
    A dictionary mapping column names to lists of dictionaries with the keys "type" and "name".

    Examples
    --------
    >>> pipe = mrsm.Pipe('demo', 'shirts', columns={'primary': 'id'}, indices={'size_color': ['color', 'size']})
    >>> pipe.sync([{'color': 'red', 'size': 'M'}])
    >>> pipe.get_columns_indices()
    {'id': [{'name': 'demo_shirts_pkey', 'type': 'PRIMARY KEY'}], 'color': [{'name': 'IX_demo_shirts_color_size', 'type': 'INDEX'}], 'size': [{'name': 'IX_demo_shirts_color_size', 'type': 'INDEX'}]}
    """
    return {}
```
Return a dictionary mapping columns to metadata about related indices.
Parameters
- pipe (mrsm.Pipe): The pipe whose target table has related indices.
Returns
- A dictionary mapping column names to lists of dictionaries with the keys "type" and "name".
Examples
>>> pipe = mrsm.Pipe('demo', 'shirts', columns={'primary': 'id'}, indices={'size_color': ['color', 'size']})
>>> pipe.sync([{'color': 'red', 'size': 'M'}])
>>> pipe.get_columns_indices()
{'id': [{'name': 'demo_shirts_pkey', 'type': 'PRIMARY KEY'}], 'color': [{'name': 'IX_demo_shirts_color_size', 'type': 'INDEX'}], 'size': [{'name': 'IX_demo_shirts_color_size', 'type': 'INDEX'}]}
```python
def make_connector(cls, _is_executor: bool = False):
    """
    Register a class as a `Connector`.
    The `type` will be the lower case of the class name, without the suffix `connector`.

    Parameters
    ----------
    instance: bool, default False
        If `True`, make this connector type an instance connector.
        This requires implementing the various pipes functions and lots of testing.

    Examples
    --------
    >>> import meerschaum as mrsm
    >>> from meerschaum.connectors import make_connector, Connector
    >>>
    >>> @make_connector
    >>> class FooConnector(Connector):
    ...     REQUIRED_ATTRIBUTES: list[str] = ['username', 'password']
    ...
    >>> conn = mrsm.get_connector('foo:bar', username='dog', password='cat')
    >>> print(conn.username, conn.password)
    dog cat
    >>>
    """
    import re
    from meerschaum.plugins import _get_parent_plugin
    suffix_regex = (
        r'connector$'
        if not _is_executor
        else r'executor$'
    )
    plugin_name = _get_parent_plugin(2)
    typ = re.sub(suffix_regex, '', cls.__name__.lower())
    with _locks['types']:
        types[typ] = cls
    with _locks['custom_types']:
        custom_types.add(typ)
    if plugin_name:
        with _locks['plugins_types']:
            if plugin_name not in plugins_types:
                plugins_types[plugin_name] = []
            plugins_types[plugin_name].append(typ)
    with _locks['connectors']:
        if typ not in connectors:
            connectors[typ] = {}
    if getattr(cls, 'IS_INSTANCE', False):
        with _locks['instance_types']:
            if typ not in instance_types:
                instance_types.append(typ)

    return cls
```
Register a class as a `Connector`.
The `type` will be the lower case of the class name, without the suffix `connector`.
Parameters
- instance (bool, default False): If `True`, make this connector type an instance connector. This requires implementing the various pipes functions and lots of testing.
Examples
>>> import meerschaum as mrsm
>>> from meerschaum.connectors import make_connector, Connector
>>>
>>> @make_connector
>>> class FooConnector(Connector):
... REQUIRED_ATTRIBUTES: list[str] = ['username', 'password']
...
>>> conn = mrsm.get_connector('foo:bar', username='dog', password='cat')
>>> print(conn.username, conn.password)
dog cat
>>>
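The type-derivation step above reduces to a single regex substitution: lowercase the class name, then strip a trailing `connector` (or `executor` for executors). A standalone sketch:

```python
import re

def derive_type(class_name, is_executor=False):
    # Lowercase the class name and strip the trailing suffix,
    # mirroring the substitution inside make_connector().
    suffix_regex = r'executor$' if is_executor else r'connector$'
    return re.sub(suffix_regex, '', class_name.lower())

print(derive_type('FooConnector'))
# foo
```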
```python
def entry(
    sysargs: Union[List[str], str, None] = None,
    _patch_args: Optional[Dict[str, Any]] = None,
    _use_cli_daemon: bool = True,
    _session_id: Optional[str] = None,
) -> SuccessTuple:
    """
    Parse arguments and launch a Meerschaum action.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    start = time.perf_counter()
    from meerschaum.config.environment import get_daemon_env_vars
    sysargs_list = shlex.split(sysargs) if isinstance(sysargs, str) else sysargs
    if (
        not _use_cli_daemon
        or (not sysargs_list or sysargs_list[0].startswith('-'))
        or '--no-daemon' in sysargs_list
        or '--daemon' in sysargs_list
        or '-d' in sysargs_list
        or get_daemon_env_vars()
        or not mrsm.get_config('system', 'experimental', 'cli_daemon')
    ):
        success, msg = entry_without_daemon(sysargs, _patch_args=_patch_args)
        end = time.perf_counter()
        if sysargs_list and '--debug' in sysargs_list:
            print(f"Duration without daemon: {round(end - start, 3)}")
        return success, msg

    from meerschaum._internal.cli.entry import entry_with_daemon
    success, msg = entry_with_daemon(sysargs, _patch_args=_patch_args)
    end = time.perf_counter()
    if sysargs_list and '--debug' in sysargs_list:
        print(f"Duration with daemon: {round(end - start, 3)}")
    return success, msg
```