Meerschaum Python API
Welcome to the Meerschaum Python API technical documentation! Here you can find information about the classes and functions provided by the meerschaum
package. Visit meerschaum.io for general usage documentation.
Root Module
For your convenience, the following classes and functions may be imported from the root meerschaum
namespace:
Classes
meerschaum.Pipe
meerschaum.Connector
meerschaum.InstanceConnector
meerschaum.Plugin
meerschaum.Venv
meerschaum.Job
meerschaum.SuccessTuple
Examples
Build a Connector
Get existing connectors or build a new one in-memory with the meerschaum.get_connector()
factory function:
import meerschaum as mrsm

sql_conn = mrsm.get_connector(
    'sql:temp',
    flavor='sqlite',
    database='/tmp/tmp.db',
)

df = sql_conn.read("SELECT 1 AS foo")
print(df)
#    foo
# 0    1

sql_conn.to_sql(df, 'foo')
print(sql_conn.read('foo'))
#    foo
# 0    1
Create a Custom Connector Class
Decorate your connector classes with meerschaum.make_connector()
to designate them as custom connectors:
from datetime import datetime, timezone
from random import randint

import meerschaum as mrsm
from meerschaum.utils.dtypes import round_time

@mrsm.make_connector
class FooConnector(mrsm.Connector):
    REQUIRED_ATTRIBUTES = ['username', 'password']

    def fetch(
        self,
        begin: datetime | None = None,
        end: datetime | None = None,
    ):
        now = begin or round_time(datetime.now(timezone.utc))
        return [
            {'ts': now, 'id': 1, 'vl': randint(1, 100)},
            {'ts': now, 'id': 2, 'vl': randint(1, 100)},
            {'ts': now, 'id': 3, 'vl': randint(1, 100)},
        ]

foo_conn = mrsm.get_connector(
    'foo:bar',
    username='foo',
    password='bar',
)

docs = foo_conn.fetch()
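The round_time() call above truncates the current timestamp before it is stamped onto the fetched rows. As a rough pure-Python stand-in (assuming the default behavior floors to the minute — see meerschaum.utils.dtypes.round_time() for the real implementation):

```python
from datetime import datetime, timezone

def floor_to_minute(dt: datetime) -> datetime:
    # Rough stand-in for meerschaum.utils.dtypes.round_time()'s
    # assumed default behavior: flooring to the nearest minute.
    return dt.replace(second=0, microsecond=0)

ts = datetime(2024, 1, 1, 12, 34, 56, 789, tzinfo=timezone.utc)
print(floor_to_minute(ts))  # 2024-01-01 12:34:00+00:00
```

Rounding the timestamp means every document fetched in the same minute shares one 'ts' value, which keeps the rows groupable by the pipe's datetime index.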
Build a Pipe
Build a meerschaum.Pipe
in-memory:
from datetime import datetime
import meerschaum as mrsm
pipe = mrsm.Pipe(
    foo_conn, 'demo',
    instance=sql_conn,
    columns={'datetime': 'ts', 'id': 'id'},
    tags=['production'],
)

pipe.sync(begin=datetime(2024, 1, 1))
df = pipe.get_data()
print(df)
#           ts  id  vl
# 0 2024-01-01   1  97
# 1 2024-01-01   2  18
# 2 2024-01-01   3  96
Add temporary=True
to skip registering the pipe in the pipes table.
Get Registered Pipes
The meerschaum.get_pipes()
function returns a dictionary hierarchy of pipes by connector, metric, and location:
import meerschaum as mrsm
pipes = mrsm.get_pipes(instance='sql:temp')
pipe = pipes['foo:bar']['demo'][None]
Add as_list=True
to flatten the hierarchy:
import meerschaum as mrsm
pipes = mrsm.get_pipes(
    tags=['production'],
    instance=sql_conn,
    as_list=True,
)
print(pipes)
# [Pipe('foo:bar', 'demo', instance='sql:temp')]
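Internally, meerschaum.utils.misc.flatten_pipes_dict() performs this flattening. The idea can be sketched in plain Python (the string below stands in for a real Pipe object):

```python
def flatten_pipes_dict(pipes_dict: dict) -> list:
    # Walk the connector -> metric -> location hierarchy
    # and collect the Pipe objects at the leaves.
    return [
        pipe
        for metrics in pipes_dict.values()
        for locations in metrics.values()
        for pipe in locations.values()
    ]

pipes = {'foo:bar': {'demo': {None: "Pipe('foo:bar', 'demo')"}}}
print(flatten_pipes_dict(pipes))  # ["Pipe('foo:bar', 'demo')"]
```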
Import Plugins
You can import a plugin's module through meerschaum.Plugin.module:
import meerschaum as mrsm
plugin = mrsm.Plugin('noaa')
with mrsm.Venv(plugin):
    noaa = plugin.module
If your plugin has submodules, use meerschaum.plugins.from_plugin_import:
from meerschaum.plugins import from_plugin_import
get_defined_pipes = from_plugin_import('compose.utils.pipes', 'get_defined_pipes')
Import multiple plugins with meerschaum.plugins.import_plugins:
from meerschaum.plugins import import_plugins
noaa, compose = import_plugins('noaa', 'compose')
Create a Job
Create a meerschaum.Job with name and sysargs:
import meerschaum as mrsm
job = mrsm.Job('syncing-engine', 'sync pipes --loop')
success, msg = job.start()
Pass executor_keys
as the connector keys of an API instance to create a remote job:
import meerschaum as mrsm
job = mrsm.Job(
    'foo',
    'sync pipes -s daily',
    executor_keys='api:main',
)
Import from a Virtual Environment
Use the meerschaum.Venv
context manager to activate a virtual environment:
import meerschaum as mrsm
with mrsm.Venv('noaa'):
    import requests
    print(requests.__file__)
# /home/bmeares/.config/meerschaum/venvs/noaa/lib/python3.12/site-packages/requests/__init__.py
To import packages which may not be installed, use meerschaum.attempt_import():
import meerschaum as mrsm
requests = mrsm.attempt_import('requests', venv='noaa')
print(requests.__file__)
# /home/bmeares/.config/meerschaum/venvs/noaa/lib/python3.12/site-packages/requests/__init__.py
Run Actions
Run sysargs with meerschaum.entry():
import meerschaum as mrsm
success, msg = mrsm.entry('show pipes + show version : x2')
Use meerschaum.actions.get_action()
to access an action function directly:
from meerschaum.actions import get_action
show_pipes = get_action(['show', 'pipes'])
success, msg = show_pipes(connector_keys=['plugin:noaa'])
Get a dictionary of available subactions with meerschaum.actions.get_subactions():
from meerschaum.actions import get_subactions
subactions = get_subactions('show')
success, msg = subactions['pipes']()
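Note that every action returns a SuccessTuple, i.e. a (bool, str) pair of a success flag and a message. A plain-Python illustration of the convention (do_work() is a hypothetical stand-in for an action function):

```python
def do_work(fail: bool = False):
    # Actions follow the SuccessTuple convention: (success_flag, message).
    if fail:
        return False, "Something went wrong."
    return True, "Success"

success, msg = do_work()
print(success, msg)  # True Success
```

Unpacking the tuple immediately, as above, is the idiomatic way to consume an action's result.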
Create a Plugin
Run bootstrap plugin
to create a new plugin:
mrsm bootstrap plugin example
This will create example.py in your plugins directory (default ~/.config/meerschaum/plugins/, Windows: %APPDATA%\Meerschaum\plugins). You may paste the example code from the "Create a Custom Action" example below.
Open your plugin with edit plugin and paste in the example code below to try out the features:
mrsm edit plugin example
See the writing plugins guide for more in-depth documentation.
Create a Custom Action
Decorate a function with meerschaum.actions.make_action
to designate it as an action. Subactions will be automatically detected if not decorated:
from meerschaum.actions import make_action

@make_action
def sing():
    print('What would you like me to sing?')
    return True, "Success"

def sing_tune():
    return False, "I don't know that song!"

def sing_song():
    print('Hello, World!')
    return True, "Success"
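Subactions are matched by function-name prefix: sing_tune and sing_song become subactions of sing. A hypothetical sketch of that naming convention (not the actual Meerschaum implementation):

```python
def find_subactions(main_action: str, namespace: dict) -> dict:
    # Collect callables whose names begin with '<main_action>_'.
    prefix = f'{main_action}_'
    return {
        name[len(prefix):]: obj
        for name, obj in namespace.items()
        if callable(obj) and name.startswith(prefix)
    }

def sing_tune():
    return False, "I don't know that song!"

def sing_song():
    return True, "Success"

print(sorted(find_subactions('sing', globals())))  # ['song', 'tune']
```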
Use meerschaum.plugins.add_plugin_argument()
to create new parameters for your action:
from meerschaum.plugins import make_action, add_plugin_argument

add_plugin_argument(
    '--song', type=str, help='What song to sing.',
)

@make_action
def sing_melody(action=None, song=None):
    to_sing = action[0] if action else song
    if not to_sing:
        return False, "Please tell me what to sing!"
    return True, f'~I am singing {to_sing}~'
mrsm sing melody lalala
mrsm sing melody --song do-re-mi
Add a Page to the Web Dashboard
Use the decorators meerschaum.plugins.dash_plugin()
and meerschaum.plugins.web_page()
to add new pages to the web dashboard:
from meerschaum.plugins import dash_plugin, web_page
@dash_plugin
def init_dash(dash_app):

    import dash.html as html
    import dash_bootstrap_components as dbc
    from dash import Input, Output, no_update

    ### Routes to '/dash/my-page'
    @web_page('/my-page', login_required=False)
    def my_page():
        return dbc.Container([
            html.H1("Hello, World!"),
            dbc.Button("Click me", id='my-button'),
            html.Div(id="my-output-div"),
        ])

    @dash_app.callback(
        Output('my-output-div', 'children'),
        Input('my-button', 'n_clicks'),
    )
    def my_button_click(n_clicks):
        if not n_clicks:
            return no_update
        return html.P(f'You clicked {n_clicks} times!')
Submodules
meerschaum.actions
Access functions for actions and subactions.
meerschaum.actions.actions
meerschaum.actions.get_action()
meerschaum.actions.get_completer()
meerschaum.actions.get_main_action_name()
meerschaum.actions.get_subactions()
meerschaum.config
Read and write the Meerschaum configuration registry.
meerschaum.config.get_config()
meerschaum.config.get_plugin_config()
meerschaum.config.write_config()
meerschaum.config.write_plugin_config()
meerschaum.connectors
Build connectors to interact with databases and fetch data.
meerschaum.connectors.get_connector()
meerschaum.connectors.make_connector()
meerschaum.connectors.is_connected()
meerschaum.connectors.poll.retry_connect()
meerschaum.connectors.Connector
meerschaum.connectors.sql.SQLConnector
meerschaum.connectors.api.APIConnector
meerschaum.connectors.valkey.ValkeyConnector
meerschaum.jobs
Start background jobs.
meerschaum.jobs.Job
meerschaum.jobs.Executor
meerschaum.jobs.systemd.SystemdExecutor
meerschaum.jobs.get_jobs()
meerschaum.jobs.get_filtered_jobs()
meerschaum.jobs.get_running_jobs()
meerschaum.jobs.get_stopped_jobs()
meerschaum.jobs.get_paused_jobs()
meerschaum.jobs.get_restart_jobs()
meerschaum.jobs.make_executor()
meerschaum.jobs.check_restart_jobs()
meerschaum.jobs.start_check_jobs_thread()
meerschaum.jobs.stop_check_jobs_thread()
meerschaum.plugins
Access plugin modules and other API utilities.
meerschaum.plugins.Plugin
meerschaum.plugins.api_plugin()
meerschaum.plugins.dash_plugin()
meerschaum.plugins.import_plugins()
meerschaum.plugins.reload_plugins()
meerschaum.plugins.get_plugins()
meerschaum.plugins.get_data_plugins()
meerschaum.plugins.add_plugin_argument()
meerschaum.plugins.pre_sync_hook()
meerschaum.plugins.post_sync_hook()
meerschaum.utils
Utility functions are available in several submodules:
meerschaum.utils.daemon.daemon_entry()
meerschaum.utils.daemon.daemon_action()
meerschaum.utils.daemon.get_daemons()
meerschaum.utils.daemon.get_daemon_ids()
meerschaum.utils.daemon.get_running_daemons()
meerschaum.utils.daemon.get_paused_daemons()
meerschaum.utils.daemon.get_stopped_daemons()
meerschaum.utils.daemon.get_filtered_daemons()
meerschaum.utils.daemon.run_daemon()
meerschaum.utils.daemon.Daemon
meerschaum.utils.daemon.FileDescriptorInterceptor
meerschaum.utils.daemon.RotatingFile
meerschaum.utils.daemon
Manage background jobs.
meerschaum.utils.dataframe.add_missing_cols_to_df()
meerschaum.utils.dataframe.chunksize_to_npartitions()
meerschaum.utils.dataframe.df_from_literal()
meerschaum.utils.dataframe.df_is_chunk_generator()
meerschaum.utils.dataframe.enforce_dtypes()
meerschaum.utils.dataframe.filter_unseen_df()
meerschaum.utils.dataframe.get_bool_cols()
meerschaum.utils.dataframe.get_bytes_cols()
meerschaum.utils.dataframe.get_datetime_bound_from_df()
meerschaum.utils.dataframe.get_datetime_cols()
meerschaum.utils.dataframe.get_datetime_cols_types()
meerschaum.utils.dataframe.get_first_valid_dask_partition()
meerschaum.utils.dataframe.get_geometry_cols()
meerschaum.utils.dataframe.get_geometry_cols_types()
meerschaum.utils.dataframe.get_json_cols()
meerschaum.utils.dataframe.get_numeric_cols()
meerschaum.utils.dataframe.get_special_cols()
meerschaum.utils.dataframe.get_unhashable_cols()
meerschaum.utils.dataframe.get_unique_index_values()
meerschaum.utils.dataframe.get_uuid_cols()
meerschaum.utils.dataframe.parse_df_datetimes()
meerschaum.utils.dataframe.query_df()
meerschaum.utils.dataframe.to_json()
meerschaum.utils.dataframe
Manipulate dataframes.
meerschaum.utils.dtypes.are_dtypes_equal()
meerschaum.utils.dtypes.attempt_cast_to_bytes()
meerschaum.utils.dtypes.attempt_cast_to_geometry()
meerschaum.utils.dtypes.attempt_cast_to_numeric()
meerschaum.utils.dtypes.attempt_cast_to_uuid()
meerschaum.utils.dtypes.coerce_timezone()
meerschaum.utils.dtypes.deserialize_base64()
meerschaum.utils.dtypes.deserialize_bytes_string()
meerschaum.utils.dtypes.deserialize_geometry()
meerschaum.utils.dtypes.encode_bytes_for_bytea()
meerschaum.utils.dtypes.geometry_is_wkt()
meerschaum.utils.dtypes.get_current_timestamp()
meerschaum.utils.dtypes.get_geometry_type_srid()
meerschaum.utils.dtypes.is_dtype_numeric()
meerschaum.utils.dtypes.is_dtype_special()
meerschaum.utils.dtypes.json_serialize_value()
meerschaum.utils.dtypes.none_if_null()
meerschaum.utils.dtypes.project_geometry()
meerschaum.utils.dtypes.quantize_decimal()
meerschaum.utils.dtypes.serialize_bytes()
meerschaum.utils.dtypes.serialize_datetime()
meerschaum.utils.dtypes.serialize_date()
meerschaum.utils.dtypes.serialize_decimal()
meerschaum.utils.dtypes.serialize_geometry()
meerschaum.utils.dtypes.to_datetime()
meerschaum.utils.dtypes.to_pandas_dtype()
meerschaum.utils.dtypes.value_is_null()
meerschaum.utils.dtypes.get_next_precision_unit()
meerschaum.utils.dtypes.round_time()
meerschaum.utils.dtypes
Work with data types.
meerschaum.utils.formatting.colored()
meerschaum.utils.formatting.extract_stats_from_message()
meerschaum.utils.formatting.fill_ansi()
meerschaum.utils.formatting.get_console()
meerschaum.utils.formatting.highlight_pipes()
meerschaum.utils.formatting.make_header()
meerschaum.utils.formatting.pipe_repr()
meerschaum.utils.formatting.pprint()
meerschaum.utils.formatting.pprint_pipes()
meerschaum.utils.formatting.print_options()
meerschaum.utils.formatting.print_pipes_results()
meerschaum.utils.formatting.print_tuple()
meerschaum.utils.formatting.translate_rich_to_termcolor()
meerschaum.utils.formatting
Format output text.
meerschaum.utils.misc.items_str()
meerschaum.utils.misc.is_int()
meerschaum.utils.misc.interval_str()
meerschaum.utils.misc.filter_keywords()
meerschaum.utils.misc.generate_password()
meerschaum.utils.misc.string_to_dict()
meerschaum.utils.misc.iterate_chunks()
meerschaum.utils.misc.timed_input()
meerschaum.utils.misc.replace_pipes_in_dict()
meerschaum.utils.misc.is_valid_email()
meerschaum.utils.misc.string_width()
meerschaum.utils.misc.replace_password()
meerschaum.utils.misc.parse_config_substitution()
meerschaum.utils.misc.edit_file()
meerschaum.utils.misc.get_in_ex_params()
meerschaum.utils.misc.separate_negation_values()
meerschaum.utils.misc.flatten_list()
meerschaum.utils.misc.make_symlink()
meerschaum.utils.misc.is_symlink()
meerschaum.utils.misc.wget()
meerschaum.utils.misc.add_method_to_class()
meerschaum.utils.misc.is_pipe_registered()
meerschaum.utils.misc.get_cols_lines()
meerschaum.utils.misc.sorted_dict()
meerschaum.utils.misc.flatten_pipes_dict()
meerschaum.utils.misc.dict_from_od()
meerschaum.utils.misc.remove_ansi()
meerschaum.utils.misc.get_connector_labels()
meerschaum.utils.misc.json_serialize_datetime()
meerschaum.utils.misc.async_wrap()
meerschaum.utils.misc.is_docker_available()
meerschaum.utils.misc.is_android()
meerschaum.utils.misc.is_bcp_available()
meerschaum.utils.misc.truncate_string_sections()
meerschaum.utils.misc.safely_extract_tar()
meerschaum.utils.misc
Miscellaneous utility functions.
meerschaum.utils.packages.attempt_import()
meerschaum.utils.packages.get_module_path()
meerschaum.utils.packages.manually_import_module()
meerschaum.utils.packages.get_install_no_version()
meerschaum.utils.packages.determine_version()
meerschaum.utils.packages.need_update()
meerschaum.utils.packages.get_pip()
meerschaum.utils.packages.pip_install()
meerschaum.utils.packages.pip_uninstall()
meerschaum.utils.packages.completely_uninstall_package()
meerschaum.utils.packages.run_python_package()
meerschaum.utils.packages.lazy_import()
meerschaum.utils.packages.pandas_name()
meerschaum.utils.packages.import_pandas()
meerschaum.utils.packages.import_rich()
meerschaum.utils.packages.import_dcc()
meerschaum.utils.packages.import_html()
meerschaum.utils.packages.get_modules_from_package()
meerschaum.utils.packages.import_children()
meerschaum.utils.packages.reload_package()
meerschaum.utils.packages.reload_meerschaum()
meerschaum.utils.packages.is_installed()
meerschaum.utils.packages.venv_contains_package()
meerschaum.utils.packages.package_venv()
meerschaum.utils.packages.ensure_readline()
meerschaum.utils.packages.get_prerelease_dependencies()
meerschaum.utils.packages
Manage Python packages.
meerschaum.utils.sql.build_where()
meerschaum.utils.sql.clean()
meerschaum.utils.sql.dateadd_str()
meerschaum.utils.sql.test_connection()
meerschaum.utils.sql.get_distinct_col_count()
meerschaum.utils.sql.sql_item_name()
meerschaum.utils.sql.pg_capital()
meerschaum.utils.sql.oracle_capital()
meerschaum.utils.sql.truncate_item_name()
meerschaum.utils.sql.table_exists()
meerschaum.utils.sql.get_table_cols_types()
meerschaum.utils.sql.get_update_queries()
meerschaum.utils.sql.get_null_replacement()
meerschaum.utils.sql.get_db_version()
meerschaum.utils.sql.get_rename_table_queries()
meerschaum.utils.sql.get_create_table_queries()
meerschaum.utils.sql.wrap_query_with_cte()
meerschaum.utils.sql.format_cte_subquery()
meerschaum.utils.sql.session_execute()
meerschaum.utils.sql.get_reset_autoincrement_queries()
meerschaum.utils.sql
Build SQL queries.
meerschaum.utils.venv.Venv
meerschaum.utils.venv.activate_venv()
meerschaum.utils.venv.deactivate_venv()
meerschaum.utils.venv.get_module_venv()
meerschaum.utils.venv.get_venvs()
meerschaum.utils.venv.init_venv()
meerschaum.utils.venv.inside_venv()
meerschaum.utils.venv.is_venv_active()
meerschaum.utils.venv.venv_exec()
meerschaum.utils.venv.venv_executable()
meerschaum.utils.venv.venv_exists()
meerschaum.utils.venv.venv_target_path()
meerschaum.utils.venv.verify_venv()
meerschaum.utils.venv
Manage virtual environments.
meerschaum.utils.warnings
Print warnings, errors, info, and debug messages.
get_pipes()
Return a dictionary or list of meerschaum.Pipe objects.
Parameters
- connector_keys (Union[str, List[str], None], default None):
String or list of connector keys.
If omitted or '*', fetch all possible keys.
If a string begins with '_', select keys that do NOT match the string.
- metric_keys (Union[str, List[str], None], default None):
String or list of metric keys. See connector_keys for formatting.
- location_keys (Union[str, List[str], None], default None):
String or list of location keys. See connector_keys for formatting.
- tags (Optional[List[str]], default None):
If provided, only include pipes with these tags.
- params (Optional[Dict[str, Any]], default None):
Dictionary of additional parameters to search by.
Params are parsed into a SQL WHERE clause,
e.g. {'a': 1, 'b': 2} equates to 'WHERE a = 1 AND b = 2'.
- mrsm_instance (Union[str, InstanceConnector, None], default None):
Connector keys for the Meerschaum instance of the pipes.
Must be a meerschaum.connectors.sql.SQLConnector.SQLConnector
or meerschaum.connectors.api.APIConnector.APIConnector.
- as_list (bool, default False):
If True, return pipes in a list instead of a hierarchical dictionary.
False: {connector_keys: {metric_key: {location_key: Pipe}}}
True: [Pipe]
- as_tags_dict (bool, default False):
If True, return a dictionary mapping tags to pipes.
Pipes with multiple tags will be repeated.
- method (str, default 'registered'):
Available options: ['registered', 'explicit', 'all'].
If 'registered' (default), create pipes based on registered keys in the
connector's pipes table (API or SQL connector, depending on mrsm_instance).
If 'explicit', create pipes from the provided connector_keys, metric_keys,
and location_keys instead of consulting the pipes table.
Useful for creating non-existent pipes.
If 'all', create pipes from predefined metrics and locations.
Requires connector_keys.
NOTE: Method 'all' is not implemented!
- workers (Optional[int], default None):
If provided (and as_tags_dict is True), set the number of workers
for the pool used to fetch tags.
Only takes effect if the instance connector supports multi-threading.
- **kw (Any):
Keyword arguments to pass to the meerschaum.Pipe constructor.
Returns
A dictionary of dictionaries and meerschaum.Pipe objects
in the connector, metric, location hierarchy.
If as_list is True, return a list of meerschaum.Pipe objects.
If as_tags_dict is True, return a dictionary mapping tags to pipes.
Examples
>>> ### Manual definition:
>>> pipes = {
...     <connector_keys>: {
...         <metric_key>: {
...             <location_key>: Pipe(
...                 <connector_keys>,
...                 <metric_key>,
...                 <location_key>,
...             ),
...         },
...     },
... }
>>> ### Accessing a single pipe:
>>> pipes['sql:main']['weather'][None]
>>> ### Return a list instead:
>>> get_pipes(as_list=True)
[Pipe('sql:main', 'weather')]
>>> get_pipes(as_tags_dict=True)
{'gvl': Pipe('sql:main', 'weather')}
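To illustrate how params maps onto a WHERE clause, here is a naive hypothetical helper; the real query construction lives in meerschaum.utils.sql (e.g. build_where()) and properly quotes identifiers and escapes values:

```python
def params_to_where(params: dict) -> str:
    # Naive illustration only: real code must escape values
    # and quote identifiers before touching a database.
    conditions = ' AND '.join(f'{col} = {val!r}' for col, val in params.items())
    return f'WHERE {conditions}' if conditions else ''

print(params_to_where({'a': 1, 'b': 2}))  # WHERE a = 1 AND b = 2
```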
Return an existing connector, or create a new one and store it for reuse.
New connectors can be created if enough parameters are provided for the given type and flavor.
Parameters
- type (Optional[str], default None): Connector type (sql, api, etc.). Defaults to the type of the configured instance_connector.
- label (Optional[str], default None): Connector label (e.g. main). Defaults to 'main'.
- refresh (bool, default False): If True, construct a new Connector object rather than returning the stored instance.
- kw (Any): Other arguments to pass to the Connector constructor. If the Connector has already been constructed and new arguments are provided, refresh is set to True and the old Connector is replaced.
Returns
- A new Meerschaum connector (e.g. meerschaum.connectors.api.APIConnector, meerschaum.connectors.sql.SQLConnector).
Examples
The following parameters would create a new meerschaum.connectors.sql.SQLConnector that isn't in the configuration file:
>>> conn = get_connector(
...     type = 'sql',
...     label = 'newlabel',
...     flavor = 'sqlite',
...     database = '/file/path/to/database.db'
... )
>>>
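As the source above shows, the connector keys may also be passed as a single `'type:label'` string, which is split on the first colon. Here is a minimal sketch of that shorthand; the function name is illustrative, and the fallback label `'main'` is assumed from the parameter docs:

```python
def split_connector_keys(keys: str) -> tuple:
    """Illustrative sketch: split 'sql:temp' into ('sql', 'temp')."""
    if ':' in keys:
        # Only the first colon separates the type from the label.
        type_, label = keys.split(':', maxsplit=1)
    else:
        # Fall back to the default label when only the type is given.
        type_, label = keys, 'main'
    return type_, label

print(split_connector_keys('sql:temp'))  # ('sql', 'temp')
print(split_connector_keys('sql'))       # ('sql', 'main')
```

This means `get_connector('sql:temp', flavor='sqlite')` behaves the same as `get_connector(type='sql', label='temp', flavor='sqlite')`.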
def get_config(
    *keys: str,
    patch: bool = True,
    substitute: bool = True,
    sync_files: bool = True,
    write_missing: bool = True,
    as_tuple: bool = False,
    warn: bool = True,
    debug: bool = False,
) -> Any:
    """
    Return the Meerschaum configuration dictionary.
    If positional arguments are provided, index by the keys.
    Raises a warning if invalid keys are provided.

    Parameters
    ----------
    keys: str
        List of strings to index.

    patch: bool, default True
        If `True`, patch missing default keys into the config directory.

    sync_files: bool, default True
        If `True`, sync files if needed.

    write_missing: bool, default True
        If `True`, write default values when the main config files are missing.

    substitute: bool, default True
        If `True`, substitute 'MRSM{}' values.

    as_tuple: bool, default False
        If `True`, return a tuple of type (success, value).

    Returns
    -------
    The value in the configuration directory, indexed by the provided keys.

    Examples
    --------
    >>> get_config('meerschaum', 'instance')
    'sql:main'
    >>> get_config('does', 'not', 'exist')
    UserWarning: Invalid keys in config: ('does', 'not', 'exist')
    """
    import json

    symlinks_key = STATIC_CONFIG['config']['symlinks_key']
    if debug:
        from meerschaum.utils.debug import dprint
        dprint(f"Indexing keys: {keys}", color=False)

    if len(keys) == 0:
        _rc = _config(
            substitute=substitute,
            sync_files=sync_files,
            write_missing=(write_missing and _allow_write_missing),
        )
        if as_tuple:
            return True, _rc
        return _rc

    ### Weird threading issues, only import if substitute is True.
    if substitute:
        from meerschaum.config._read_config import search_and_substitute_config
    ### Invalidate the cache if it was read before with substitute=False
    ### but there still exist substitutions.
    if (
        config is not None and substitute and keys[0] != symlinks_key
        and 'MRSM{' in json.dumps(config.get(keys[0]))
    ):
        try:
            _subbed = search_and_substitute_config({keys[0]: config[keys[0]]})
        except Exception:
            import traceback
            traceback.print_exc()
            _subbed = {keys[0]: config[keys[0]]}

        config[keys[0]] = _subbed[keys[0]]
        if symlinks_key in _subbed:
            if symlinks_key not in config:
                config[symlinks_key] = {}
            config[symlinks_key] = apply_patch_to_config(
                _subbed.get(symlinks_key, {}),
                config.get(symlinks_key, {}),
            )

    from meerschaum.config._sync import sync_files as _sync_files
    if config is None:
        _config(*keys, sync_files=sync_files)

    invalid_keys = False
    if keys[0] not in config and keys[0] != symlinks_key:
        single_key_config = read_config(
            keys=[keys[0]], substitute=substitute, write_missing=write_missing
        )
        if keys[0] not in single_key_config:
            invalid_keys = True
        else:
            config[keys[0]] = single_key_config.get(keys[0], None)
            if symlinks_key in single_key_config and keys[0] in single_key_config[symlinks_key]:
                if symlinks_key not in config:
                    config[symlinks_key] = {}
                config[symlinks_key][keys[0]] = single_key_config[symlinks_key][keys[0]]

        if sync_files:
            _sync_files(keys=[keys[0]])

    c = config
    if len(keys) > 0:
        for k in keys:
            try:
                c = c[k]
            except Exception:
                invalid_keys = True
                break
        if invalid_keys:
            ### Check if the keys are in the default configuration.
            from meerschaum.config._default import default_config
            in_default = True
            patched_default_config = (
                search_and_substitute_config(default_config)
                if substitute else copy.deepcopy(default_config)
            )
            _c = patched_default_config
            for k in keys:
                try:
                    _c = _c[k]
                except Exception:
                    in_default = False
            if in_default:
                c = _c
                invalid_keys = False
            warning_msg = f"Invalid keys in config: {keys}"
            if not in_default:
                try:
                    if warn:
                        from meerschaum.utils.warnings import warn as _warn
                        _warn(warning_msg, stacklevel=3, color=False)
                except Exception:
                    if warn:
                        print(warning_msg)
                if as_tuple:
                    return False, None
                return None

            ### Don't write keys that we haven't yet loaded into memory.
            not_loaded_keys = [k for k in patched_default_config if k not in config]
            for k in not_loaded_keys:
                patched_default_config.pop(k, None)

            set_config(
                apply_patch_to_config(
                    patched_default_config,
                    config,
                )
            )
            if patch and keys[0] != symlinks_key:
                if write_missing:
                    write_config(config, debug=debug)

    if as_tuple:
        return (not invalid_keys), c
    return c
Return the Meerschaum configuration dictionary. If positional arguments are provided, index by the keys. Raises a warning if invalid keys are provided.
Parameters
- keys (str): List of strings to index.
- patch (bool, default True): If True, patch missing default keys into the config directory.
- sync_files (bool, default True): If True, sync files if needed.
- write_missing (bool, default True): If True, write default values when the main config files are missing.
- substitute (bool, default True): If True, substitute 'MRSM{}' values.
- as_tuple (bool, default False): If True, return a tuple of type (success, value).
Returns
- The value in the configuration directory, indexed by the provided keys.
Examples
>>> get_config('meerschaum', 'instance')
'sql:main'
>>> get_config('does', 'not', 'exist')
UserWarning: Invalid keys in config: ('does', 'not', 'exist')
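The key-indexing and `as_tuple` behavior can be sketched as a plain nested-dictionary walk. The helper name here is illustrative, not part of the API, and this omits the patching, file-syncing, and substitution that the real `get_config` performs:

```python
from typing import Any

def index_config(config: dict, *keys: str, as_tuple: bool = False) -> Any:
    # Walk the nested dictionary key by key, mirroring how get_config
    # indexes the loaded configuration.
    c: Any = config
    for k in keys:
        try:
            c = c[k]
        except (KeyError, TypeError):
            # Invalid keys: get_config warns and returns None (or (False, None)).
            return (False, None) if as_tuple else None
    return (True, c) if as_tuple else c

cfg = {'meerschaum': {'instance': 'sql:main'}}
print(index_config(cfg, 'meerschaum', 'instance'))               # 'sql:main'
print(index_config(cfg, 'does', 'not', 'exist', as_tuple=True))  # (False, None)
```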
class Pipe:
    """
    Access Meerschaum pipes via Pipe objects.

    Pipes are identified by the following:

    1. Connector keys (e.g. `'sql:main'`)
    2. Metric key (e.g. `'weather'`)
    3. Location (optional; e.g. `None`)

    A pipe's connector keys correspond to a data source, and when the pipe is synced,
    its `fetch` definition is evaluated and executed to produce new data.

    Alternatively, new data may be directly synced via `pipe.sync()`:

    ```
    >>> from meerschaum import Pipe
    >>> pipe = Pipe('csv', 'weather')
    >>>
    >>> import pandas as pd
    >>> df = pd.read_csv('weather.csv')
    >>> pipe.sync(df)
    ```
    """

    from ._fetch import (
        fetch,
        get_backtrack_interval,
    )
    from ._data import (
        get_data,
        get_backtrack_data,
        get_rowcount,
        get_doc,
        get_value,
        _get_data_as_iterator,
        get_chunk_interval,
        get_chunk_bounds,
        get_chunk_bounds_batches,
        parse_date_bounds,
    )
    from ._register import register
    from ._attributes import (
        attributes,
        parameters,
        columns,
        indices,
        indexes,
        dtypes,
        autoincrement,
        autotime,
        upsert,
        static,
        tzinfo,
        enforce,
        null_indices,
        mixed_numerics,
        get_columns,
        get_columns_types,
        get_columns_indices,
        get_indices,
        get_parameters,
        get_dtypes,
        update_parameters,
        tags,
        get_id,
        id,
        get_val_column,
        parents,
        parent,
        children,
        target,
        _target_legacy,
        guess_datetime,
        precision,
        get_precision,
    )
    from ._cache import (
        _get_cache_connector,
        _cache_value,
        _get_cached_value,
        _invalidate_cache,
        _get_cache_dir_path,
        _write_cache_key,
        _write_cache_file,
        _write_cache_conn_key,
        _read_cache_key,
        _read_cache_file,
        _read_cache_conn_key,
        _load_cache_keys,
        _load_cache_files,
        _load_cache_conn_keys,
        _get_cache_keys,
        _get_cache_file_keys,
        _get_cache_conn_keys,
        _clear_cache_key,
        _clear_cache_file,
        _clear_cache_conn_key,
    )
    from ._show import show
    from ._edit import edit, edit_definition, update
    from ._sync import (
        sync,
        get_sync_time,
        exists,
        filter_existing,
        _get_chunk_label,
        get_num_workers,
        _persist_new_special_columns,
    )
    from ._verify import (
        verify,
        get_bound_interval,
        get_bound_time,
    )
    from ._delete import delete
    from ._drop import drop, drop_indices
    from ._index import create_indices
    from ._clear import clear
    from ._deduplicate import deduplicate
    from ._bootstrap import bootstrap
    from ._dtypes import enforce_dtypes, infer_dtypes
    from ._copy import copy_to

    def __init__(
        self,
        connector: str = '',
        metric: str = '',
        location: Optional[str] = None,
        parameters: Optional[Dict[str, Any]] = None,
        columns: Union[Dict[str, str], List[str], None] = None,
        indices: Optional[Dict[str, Union[str, List[str]]]] = None,
        tags: Optional[List[str]] = None,
        target: Optional[str] = None,
        dtypes: Optional[Dict[str, str]] = None,
        instance: Optional[Union[str, InstanceConnector]] = None,
        upsert: Optional[bool] = None,
        autoincrement: Optional[bool] = None,
        autotime: Optional[bool] = None,
        precision: Union[str, Dict[str, Union[str, int]], None] = None,
        static: Optional[bool] = None,
        enforce: Optional[bool] = None,
        null_indices: Optional[bool] = None,
        mixed_numerics: Optional[bool] = None,
        temporary: bool = False,
        cache: Optional[bool] = None,
        cache_connector_keys: Optional[str] = None,
        mrsm_instance: Optional[Union[str, InstanceConnector]] = None,
        connector_keys: Optional[str] = None,
        metric_key: Optional[str] = None,
        location_key: Optional[str] = None,
        instance_keys: Optional[str] = None,
        indexes: Union[Dict[str, str], List[str], None] = None,
        debug: bool = False,
    ):
        """
        Parameters
        ----------
        connector: str
            Keys for the pipe's source connector, e.g. `'sql:main'`.

        metric: str
            Label for the pipe's contents, e.g. `'weather'`.

        location: str, default None
            Label for the pipe's location. Defaults to `None`.

        parameters: Optional[Dict[str, Any]], default None
            Optionally set a pipe's parameters from the constructor,
            e.g. columns and other attributes.
            You can edit these parameters with `edit pipes`.

        columns: Union[Dict[str, str], List[str], None], default None
            Set the `columns` dictionary of `parameters`.
            If `parameters` is also provided, this dictionary is added under the `'columns'` key.

        indices: Optional[Dict[str, Union[str, List[str]]]], default None
            Set the `indices` dictionary of `parameters`.
            If `parameters` is also provided, this dictionary is added under the `'indices'` key.

        tags: Optional[List[str]], default None
            A list of strings to be added under the `'tags'` key of `parameters`.
            You can select pipes with certain tags using `--tags`.

        dtypes: Optional[Dict[str, str]], default None
            Set the `dtypes` dictionary of `parameters`.
            If `parameters` is also provided, this dictionary is added under the `'dtypes'` key.

        mrsm_instance: Optional[Union[str, InstanceConnector]], default None
            Connector for the Meerschaum instance where the pipe resides.
            Defaults to the preconfigured default instance (`'sql:main'`).

        instance: Optional[Union[str, InstanceConnector]], default None
            Alias for `mrsm_instance`. If `mrsm_instance` is supplied, this value is ignored.

        upsert: Optional[bool], default None
            If `True`, set `upsert` to `True` in the parameters.

        autoincrement: Optional[bool], default None
            If `True`, set `autoincrement` in the parameters.

        autotime: Optional[bool], default None
            If `True`, set `autotime` in the parameters.

        precision: Union[str, Dict[str, Union[str, int]], None], default None
            If provided, set `precision` in the parameters.
            This may be either a string (the precision unit) or a dictionary in the form
            `{'unit': <unit>, 'interval': <interval>}`.
            Default is determined by the `datetime` column dtype
            (e.g. `datetime64[us]` is `microsecond` precision).

        static: Optional[bool], default None
            If `True`, set `static` in the parameters.

        enforce: Optional[bool], default None
            If `False`, skip data type enforcement.
            Default behavior is `True`.

        null_indices: Optional[bool], default None
            Set to `False` if there will be no null values in the index columns.
            Defaults to `True`.

        mixed_numerics: Optional[bool], default None
            If `True`, integer columns will be converted to `numeric` when floats are synced.
            Set to `False` to disable this behavior.
            Defaults to `True`.

        temporary: bool, default False
            If `True`, prevent instance tables (pipes, users, plugins) from being created.

        cache: Optional[bool], default None
            If `True`, cache the pipe's metadata to disk (in addition to in-memory caching).
            If `cache` is not explicitly `True`, it is set to `False` if `temporary` is `True`.
            Defaults to `True` (from `None`).

        cache_connector_keys: Optional[str], default None
            If provided, use the keys to a Valkey connector (e.g. `valkey:main`).
        """
        from meerschaum.utils.warnings import error, warn
        if (not connector and not connector_keys) or (not metric and not metric_key):
            error(
                "Please provide strings for the connector and metric\n    "
                + "(first two positional arguments)."
            )

        ### Fall back to legacy `location_key` just in case.
        if not location:
            location = location_key

        if not connector:
            connector = connector_keys

        if not metric:
            metric = metric_key

        if location in ('[None]', 'None'):
            location = None

        from meerschaum._internal.static import STATIC_CONFIG
        negation_prefix = STATIC_CONFIG['system']['fetch_pipes_keys']['negation_prefix']
        for k in (connector, metric, location, *(tags or [])):
            if str(k).startswith(negation_prefix):
                error(f"A pipe's keys and tags cannot start with the prefix '{negation_prefix}'.")

        self.connector_keys = str(connector)
        self.connector_key = self.connector_keys  ### Alias
        self.metric_key = metric
        self.location_key = location
        self.temporary = temporary
        self.cache = cache if cache is not None else (not temporary)
        self.cache_connector_keys = (
            str(cache_connector_keys)
            if cache_connector_keys is not None
            else None
        )
        self.debug = debug

        self._attributes: Dict[str, Any] = {
            'connector_keys': self.connector_keys,
            'metric_key': self.metric_key,
            'location_key': self.location_key,
            'parameters': {},
        }

        ### only set parameters if values are provided
        if isinstance(parameters, dict):
            self._attributes['parameters'] = parameters
        else:
            if parameters is not None:
                warn(f"The provided parameters are of invalid type '{type(parameters)}'.")
            self._attributes['parameters'] = {}

        columns = columns or self._attributes.get('parameters', {}).get('columns', None)
        if isinstance(columns, (list, tuple)):
            columns = {str(col): str(col) for col in columns}
        if isinstance(columns, dict):
            self._attributes['parameters']['columns'] = columns
        elif isinstance(columns, str) and 'Pipe(' in columns:
            pass
        elif columns is not None:
            warn(f"The provided columns are of invalid type '{type(columns)}'.")

        indices = (
            indices
            or indexes
            or self._attributes.get('parameters', {}).get('indices', None)
            or self._attributes.get('parameters', {}).get('indexes', None)
        )
        if isinstance(indices, dict):
            indices_key = (
                'indexes'
                if 'indexes' in self._attributes['parameters']
                else 'indices'
            )
            self._attributes['parameters'][indices_key] = indices

        if isinstance(tags, (list, tuple)):
            self._attributes['parameters']['tags'] = tags
        elif tags is not None:
            warn(f"The provided tags are of invalid type '{type(tags)}'.")

        if isinstance(target, str):
            self._attributes['parameters']['target'] = target
        elif target is not None:
            warn(f"The provided target is of invalid type '{type(target)}'.")

        if isinstance(dtypes, dict):
            self._attributes['parameters']['dtypes'] = dtypes
        elif dtypes is not None:
            warn(f"The provided dtypes are of invalid type '{type(dtypes)}'.")

        if isinstance(upsert, bool):
            self._attributes['parameters']['upsert'] = upsert

        if isinstance(autoincrement, bool):
            self._attributes['parameters']['autoincrement'] = autoincrement

        if isinstance(autotime, bool):
            self._attributes['parameters']['autotime'] = autotime

        if isinstance(precision, dict):
            self._attributes['parameters']['precision'] = precision
        elif isinstance(precision, str):
            self._attributes['parameters']['precision'] = {'unit': precision}

        if isinstance(static, bool):
            self._attributes['parameters']['static'] = static
            self._static = static

        if isinstance(enforce, bool):
            self._attributes['parameters']['enforce'] = enforce

        if isinstance(null_indices, bool):
            self._attributes['parameters']['null_indices'] = null_indices

        if isinstance(mixed_numerics, bool):
            self._attributes['parameters']['mixed_numerics'] = mixed_numerics

        ### NOTE: The parameters dictionary is {} by default.
        ### A Pipe may be registered without parameters, then edited,
        ### or a Pipe may be registered with parameters set in-memory first.
        _mrsm_instance = mrsm_instance if mrsm_instance is not None else (instance or instance_keys)
        if _mrsm_instance is None:
            _mrsm_instance = get_config('meerschaum', 'instance', patch=True)

        if not isinstance(_mrsm_instance, str):
            self._instance_connector = _mrsm_instance
            self.instance_keys = str(_mrsm_instance)
        else:
            self.instance_keys = _mrsm_instance

        if self.instance_keys == 'sql:memory':
            self.cache = False

    @property
    def meta(self):
        """
        Return the four keys needed to reconstruct this pipe.
        """
        return {
            'connector_keys': self.connector_keys,
            'metric_key': self.metric_key,
            'location_key': self.location_key,
            'instance_keys': self.instance_keys,
        }

    def keys(self) -> List[str]:
        """
        Return the ordered keys for this pipe.
        """
        return {
            key: val
            for key, val in self.meta.items()
            if key != 'instance'
        }

    @property
    def instance_connector(self) -> Union[InstanceConnector, None]:
        """
        The instance connector on which this pipe resides.
        """
        if '_instance_connector' not in self.__dict__:
            from meerschaum.connectors.parse import parse_instance_keys
            conn = parse_instance_keys(self.instance_keys)
            if conn:
                self._instance_connector = conn
            else:
                return None
        return self._instance_connector

    @property
    def connector(self) -> Union['Connector', None]:
        """
        The connector to the data source.
        """
        if '_connector' not in self.__dict__:
            from meerschaum.connectors.parse import parse_instance_keys
            import warnings
            with warnings.catch_warnings():
                warnings.simplefilter('ignore')
                try:
                    conn = parse_instance_keys(self.connector_keys)
                except Exception:
                    conn = None
            if conn:
                self._connector = conn
            else:
                return None
        return self._connector

    def __str__(self, ansi: bool = False):
        return pipe_repr(self, ansi=ansi)

    def __eq__(self, other):
        try:
            return (
                isinstance(self, type(other))
                and self.connector_keys == other.connector_keys
                and self.metric_key == other.metric_key
                and self.location_key == other.location_key
                and self.instance_keys == other.instance_keys
            )
        except Exception:
            return False

    def __hash__(self):
        ### Using an esoteric separator to avoid collisions.
        sep = "[\"']"
        return hash(
            str(self.connector_keys) + sep
            + str(self.metric_key) + sep
            + str(self.location_key) + sep
            + str(self.instance_keys) + sep
        )

    def __repr__(self, ansi: bool = True, **kw) -> str:
        if not hasattr(sys, 'ps1'):
            ansi = False

        return pipe_repr(self, ansi=ansi, **kw)

    def __pt_repr__(self):
        from meerschaum.utils.packages import attempt_import
        prompt_toolkit_formatted_text = attempt_import('prompt_toolkit.formatted_text', lazy=False)
        return prompt_toolkit_formatted_text.ANSI(pipe_repr(self, ansi=True))

    def __getstate__(self) -> Dict[str, Any]:
        """
        Define the state dictionary (pickling).
        """
        return {
            'connector_keys': self.connector_keys,
            'metric_key': self.metric_key,
            'location_key': self.location_key,
            'parameters': self._attributes.get('parameters', None),
            'instance_keys': self.instance_keys,
        }

    def __setstate__(self, _state: Dict[str, Any]):
        """
        Read the state (unpickling).
        """
        self.__init__(**_state)

    def __getitem__(self, key: str) -> Any:
        """
        Index the pipe's attributes.
        If the `key` cannot be found, return `None`.
        """
        if key in self.attributes:
            return self.attributes.get(key, None)

        aliases = {
            'connector': 'connector_keys',
            'connector_key': 'connector_keys',
            'metric': 'metric_key',
            'location': 'location_key',
        }
        aliased_key = aliases.get(key, None)
        if aliased_key is not None:
            return self.attributes.get(aliased_key, None)

        property_aliases = {
            'instance': 'instance_keys',
            'instance_key': 'instance_keys',
        }
        aliased_key = property_aliases.get(key, None)
        if aliased_key is not None:
            key = aliased_key
        return getattr(self, key, None)

    def __copy__(self):
        """
        Return a shallow copy of the current pipe.
        """
        return mrsm.Pipe(
            self.connector_keys, self.metric_key, self.location_key,
            instance=self.instance_keys,
            parameters=self._attributes.get('parameters', None),
        )

    def __deepcopy__(self, memo):
        """
        Return a deep copy of the current pipe.
        """
        return self.__copy__()
Access Meerschaum pipes via Pipe objects.
Pipes are identified by the following:
- Connector keys (e.g. 'sql:main')
- Metric key (e.g. 'weather')
- Location (optional; e.g. None)
A pipe's connector keys correspond to a data source, and when the pipe is synced, its fetch definition is evaluated and executed to produce new data.
Alternatively, new data may be directly synced via pipe.sync():
>>> from meerschaum import Pipe
>>> pipe = Pipe('csv', 'weather')
>>>
>>> import pandas as pd
>>> df = pd.read_csv('weather.csv')
>>> pipe.sync(df)
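Pipe equality and hashing (see `__eq__` and `__hash__` in the source) are defined entirely by these identifying keys plus the instance keys. The identity can be sketched with a hypothetical `PipeKeys` stand-in, which is illustrative only and not part of the meerschaum API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class PipeKeys:
    # Illustrative stand-in for Pipe.meta: the four keys that identify a pipe.
    connector_keys: str
    metric_key: str
    location_key: Optional[str] = None
    instance_keys: str = 'sql:main'

a = PipeKeys('csv', 'weather')
b = PipeKeys('csv', 'weather')
c = PipeKeys('csv', 'weather', 'nyc')
print(a == b)  # True: identical keys mean the same pipe
print(a == c)  # False: a different location is a different pipe
```

Because identity is purely key-based, two Pipe objects built independently with the same keys refer to the same underlying pipe and can be used interchangeably as dictionary keys.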
Parameters
- connector (str): Keys for the pipe's source connector, e.g. 'sql:main'.
- metric (str): Label for the pipe's contents, e.g. 'weather'.
- location (str, default None): Label for the pipe's location. Defaults to None.
- parameters (Optional[Dict[str, Any]], default None): Optionally set a pipe's parameters from the constructor, e.g. columns and other attributes. You can edit these parameters with edit pipes.
- columns (Union[Dict[str, str], List[str], None], default None): Set the columns dictionary of parameters. If parameters is also provided, this dictionary is added under the 'columns' key.
- indices (Optional[Dict[str, Union[str, List[str]]]], default None): Set the indices dictionary of parameters. If parameters is also provided, this dictionary is added under the 'indices' key.
- tags (Optional[List[str]], default None): A list of strings to be added under the 'tags' key of parameters. You can select pipes with certain tags using --tags.
- dtypes (Optional[Dict[str, str]], default None): Set the dtypes dictionary of parameters. If parameters is also provided, this dictionary is added under the 'dtypes' key.
- mrsm_instance (Optional[Union[str, InstanceConnector]], default None): Connector for the Meerschaum instance where the pipe resides. Defaults to the preconfigured default instance ('sql:main').
- instance (Optional[Union[str, InstanceConnector]], default None): Alias for mrsm_instance. If mrsm_instance is supplied, this value is ignored.
- upsert (Optional[bool], default None): If True, set upsert to True in the parameters.
- autoincrement (Optional[bool], default None): If True, set autoincrement in the parameters.
- autotime (Optional[bool], default None): If True, set autotime in the parameters.
- precision (Union[str, Dict[str, Union[str, int]], None], default None): If provided, set precision in the parameters. This may be either a string (the precision unit) or a dictionary in the form {'unit': <unit>, 'interval': <interval>}. The default is determined by the datetime column dtype (e.g. datetime64[us] is microsecond precision).
- static (Optional[bool], default None): If True, set static in the parameters.
- enforce (Optional[bool], default None): If False, skip data type enforcement. Default behavior is True.
- null_indices (Optional[bool], default None): Set to False if there will be no null values in the index columns. Defaults to True.
- mixed_numerics (bool, default None): If True, integer columns will be converted to numeric when floats are synced. Set to False to disable this behavior. Defaults to True.
- temporary (bool, default False): If True, prevent instance tables (pipes, users, plugins) from being created.
- cache (Optional[bool], default None): If True, cache the pipe's metadata to disk (in addition to in-memory caching). If cache is not explicitly True, it is set to False if temporary is True. Defaults to True (from None).
- cache_connector_keys (Optional[str], default None): If provided, use the keys to a Valkey connector (e.g. valkey:main).
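The constructor's normalization of columns and precision (a list of columns becomes an identity mapping, a bare precision unit becomes a dictionary) can be sketched in isolation. This is a simplified model of the behavior described above, not the actual implementation:

```python
from typing import Any, Dict, List, Union

def normalize_columns(columns: Union[Dict[str, str], List[str], None]) -> Union[Dict[str, str], None]:
    """A list or tuple of columns becomes an identity mapping; a dict passes through."""
    if isinstance(columns, (list, tuple)):
        return {str(col): str(col) for col in columns}
    return columns

def normalize_precision(precision: Union[str, Dict[str, Any], None]) -> Union[Dict[str, Any], None]:
    """A bare unit string becomes {'unit': <unit>}; a dict passes through."""
    if isinstance(precision, str):
        return {'unit': precision}
    return precision

print(normalize_columns(['ts', 'id']))
# {'ts': 'ts', 'id': 'id'}
print(normalize_precision('second'))
# {'unit': 'second'}
```

Either form may be passed to the Pipe constructor; both end up as dictionaries under parameters.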
@property
def meta(self):
    """
    Return the four keys needed to reconstruct this pipe.
    """
    return {
        'connector_keys': self.connector_keys,
        'metric_key': self.metric_key,
        'location_key': self.location_key,
        'instance_keys': self.instance_keys,
    }
Return the four keys needed to reconstruct this pipe.
452 def keys(self) -> List[str]: 453 """ 454 Return the ordered keys for this pipe. 455 """ 456 return { 457 key: val 458 for key, val in self.meta.items() 459 if key != 'instance' 460 }
Return the ordered keys for this pipe.
@property
def instance_connector(self) -> Union[InstanceConnector, None]:
    """
    The instance connector on which this pipe resides.
    """
    if '_instance_connector' not in self.__dict__:
        from meerschaum.connectors.parse import parse_instance_keys
        conn = parse_instance_keys(self.instance_keys)
        if conn:
            self._instance_connector = conn
        else:
            return None
    return self._instance_connector
The instance connector on which this pipe resides.
@property
def connector(self) -> Union['Connector', None]:
    """
    The connector to the data source.
    """
    if '_connector' not in self.__dict__:
        from meerschaum.connectors.parse import parse_instance_keys
        import warnings
        with warnings.catch_warnings():
            warnings.simplefilter('ignore')
            try:
                conn = parse_instance_keys(self.connector_keys)
            except Exception:
                conn = None
        if conn:
            self._connector = conn
        else:
            return None
    return self._connector
The connector to the data source.
def fetch(
    self,
    begin: Union[datetime, int, str, None] = '',
    end: Union[datetime, int, None] = None,
    check_existing: bool = True,
    sync_chunks: bool = False,
    debug: bool = False,
    **kw: Any
) -> Union['pd.DataFrame', Iterator['pd.DataFrame'], None]:
    """
    Fetch a Pipe's latest data from its connector.

    Parameters
    ----------
    begin: Union[datetime, str, None], default ''
        If provided, only fetch data newer than or equal to `begin`.

    end: Optional[datetime], default None
        If provided, only fetch data older than or equal to `end`.

    check_existing: bool, default True
        If `False`, do not apply the backtrack interval.

    sync_chunks: bool, default False
        If `True` and the pipe's connector is of type `'sql'`,
        begin syncing chunks while fetching loads them into memory.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `pd.DataFrame` of the newest unseen data.
    """
    if 'fetch' not in dir(self.connector):
        warn(f"No `fetch()` function defined for connector '{self.connector}'")
        return None

    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.misc import filter_arguments

    _chunk_hook = kw.pop('chunk_hook', None)
    kw['workers'] = self.get_num_workers(kw.get('workers', None))
    if sync_chunks and _chunk_hook is None:

        def _chunk_hook(chunk, **_kw) -> SuccessTuple:
            """
            Wrap `Pipe.sync()` with a custom chunk label prepended to the message.
            """
            from meerschaum.config._patch import apply_patch_to_config
            kwargs = apply_patch_to_config(kw, _kw)
            chunk_success, chunk_message = self.sync(chunk, **kwargs)
            chunk_label = self._get_chunk_label(chunk, self.columns.get('datetime', None))
            if chunk_label:
                chunk_message = '\n' + chunk_label + '\n' + chunk_message
            return chunk_success, chunk_message

    begin, end = self.parse_date_bounds(begin, end)

    with mrsm.Venv(get_connector_plugin(self.connector)):
        _args, _kwargs = filter_arguments(
            self.connector.fetch,
            self,
            begin=_determine_begin(
                self,
                begin,
                end,
                check_existing=check_existing,
                debug=debug,
            ),
            end=end,
            chunk_hook=_chunk_hook,
            debug=debug,
            **kw
        )
        df = self.connector.fetch(*_args, **_kwargs)
    return df
Fetch a Pipe's latest data from its connector.
Parameters
- begin (Union[datetime, str, None], default ''): If provided, only fetch data newer than or equal to begin.
- end (Optional[datetime], default None): If provided, only fetch data older than or equal to end.
- check_existing (bool, default True): If False, do not apply the backtrack interval.
- sync_chunks (bool, default False): If True and the pipe's connector is of type 'sql', begin syncing chunks while fetching loads them into memory.
- debug (bool, default False): Verbosity toggle.
Returns
- A pd.DataFrame of the newest unseen data.
def get_backtrack_interval(
    self,
    check_existing: bool = True,
    debug: bool = False,
) -> Union[timedelta, int]:
    """
    Get the backtrack interval to use for this pipe.

    Parameters
    ----------
    check_existing: bool, default True
        If `False`, return a backtrack interval of 0 minutes.

    Returns
    -------
    The backtrack interval (`timedelta` or `int`) to use with this pipe's `datetime` axis.
    """
    default_backtrack_minutes = get_config('pipes', 'parameters', 'fetch', 'backtrack_minutes')
    configured_backtrack_minutes = self.parameters.get('fetch', {}).get('backtrack_minutes', None)
    backtrack_minutes = (
        configured_backtrack_minutes
        if configured_backtrack_minutes is not None
        else default_backtrack_minutes
    ) if check_existing else 0

    backtrack_interval = timedelta(minutes=backtrack_minutes)
    dt_col = self.columns.get('datetime', None)
    if dt_col is None:
        return backtrack_interval

    dt_dtype = self.dtypes.get(dt_col, 'datetime')
    if 'int' in dt_dtype.lower():
        return backtrack_minutes

    return backtrack_interval
Get the backtrack interval to use for this pipe.
Parameters
- check_existing (bool, default True): If False, return a backtrack interval of 0 minutes.
Returns
- The backtrack interval (timedelta or int) to use with this pipe's datetime axis.
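The return type described above depends on the datetime axis: timestamp axes get a timedelta, while integer axes get a plain number of minutes. A minimal sketch of that selection, where dt_dtype is a hypothetical stand-in for the pipe's configured datetime dtype:

```python
from datetime import timedelta
from typing import Union

def backtrack_interval(backtrack_minutes: int, dt_dtype: Union[str, None]) -> Union[timedelta, int]:
    # Integer datetime axes count in raw units, so return minutes as an int.
    if dt_dtype is not None and 'int' in dt_dtype.lower():
        return backtrack_minutes
    # Timestamp axes (or no datetime column) use a real timedelta.
    return timedelta(minutes=backtrack_minutes)

print(backtrack_interval(1440, 'datetime64[us]'))  # 1 day, 0:00:00
print(backtrack_interval(1440, 'int64'))           # 1440
```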
def get_data(
    self,
    select_columns: Optional[List[str]] = None,
    omit_columns: Optional[List[str]] = None,
    begin: Union[datetime, int, str, None] = None,
    end: Union[datetime, int, str, None] = None,
    params: Optional[Dict[str, Any]] = None,
    as_iterator: bool = False,
    as_chunks: bool = False,
    as_dask: bool = False,
    add_missing_columns: bool = False,
    chunk_interval: Union[timedelta, int, None] = None,
    order: Optional[str] = 'asc',
    limit: Optional[int] = None,
    fresh: bool = False,
    debug: bool = False,
    **kw: Any
) -> Union['pd.DataFrame', Iterator['pd.DataFrame'], None]:
    """
    Get a pipe's data from the instance connector.

    Parameters
    ----------
    select_columns: Optional[List[str]], default None
        If provided, only select these given columns.
        Otherwise select all available columns (i.e. `SELECT *`).

    omit_columns: Optional[List[str]], default None
        If provided, remove these columns from the selection.

    begin: Union[datetime, int, str, None], default None
        Lower bound datetime to begin searching for data (inclusive).
        Translates to a `WHERE` clause like `WHERE datetime >= begin`.
        Defaults to `None`.

    end: Union[datetime, int, str, None], default None
        Upper bound datetime to stop searching for data (inclusive).
        Translates to a `WHERE` clause like `WHERE datetime < end`.
        Defaults to `None`.

    params: Optional[Dict[str, Any]], default None
        Filter the retrieved data by a dictionary of parameters.
        See `meerschaum.utils.sql.build_where` for more details.

    as_iterator: bool, default False
        If `True`, return a generator of chunks of pipe data.

    as_chunks: bool, default False
        Alias for `as_iterator`.

    as_dask: bool, default False
        If `True`, return a `dask.DataFrame`
        (which may be loaded into a Pandas DataFrame with `df.compute()`).

    add_missing_columns: bool, default False
        If `True`, add any missing columns from `Pipe.dtypes` to the dataframe.

    chunk_interval: Union[timedelta, int, None], default None
        If `as_iterator`, then return chunks with `begin` and `end` separated by this interval.
        This may be set under `pipe.parameters['chunk_minutes']`.
        By default, use a timedelta of 1440 minutes (1 day).
        If `chunk_interval` is an integer and the `datetime` axis a timestamp,
        then use a timedelta with the number of minutes configured to this value.
        If the `datetime` axis is an integer, default to the configured chunksize.
        If `chunk_interval` is a `timedelta` and the `datetime` axis an integer,
        use the number of minutes in the `timedelta`.

    order: Optional[str], default 'asc'
        If `order` is not `None`, sort the resulting dataframe by indices.

    limit: Optional[int], default None
        If provided, cap the dataframe to this many rows.

    fresh: bool, default False
        If `True`, skip local cache and directly query the instance connector.

    debug: bool, default False
        Verbosity toggle.
        Defaults to `False`.

    Returns
    -------
    A `pd.DataFrame` for the pipe's data corresponding to the provided parameters.
    """
    from meerschaum.utils.warnings import warn
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.dtypes import to_pandas_dtype
    from meerschaum.utils.dataframe import add_missing_cols_to_df, df_is_chunk_generator
    from meerschaum.utils.packages import attempt_import
    from meerschaum.utils.warnings import dprint
    dd = attempt_import('dask.dataframe') if as_dask else None
    dask = attempt_import('dask') if as_dask else None
    _ = attempt_import('partd', lazy=False) if as_dask else None

    if select_columns == '*':
        select_columns = None
    elif isinstance(select_columns, str):
        select_columns = [select_columns]

    if isinstance(omit_columns, str):
        omit_columns = [omit_columns]

    begin, end = self.parse_date_bounds(begin, end)
    as_iterator = as_iterator or as_chunks
    dt_col = self.columns.get('datetime', None)

    def _sort_df(_df):
        if df_is_chunk_generator(_df):
            return _df
        indices = [] if dt_col not in _df.columns else [dt_col]
        non_dt_cols = [
            col
            for col_ix, col in self.columns.items()
            if col_ix != 'datetime' and col in _df.columns
        ]
        indices.extend(non_dt_cols)
        if 'dask' not in _df.__module__:
            _df.sort_values(
                by=indices,
                inplace=True,
                ascending=(str(order).lower() == 'asc'),
            )
            _df.reset_index(drop=True, inplace=True)
        else:
            _df = _df.sort_values(
                by=indices,
                ascending=(str(order).lower() == 'asc'),
            )
            _df = _df.reset_index(drop=True)
        if limit is not None and len(_df) > limit:
            return _df.head(limit)
        return _df

    if as_iterator or as_chunks:
        df = self._get_data_as_iterator(
            select_columns=select_columns,
            omit_columns=omit_columns,
            begin=begin,
            end=end,
            params=params,
            chunk_interval=chunk_interval,
            limit=limit,
            order=order,
            fresh=fresh,
            debug=debug,
        )
        return _sort_df(df)

    if as_dask:
        from multiprocessing.pool import ThreadPool
        dask_pool = ThreadPool(self.get_num_workers())
        dask.config.set(pool=dask_pool)
        chunk_interval = self.get_chunk_interval(chunk_interval, debug=debug)
        bounds = self.get_chunk_bounds(
            begin=begin,
            end=end,
            bounded=False,
            chunk_interval=chunk_interval,
            debug=debug,
        )
        dask_chunks = [
            dask.delayed(self.get_data)(
                select_columns=select_columns,
                omit_columns=omit_columns,
                begin=chunk_begin,
                end=chunk_end,
                params=params,
                chunk_interval=chunk_interval,
                order=order,
                limit=limit,
                fresh=fresh,
                add_missing_columns=True,
                debug=debug,
            )
            for (chunk_begin, chunk_end) in bounds
        ]
        dask_meta = {
            col: to_pandas_dtype(typ)
            for col, typ in self.get_dtypes(refresh=True, infer=True, debug=debug).items()
        }
        if debug:
            dprint(f"Dask meta:\n{dask_meta}")
        return _sort_df(dd.from_delayed(dask_chunks, meta=dask_meta))

    if not self.exists(debug=debug):
        return None

    with Venv(get_connector_plugin(self.instance_connector)):
        df = self.instance_connector.get_pipe_data(
            pipe=self,
            select_columns=select_columns,
            omit_columns=omit_columns,
            begin=begin,
            end=end,
            params=params,
            limit=limit,
            order=order,
            debug=debug,
            **kw
        )
    if df is None:
        return df

    if not select_columns:
        select_columns = [col for col in df.columns]

    pipe_dtypes = self.get_dtypes(refresh=False, debug=debug)
    cols_to_omit = [
        col
        for col in df.columns
        if (
            col in (omit_columns or [])
            or
            col not in (select_columns or [])
        )
    ]
    cols_to_add = [
        col
        for col in select_columns
        if col not in df.columns
    ] + ([
        col
        for col in pipe_dtypes
        if col not in df.columns
    ] if add_missing_columns else [])
    if cols_to_omit:
        warn(
            (
                f"Received {len(cols_to_omit)} omitted column"
                + ('s' if len(cols_to_omit) != 1 else '')
                + f" for {self}. "
                + "Consider adding `select_columns` and `omit_columns` support to "
                + f"'{self.instance_connector.type}' connectors to improve performance."
            ),
            stack=False,
        )
        _cols_to_select = [col for col in df.columns if col not in cols_to_omit]
        df = df[_cols_to_select]

    if cols_to_add:
        if not add_missing_columns:
            from meerschaum.utils.misc import items_str
            warn(
                f"Will add columns {items_str(cols_to_add)} as nulls to dataframe.",
                stack=False,
            )

        df = add_missing_cols_to_df(
            df,
            {
                col: pipe_dtypes.get(col, 'string')
                for col in cols_to_add
            },
        )

    enforced_df = self.enforce_dtypes(
        df,
        dtypes=pipe_dtypes,
        debug=debug,
    )

    if order:
        return _sort_df(enforced_df)
    return enforced_df
Get a pipe's data from the instance connector.
Parameters
- select_columns (Optional[List[str]], default None): If provided, only select these given columns. Otherwise select all available columns (i.e. SELECT *).
- omit_columns (Optional[List[str]], default None): If provided, remove these columns from the selection.
- begin (Union[datetime, int, str, None], default None): Lower bound datetime to begin searching for data (inclusive). Translates to a WHERE clause like WHERE datetime >= begin. Defaults to None.
- end (Union[datetime, int, str, None], default None): Upper bound datetime to stop searching for data (inclusive). Translates to a WHERE clause like WHERE datetime < end. Defaults to None.
- params (Optional[Dict[str, Any]], default None): Filter the retrieved data by a dictionary of parameters. See meerschaum.utils.sql.build_where for more details.
- as_iterator (bool, default False): If True, return a generator of chunks of pipe data.
- as_chunks (bool, default False): Alias for as_iterator.
- as_dask (bool, default False): If True, return a dask.DataFrame (which may be loaded into a Pandas DataFrame with df.compute()).
- add_missing_columns (bool, default False): If True, add any missing columns from Pipe.dtypes to the dataframe.
- chunk_interval (Union[timedelta, int, None], default None): If as_iterator, then return chunks with begin and end separated by this interval. This may be set under pipe.parameters['chunk_minutes']. By default, use a timedelta of 1440 minutes (1 day). If chunk_interval is an integer and the datetime axis a timestamp, then use a timedelta with the number of minutes configured to this value. If the datetime axis is an integer, default to the configured chunksize. If chunk_interval is a timedelta and the datetime axis an integer, use the number of minutes in the timedelta.
- order (Optional[str], default 'asc'): If order is not None, sort the resulting dataframe by indices.
- limit (Optional[int], default None): If provided, cap the dataframe to this many rows.
- fresh (bool, default False): If True, skip local cache and directly query the instance connector.
- debug (bool, default False): Verbosity toggle. Defaults to False.
Returns
- A pd.DataFrame for the pipe's data corresponding to the provided parameters.
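The interaction between select_columns and omit_columns can be sketched on a plain list of column names. This is a simplified model of the column filtering described above, not the connector-level pushdown:

```python
from typing import List, Optional

def filter_columns(
    df_columns: List[str],
    select_columns: Optional[List[str]] = None,
    omit_columns: Optional[List[str]] = None,
) -> List[str]:
    """Keep columns in `select_columns` (or all columns), then drop `omit_columns`."""
    selected = df_columns if not select_columns else [
        col for col in df_columns if col in select_columns
    ]
    return [col for col in selected if col not in (omit_columns or [])]

print(filter_columns(['ts', 'id', 'vl'], select_columns=['ts', 'vl']))
# ['ts', 'vl']
print(filter_columns(['ts', 'id', 'vl'], omit_columns=['id']))
# ['ts', 'vl']
```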
def get_backtrack_data(
    self,
    backtrack_minutes: Optional[int] = None,
    begin: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    limit: Optional[int] = None,
    fresh: bool = False,
    debug: bool = False,
    **kw: Any
) -> Optional['pd.DataFrame']:
    """
    Get the most recent data from the instance connector as a Pandas DataFrame.

    Parameters
    ----------
    backtrack_minutes: Optional[int], default None
        How many minutes from `begin` to select from.
        If `None`, use `pipe.parameters['fetch']['backtrack_minutes']`.

    begin: Optional[datetime], default None
        The starting point to search for data.
        If begin is `None` (default), use the most recent observed datetime
        (AKA sync_time).

        ```
        E.g. begin = 02:00

        Search this region.           Ignore this, even if there's data.
         /  /  /  /  /  /  /  /  /  |
        -----|----------|----------|----------|----------|----------|
           00:00      01:00      02:00      03:00      04:00      05:00
        ```

    params: Optional[Dict[str, Any]], default None
        The standard Meerschaum `params` query dictionary.

    limit: Optional[int], default None
        If provided, cap the number of rows to be returned.

    fresh: bool, default False
        If `True`, ignore local cache and pull directly from the instance connector.
        Only comes into effect if a pipe was created with `cache=True`.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `pd.DataFrame` for the pipe's data corresponding to the provided parameters. Backtrack data
    is a convenient way to get a pipe's data "backtracked" from the most recent datetime.
    """
    from meerschaum.utils.warnings import warn
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    if not self.exists(debug=debug):
        return None

    begin = self.parse_date_bounds(begin)

    backtrack_interval = self.get_backtrack_interval(debug=debug)
    if backtrack_minutes is None:
        backtrack_minutes = (
            (backtrack_interval.total_seconds() / 60)
            if isinstance(backtrack_interval, timedelta)
            else backtrack_interval
        )

    if hasattr(self.instance_connector, 'get_backtrack_data'):
        with Venv(get_connector_plugin(self.instance_connector)):
            return self.enforce_dtypes(
                self.instance_connector.get_backtrack_data(
                    pipe=self,
                    begin=begin,
                    backtrack_minutes=backtrack_minutes,
                    params=params,
                    limit=limit,
                    debug=debug,
                    **kw
                ),
                debug=debug,
            )

    if begin is None:
        begin = self.get_sync_time(params=params, debug=debug)

    backtrack_interval = (
        timedelta(minutes=backtrack_minutes)
        if isinstance(begin, datetime)
        else backtrack_minutes
    )
    if begin is not None:
        begin = begin - backtrack_interval

    return self.get_data(
        begin=begin,
        params=params,
        debug=debug,
        limit=limit,
        order=kw.get('order', 'desc'),
        **kw
    )
Get the most recent data from the instance connector as a Pandas DataFrame.
Parameters
- backtrack_minutes (Optional[int], default None): How many minutes from begin to select from. If None, use pipe.parameters['fetch']['backtrack_minutes'].
- begin (Optional[datetime], default None): The starting point to search for data. If begin is None (default), use the most recent observed datetime (AKA sync_time).

  E.g. begin = 02:00

  Search this region.           Ignore this, even if there's data.
   /  /  /  /  /  /  /  /  /  |
  -----|----------|----------|----------|----------|----------|
     00:00      01:00      02:00      03:00      04:00      05:00

- params (Optional[Dict[str, Any]], default None): The standard Meerschaum params query dictionary.
- limit (Optional[int], default None): If provided, cap the number of rows to be returned.
- fresh (bool, default False): If True, ignore local cache and pull directly from the instance connector. Only comes into effect if a pipe was created with cache=True.
- debug (bool, default False): Verbosity toggle.
Returns
- A pd.DataFrame for the pipe's data corresponding to the provided parameters. Backtrack data is a convenient way to get a pipe's data "backtracked" from the most recent datetime.
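When the instance connector does not implement its own get_backtrack_data(), the fallback described above amounts to shifting begin back by the backtrack window before calling get_data(). A minimal sketch of that shift, handling both timestamp and integer axes:

```python
from datetime import datetime, timedelta
from typing import Union

def backtracked_begin(
    begin: Union[datetime, int, None],
    backtrack_minutes: int,
) -> Union[datetime, int, None]:
    """Shift `begin` back by the backtrack window, minding the axis type."""
    if begin is None:
        return None
    interval = (
        timedelta(minutes=backtrack_minutes)
        if isinstance(begin, datetime)
        else backtrack_minutes
    )
    return begin - interval

print(backtracked_begin(datetime(2024, 1, 1, 2, 0), 60))
# 2024-01-01 01:00:00
print(backtracked_begin(1440, 60))
# 1380
```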
def get_rowcount(
    self,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    remote: bool = False,
    debug: bool = False
) -> int:
    """
    Get a Pipe's instance or remote rowcount.

    Parameters
    ----------
    begin: Optional[datetime], default None
        Count rows where datetime > begin.

    end: Optional[datetime], default None
        Count rows where datetime < end.

    remote: bool, default False
        Count rows from a pipe's remote source.
        **NOTE**: This is experimental!

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    An `int` of the number of rows in the pipe corresponding to the provided parameters.
    Returns 0 if the pipe does not exist.
    """
    from meerschaum.utils.warnings import warn
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.misc import filter_keywords

    begin, end = self.parse_date_bounds(begin, end)
    connector = self.instance_connector if not remote else self.connector
    try:
        with Venv(get_connector_plugin(connector)):
            if not hasattr(connector, 'get_pipe_rowcount'):
                warn(
                    f"Connectors of type '{connector.type}' "
                    "do not implement `get_pipe_rowcount()`.",
                    stack=False,
                )
                return 0
            kwargs = filter_keywords(
                connector.get_pipe_rowcount,
                begin=begin,
                end=end,
                params=params,
                remote=remote,
                debug=debug,
            )
            if remote and 'remote' not in kwargs:
                warn(
                    f"Connectors of type '{connector.type}' do not support remote rowcounts.",
                    stack=False,
                )
                return 0
            rowcount = connector.get_pipe_rowcount(
                self,
                begin=begin,
                end=end,
                params=params,
                remote=remote,
                debug=debug,
            )
            if rowcount is None:
                return 0
            return rowcount
    except AttributeError as e:
        warn(e)
        if remote:
            return 0
        warn(f"Failed to get a rowcount for {self}.")
    return 0
Get a Pipe's instance or remote rowcount.
Parameters
- begin (Optional[datetime], default None): Count rows where datetime > begin.
- end (Optional[datetime], default None): Count rows where datetime < end.
- remote (bool, default False): Count rows from a pipe's remote source. NOTE: This is experimental!
- debug (bool, default False): Verbosity toggle.
Returns
- An int of the number of rows in the pipe corresponding to the provided parameters. Returns 0 if the pipe does not exist.
def get_doc(self, **kwargs) -> Union[Dict[str, Any], None]:
    """
    Convenience function to return a single row as a dictionary (or `None`) from `Pipe.get_data()`.
    Keyword arguments are passed to `Pipe.get_data()`.
    """
    from meerschaum.utils.warnings import warn
    kwargs['limit'] = 1
    try:
        result_df = self.get_data(**kwargs)
        if result_df is None or len(result_df) == 0:
            return None
        return result_df.reset_index(drop=True).iloc[0].to_dict()
    except Exception as e:
        warn(f"Failed to read value from {self}:\n{e}", stack=False)
        return None
Convenience function to return a single row as a dictionary (or None) from Pipe.get_data(). Keyword arguments are passed to Pipe.get_data().
def get_value(
    self,
    column: str,
    params: Optional[Dict[str, Any]] = None,
    **kwargs: Any
) -> Any:
    """
    Convenience function to return a single value (or `None`) from `Pipe.get_data()`.
    Keyword arguments are passed to `Pipe.get_data()`.
    """
    from meerschaum.utils.warnings import warn
    kwargs['select_columns'] = [column]
    kwargs['limit'] = 1
    try:
        result_df = self.get_data(params=params, **kwargs)
        if result_df is None or len(result_df) == 0:
            return None
        if column not in result_df.columns:
            raise ValueError(f"Column '{column}' was not included in the result set.")
        return result_df[column][0]
    except Exception as e:
        warn(f"Failed to read value from {self}:\n{e}", stack=False)
        return None
Convenience function to return a single value (or None) from Pipe.get_data(). Keyword arguments are passed to Pipe.get_data().
def get_chunk_interval(
    self,
    chunk_interval: Union[timedelta, int, None] = None,
    debug: bool = False,
) -> Union[timedelta, int]:
    """
    Get the chunk interval to use for this pipe.

    Parameters
    ----------
    chunk_interval: Union[timedelta, int, None], default None
        If provided, coerce this value into the correct type.
        For example, if the datetime axis is an integer, then
        return the number of minutes.

    Returns
    -------
    The chunk interval (`timedelta` or `int`) to use with this pipe's `datetime` axis.
    """
    default_chunk_minutes = get_config('pipes', 'parameters', 'verify', 'chunk_minutes')
    configured_chunk_minutes = self.parameters.get('verify', {}).get('chunk_minutes', None)
    chunk_minutes = (
        (configured_chunk_minutes or default_chunk_minutes)
        if chunk_interval is None
        else (
            chunk_interval
            if isinstance(chunk_interval, int)
            else int(chunk_interval.total_seconds() / 60)
        )
    )

    dt_col = self.columns.get('datetime', None)
    if dt_col is None:
        return timedelta(minutes=chunk_minutes)

    dt_dtype = self.dtypes.get(dt_col, 'datetime')
    if 'int' in dt_dtype.lower():
        return chunk_minutes
    return timedelta(minutes=chunk_minutes)
Get the chunk interval to use for this pipe.
Parameters
- chunk_interval (Union[timedelta, int, None], default None): If provided, coerce this value into the correct type. For example, if the datetime axis is an integer, then return the number of minutes.
Returns
- The chunk interval (timedelta or int) to use with this pipe's datetime axis.
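The coercion described above (minutes for integer axes, a timedelta for timestamp axes, with either input form accepted) can be sketched in isolation. This is a simplified model; dt_dtype is a hypothetical stand-in for the pipe's configured datetime dtype:

```python
from datetime import timedelta
from typing import Union

def coerce_chunk_interval(
    chunk_interval: Union[timedelta, int],
    dt_dtype: Union[str, None],
) -> Union[timedelta, int]:
    """Integer axes get minutes as an int; timestamp axes get a timedelta."""
    minutes = (
        chunk_interval
        if isinstance(chunk_interval, int)
        else int(chunk_interval.total_seconds() / 60)
    )
    if dt_dtype is not None and 'int' in dt_dtype.lower():
        return minutes
    return timedelta(minutes=minutes)

print(coerce_chunk_interval(timedelta(hours=2), 'int64'))   # 120
print(coerce_chunk_interval(90, 'datetime64[us]'))          # 1:30:00
```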
def get_chunk_bounds(
    self,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    bounded: bool = False,
    chunk_interval: Union[timedelta, int, None] = None,
    debug: bool = False,
) -> List[
    Tuple[
        Union[datetime, int, None],
        Union[datetime, int, None],
    ]
]:
    """
    Return a list of datetime bounds for iterating over the pipe's `datetime` axis.

    Parameters
    ----------
    begin: Union[datetime, int, None], default None
        If provided, do not select less than this value.
        Otherwise the first chunk will be unbounded.

    end: Union[datetime, int, None], default None
        If provided, do not select greater than or equal to this value.
        Otherwise the last chunk will be unbounded.

    bounded: bool, default False
        If `True`, do not include `None` in the first chunk.

    chunk_interval: Union[timedelta, int, None], default None
        If provided, use this interval for the size of chunk boundaries.
        The default value for this pipe may be set
        under `pipe.parameters['verify']['chunk_minutes']`.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A list of chunk bounds (datetimes or integers).
    If unbounded, the first and last chunks will include `None`.
    """
    from datetime import timedelta
    from meerschaum.utils.dtypes import are_dtypes_equal
    from meerschaum.utils.misc import interval_str
    include_less_than_begin = not bounded and begin is None
    include_greater_than_end = not bounded and end is None
    if begin is None:
        begin = self.get_sync_time(newest=False, debug=debug)
    consolidate_end_chunk = False
    if end is None:
        end = self.get_sync_time(newest=True, debug=debug)
        if end is not None and hasattr(end, 'tzinfo'):
            end += timedelta(minutes=1)
            consolidate_end_chunk = True
        elif are_dtypes_equal(str(type(end)), 'int'):
            end += 1
            consolidate_end_chunk = True

    if begin is None and end is None:
        return [(None, None)]

    begin, end = self.parse_date_bounds(begin, end)

    if begin and end:
        if begin >= end:
            return (
                [(begin, begin)]
                if bounded
                else [(begin, None)]
            )
        if end <= begin:
            return (
                [(end, end)]
                if bounded
                else [(None, begin)]
            )

    ### Set the chunk interval under `pipe.parameters['verify']['chunk_minutes']`.
    chunk_interval = self.get_chunk_interval(chunk_interval, debug=debug)

    ### Build a list of tuples containing the chunk boundaries
    ### so that we can sync multiple chunks in parallel.
    ### Run `verify pipes --workers 1` to sync chunks in series.
    chunk_bounds = []
    begin_cursor = begin
    num_chunks = 0
    max_chunks = 1_000_000
    while begin_cursor < end:
        end_cursor = begin_cursor + chunk_interval
        chunk_bounds.append((begin_cursor, end_cursor))
        begin_cursor = end_cursor
        num_chunks += 1
        if num_chunks >= max_chunks:
            raise ValueError(
                f"Too many chunks of size '{interval_str(chunk_interval)}' "
                f"between '{begin}' and '{end}'."
            )

    if num_chunks > 1 and consolidate_end_chunk:
        last_bounds, second_last_bounds = chunk_bounds[-1], chunk_bounds[-2]
        chunk_bounds = chunk_bounds[:-2]
        chunk_bounds.append((second_last_bounds[0], last_bounds[1]))

    ### The chunk interval might be too large.
    if not chunk_bounds and end >= begin:
        chunk_bounds = [(begin, end)]

    ### Truncate the last chunk to the end timestamp.
    if chunk_bounds[-1][1] > end:
        chunk_bounds[-1] = (chunk_bounds[-1][0], end)

    ### Pop the last chunk if its bounds are equal.
    if chunk_bounds[-1][0] == chunk_bounds[-1][1]:
        chunk_bounds = chunk_bounds[:-1]

    if include_less_than_begin:
        chunk_bounds = [(None, begin)] + chunk_bounds
    if include_greater_than_end:
        chunk_bounds = chunk_bounds + [(end, None)]

    return chunk_bounds
Return a list of datetime bounds for iterating over the pipe's `datetime` axis.

Parameters
- begin (Union[datetime, int, None], default None): If provided, do not select less than this value. Otherwise the first chunk will be unbounded.
- end (Union[datetime, int, None], default None): If provided, do not select greater than or equal to this value. Otherwise the last chunk will be unbounded.
- bounded (bool, default False): If `True`, do not include `None` in the first chunk.
- chunk_interval (Union[timedelta, int, None], default None): If provided, use this interval for the size of chunk boundaries. The default value for this pipe may be set under `pipe.parameters['verify']['chunk_minutes']`.
- debug (bool, default False): Verbosity toggle.

Returns
- A list of chunk bounds (datetimes or integers).
- If unbounded, the first and last chunks will include `None`.
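To illustrate the chunking algorithm, here is a minimal standalone sketch (not the meerschaum API): walk from `begin` to `end` in steps of `chunk_interval`, then prepend and append unbounded chunks when `bounded` is `False`. `sketch_chunk_bounds` is a hypothetical name; the real method also consolidates the final chunk and supports integer axes.

```python
from datetime import datetime, timedelta

def sketch_chunk_bounds(begin, end, chunk_interval, bounded=False):
    """Simplified, illustrative sketch of `Pipe.get_chunk_bounds()`."""
    bounds = []
    cursor = begin
    while cursor < end:
        # Truncate the last chunk to the end timestamp.
        bounds.append((cursor, min(cursor + chunk_interval, end)))
        cursor += chunk_interval
    if not bounded:
        # Unbounded chunks capture rows outside [begin, end).
        bounds = [(None, begin)] + bounds + [(end, None)]
    return bounds

bounds = sketch_chunk_bounds(
    datetime(2024, 1, 1),
    datetime(2024, 1, 3),
    timedelta(days=1),
    bounded=True,
)
# Two day-sized chunks covering [2024-01-01, 2024-01-03).
```

Each `(begin, end)` tuple may then be passed to `Pipe.sync()` or `Pipe.get_data()` to process the axis in parallel chunks.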
730def get_chunk_bounds_batches( 731 self, 732 chunk_bounds: List[Tuple[Union[datetime, int, None], Union[datetime, int, None]]], 733 batchsize: Optional[int] = None, 734 workers: Optional[int] = None, 735 debug: bool = False, 736) -> List[ 737 Tuple[ 738 Tuple[ 739 Union[datetime, int, None], 740 Union[datetime, int, None], 741 ], ... 742 ] 743]: 744 """ 745 Return a list of tuples of chunk bounds of size `batchsize`. 746 747 Parameters 748 ---------- 749 chunk_bounds: List[Tuple[Union[datetime, int, None], Union[datetime, int, None]]] 750 A list of chunk_bounds (see `Pipe.get_chunk_bounds()`). 751 752 batchsize: Optional[int], default None 753 How many chunks to include in a batch. Defaults to `Pipe.get_num_workers()`. 754 755 workers: Optional[int], default None 756 If `batchsize` is `None`, use this as the desired number of workers. 757 Passed to `Pipe.get_num_workers()`. 758 759 Returns 760 ------- 761 A list of tuples of chunk bound tuples. 762 """ 763 from meerschaum.utils.misc import iterate_chunks 764 765 if batchsize is None: 766 batchsize = self.get_num_workers(workers=workers) 767 768 return [ 769 tuple( 770 _batch_chunk_bounds 771 for _batch_chunk_bounds in batch 772 if _batch_chunk_bounds is not None 773 ) 774 for batch in iterate_chunks(chunk_bounds, batchsize) 775 if batch 776 ]
Return a list of tuples of chunk bounds of size `batchsize`.

Parameters
- chunk_bounds (List[Tuple[Union[datetime, int, None], Union[datetime, int, None]]]): A list of chunk bounds (see `Pipe.get_chunk_bounds()`).
- batchsize (Optional[int], default None): How many chunks to include in a batch. Defaults to `Pipe.get_num_workers()`.
- workers (Optional[int], default None): If `batchsize` is `None`, use this as the desired number of workers. Passed to `Pipe.get_num_workers()`.

Returns
- A list of tuples of chunk bound tuples.
779def parse_date_bounds(self, *dt_vals: Union[datetime, int, None]) -> Union[ 780 datetime, 781 int, 782 str, 783 None, 784 Tuple[Union[datetime, int, str, None]] 785]: 786 """ 787 Given a date bound (begin, end), coerce a timezone if necessary. 788 """ 789 from meerschaum.utils.misc import is_int 790 from meerschaum.utils.dtypes import coerce_timezone, MRSM_PD_DTYPES 791 from meerschaum.utils.warnings import warn 792 dateutil_parser = mrsm.attempt_import('dateutil.parser') 793 794 def _parse_date_bound(dt_val): 795 if dt_val is None: 796 return None 797 798 if isinstance(dt_val, int): 799 return dt_val 800 801 if dt_val == '': 802 return '' 803 804 if is_int(dt_val): 805 return int(dt_val) 806 807 if isinstance(dt_val, str): 808 try: 809 dt_val = dateutil_parser.parse(dt_val) 810 except Exception as e: 811 warn(f"Could not parse '{dt_val}' as datetime:\n{e}") 812 return None 813 814 dt_col = self.columns.get('datetime', None) 815 dt_typ = str(self.dtypes.get(dt_col, 'datetime')) 816 if dt_typ == 'datetime': 817 dt_typ = MRSM_PD_DTYPES['datetime'] 818 return coerce_timezone(dt_val, strip_utc=('utc' not in dt_typ.lower())) 819 820 bounds = tuple(_parse_date_bound(dt_val) for dt_val in dt_vals) 821 if len(bounds) == 1: 822 return bounds[0] 823 return bounds
Given a date bound (begin, end), coerce a timezone if necessary.
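A simplified sketch of the coercion logic, assuming ISO-formatted strings (the real method parses with `dateutil.parser` and derives timezone-awareness from the pipe's dtypes); `sketch_parse_date_bound` is a hypothetical name:

```python
from datetime import datetime, timezone

def sketch_parse_date_bound(dt_val, tz_aware=True):
    """Coerce a bound to a datetime or int, attaching UTC when tz-aware (illustrative)."""
    if dt_val is None or isinstance(dt_val, int):
        return dt_val
    if isinstance(dt_val, str):
        if dt_val.lstrip('-').isdigit():
            return int(dt_val)
        # Stand-in for `dateutil.parser.parse()`.
        dt_val = datetime.fromisoformat(dt_val)
    if tz_aware and dt_val.tzinfo is None:
        dt_val = dt_val.replace(tzinfo=timezone.utc)
    return dt_val

parsed = sketch_parse_date_bound('2024-01-01 00:00:00')
# → datetime(2024, 1, 1, tzinfo=timezone.utc)
```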
12def register( 13 self, 14 debug: bool = False, 15 **kw: Any 16 ) -> SuccessTuple: 17 """ 18 Register a new Pipe along with its attributes. 19 20 Parameters 21 ---------- 22 debug: bool, default False 23 Verbosity toggle. 24 25 kw: Any 26 Keyword arguments to pass to `instance_connector.register_pipe()`. 27 28 Returns 29 ------- 30 A `SuccessTuple` of success, message. 31 """ 32 if self.temporary: 33 return False, "Cannot register pipes created with `temporary=True` (read-only)." 34 35 from meerschaum.utils.formatting import get_console 36 from meerschaum.utils.venv import Venv 37 from meerschaum.connectors import get_connector_plugin, custom_types 38 from meerschaum.config._patch import apply_patch_to_config 39 40 import warnings 41 with warnings.catch_warnings(): 42 warnings.simplefilter('ignore') 43 try: 44 _conn = self.connector 45 except Exception as e: 46 _conn = None 47 48 if ( 49 _conn is not None 50 and 51 (_conn.type == 'plugin' or _conn.type in custom_types) 52 and 53 getattr(_conn, 'register', None) is not None 54 ): 55 try: 56 with Venv(get_connector_plugin(_conn), debug=debug): 57 params = self.connector.register(self) 58 except Exception as e: 59 get_console().print_exception() 60 params = None 61 params = {} if params is None else params 62 if not isinstance(params, dict): 63 from meerschaum.utils.warnings import warn 64 warn( 65 f"Invalid parameters returned from `register()` in connector {self.connector}:\n" 66 + f"{params}" 67 ) 68 else: 69 self.parameters = apply_patch_to_config(params, self.parameters) 70 71 if not self.parameters: 72 cols = self.columns if self.columns else {'datetime': None, 'id': None} 73 self.parameters = { 74 'columns': cols, 75 } 76 77 with Venv(get_connector_plugin(self.instance_connector)): 78 return self.instance_connector.register_pipe(self, debug=debug, **kw)
Register a new Pipe along with its attributes.
Parameters
- debug (bool, default False): Verbosity toggle.
- kw (Any): Keyword arguments to pass to `instance_connector.register_pipe()`.

Returns
- A `SuccessTuple` of success, message.
20@property 21def attributes(self) -> Dict[str, Any]: 22 """ 23 Return a dictionary of a pipe's keys and parameters. 24 These values are reflected directly from the pipes table of the instance. 25 """ 26 from meerschaum.config import get_config 27 from meerschaum.config._patch import apply_patch_to_config 28 from meerschaum.utils.venv import Venv 29 from meerschaum.connectors import get_connector_plugin 30 from meerschaum.utils.dtypes import get_current_timestamp 31 32 timeout_seconds = get_config('pipes', 'attributes', 'local_cache_timeout_seconds') 33 34 now = get_current_timestamp('ms', as_int=True) / 1000 35 _attributes_sync_time = self._get_cached_value('_attributes_sync_time', debug=self.debug) 36 timed_out = ( 37 _attributes_sync_time is None 38 or 39 (timeout_seconds is not None and (now - _attributes_sync_time) >= timeout_seconds) 40 ) 41 if not self.temporary and timed_out: 42 self._cache_value('_attributes_sync_time', now, memory_only=True, debug=self.debug) 43 local_attributes = self._get_cached_value('attributes', debug=self.debug) or {} 44 with Venv(get_connector_plugin(self.instance_connector)): 45 instance_attributes = self.instance_connector.get_pipe_attributes(self) 46 47 self._cache_value( 48 'attributes', 49 apply_patch_to_config(instance_attributes, local_attributes), 50 memory_only=True, 51 debug=self.debug, 52 ) 53 54 return self._attributes
Return a dictionary of a pipe's keys and parameters. These values are reflected directly from the pipes table of the instance.
134@property 135def parameters(self) -> Optional[Dict[str, Any]]: 136 """ 137 Return the parameters dictionary of the pipe. 138 """ 139 return self.get_parameters(debug=self.debug)
Return the parameters dictionary of the pipe.
151@property 152def columns(self) -> Union[Dict[str, str], None]: 153 """ 154 Return the `columns` dictionary defined in `meerschaum.Pipe.parameters`. 155 """ 156 cols = self.parameters.get('columns', {}) 157 if not isinstance(cols, dict): 158 return {} 159 return {col_ix: col for col_ix, col in cols.items() if col and col_ix}
Return the `columns` dictionary defined in `meerschaum.Pipe.parameters`.
176@property 177def indices(self) -> Union[Dict[str, Union[str, List[str]]], None]: 178 """ 179 Return the `indices` dictionary defined in `meerschaum.Pipe.parameters`. 180 """ 181 _parameters = self.get_parameters(debug=self.debug) 182 indices_key = ( 183 'indexes' 184 if 'indexes' in _parameters 185 else 'indices' 186 ) 187 188 _indices = _parameters.get(indices_key, {}) 189 _columns = self.columns 190 dt_col = _columns.get('datetime', None) 191 if not isinstance(_indices, dict): 192 _indices = {} 193 unique_cols = list(set(( 194 [dt_col] 195 if dt_col 196 else [] 197 ) + [ 198 col 199 for col_ix, col in _columns.items() 200 if col and col_ix != 'datetime' 201 ])) 202 return { 203 **({'unique': unique_cols} if len(unique_cols) > 1 else {}), 204 **{col_ix: col for col_ix, col in _columns.items() if col}, 205 **_indices 206 }
Return the `indices` dictionary defined in `meerschaum.Pipe.parameters`.
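The derivation of the default indices can be sketched as follows. This hypothetical `sketch_indices` mirrors the property above, except it preserves insertion order where the real code deduplicates with a `set`:

```python
def sketch_indices(columns, configured_indices=None):
    """Derive the default indices dictionary from a pipe's `columns` (illustrative)."""
    configured_indices = configured_indices or {}
    dt_col = columns.get('datetime')
    # The composite unique index: datetime first, then the other index columns.
    unique_cols = list(dict.fromkeys(
        ([dt_col] if dt_col else [])
        + [col for key, col in columns.items() if col and key != 'datetime']
    ))
    return {
        **({'unique': unique_cols} if len(unique_cols) > 1 else {}),
        **{key: col for key, col in columns.items() if col},
        **configured_indices,
    }

ix = sketch_indices({'datetime': 'ts', 'id': 'id'})
# → {'unique': ['ts', 'id'], 'datetime': 'ts', 'id': 'id'}
```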
209@property 210def indexes(self) -> Union[Dict[str, Union[str, List[str]]], None]: 211 """ 212 Alias for `meerschaum.Pipe.indices`. 213 """ 214 return self.indices
Alias for `meerschaum.Pipe.indices`.
265@property 266def dtypes(self) -> Dict[str, Any]: 267 """ 268 If defined, return the `dtypes` dictionary defined in `meerschaum.Pipe.parameters`. 269 """ 270 return self.get_dtypes(refresh=False, debug=self.debug)
If defined, return the `dtypes` dictionary defined in `meerschaum.Pipe.parameters`.
369@property 370def autoincrement(self) -> bool: 371 """ 372 Return the `autoincrement` parameter for the pipe. 373 """ 374 return self.parameters.get('autoincrement', False)
Return the `autoincrement` parameter for the pipe.
385@property 386def autotime(self) -> bool: 387 """ 388 Return the `autotime` parameter for the pipe. 389 """ 390 return self.parameters.get('autotime', False)
Return the `autotime` parameter for the pipe.
336@property 337def upsert(self) -> bool: 338 """ 339 Return whether `upsert` is set for the pipe. 340 """ 341 return self.parameters.get('upsert', False)
Return whether `upsert` is set for the pipe.
352@property 353def static(self) -> bool: 354 """ 355 Return whether `static` is set for the pipe. 356 """ 357 return self.parameters.get('static', False)
Return whether `static` is set for the pipe.
401@property 402def tzinfo(self) -> Union[None, timezone]: 403 """ 404 Return `timezone.utc` if the pipe is timezone-aware. 405 """ 406 _tzinfo = self._get_cached_value('tzinfo', debug=self.debug) 407 if _tzinfo is not None: 408 return _tzinfo if _tzinfo != 'None' else None 409 410 _tzinfo = None 411 dt_col = self.columns.get('datetime', None) 412 dt_typ = str(self.dtypes.get(dt_col, 'datetime')) if dt_col else None 413 if self.autotime: 414 ts_col = mrsm.get_config('pipes', 'autotime', 'column_name_if_datetime_missing') 415 ts_typ = self.dtypes.get(ts_col, 'datetime') 416 dt_typ = ts_typ 417 418 if dt_typ and 'utc' in dt_typ.lower() or dt_typ == 'datetime': 419 _tzinfo = timezone.utc 420 421 self._cache_value('tzinfo', (_tzinfo if _tzinfo is not None else 'None'), debug=self.debug) 422 return _tzinfo
Return `timezone.utc` if the pipe is timezone-aware.
425@property 426def enforce(self) -> bool: 427 """ 428 Return the `enforce` parameter for the pipe. 429 """ 430 return self.parameters.get('enforce', True)
Return the `enforce` parameter for the pipe.
441@property 442def null_indices(self) -> bool: 443 """ 444 Return the `null_indices` parameter for the pipe. 445 """ 446 return self.parameters.get('null_indices', True)
Return the `null_indices` parameter for the pipe.
457@property 458def mixed_numerics(self) -> bool: 459 """ 460 Return the `mixed_numerics` parameter for the pipe. 461 """ 462 return self.parameters.get('mixed_numerics', True)
Return the `mixed_numerics` parameter for the pipe.
473def get_columns(self, *args: str, error: bool = False) -> Union[str, Tuple[str]]: 474 """ 475 Check if the requested columns are defined. 476 477 Parameters 478 ---------- 479 *args: str 480 The column names to be retrieved. 481 482 error: bool, default False 483 If `True`, raise an `Exception` if the specified column is not defined. 484 485 Returns 486 ------- 487 A tuple of the same size of `args` or a `str` if `args` is a single argument. 488 489 Examples 490 -------- 491 >>> pipe = mrsm.Pipe('test', 'test') 492 >>> pipe.columns = {'datetime': 'dt', 'id': 'id'} 493 >>> pipe.get_columns('datetime', 'id') 494 ('dt', 'id') 495 >>> pipe.get_columns('value', error=True) 496 Exception: 🛑 Missing 'value' column for Pipe('test', 'test'). 497 """ 498 from meerschaum.utils.warnings import error as _error 499 if not args: 500 args = tuple(self.columns.keys()) 501 col_names = [] 502 for col in args: 503 col_name = None 504 try: 505 col_name = self.columns[col] 506 if col_name is None and error: 507 _error(f"Please define the name of the '{col}' column for {self}.") 508 except Exception: 509 col_name = None 510 if col_name is None and error: 511 _error(f"Missing '{col}'" + f" column for {self}.") 512 col_names.append(col_name) 513 if len(col_names) == 1: 514 return col_names[0] 515 return tuple(col_names)
Check if the requested columns are defined.
Parameters
- *args (str): The column names to be retrieved.
- error (bool, default False): If `True`, raise an `Exception` if the specified column is not defined.

Returns
- A tuple of the same size as `args`, or a `str` if `args` is a single argument.
Examples
>>> pipe = mrsm.Pipe('test', 'test')
>>> pipe.columns = {'datetime': 'dt', 'id': 'id'}
>>> pipe.get_columns('datetime', 'id')
('dt', 'id')
>>> pipe.get_columns('value', error=True)
Exception: 🛑 Missing 'value' column for Pipe('test', 'test').
518def get_columns_types( 519 self, 520 refresh: bool = False, 521 debug: bool = False, 522) -> Union[Dict[str, str], None]: 523 """ 524 Get a dictionary of a pipe's column names and their types. 525 526 Parameters 527 ---------- 528 refresh: bool, default False 529 If `True`, invalidate the cache and fetch directly from the instance connector. 530 531 debug: bool, default False: 532 Verbosity toggle. 533 534 Returns 535 ------- 536 A dictionary of column names (`str`) to column types (`str`). 537 538 Examples 539 -------- 540 >>> pipe.get_columns_types() 541 { 542 'dt': 'TIMESTAMP WITH TIMEZONE', 543 'id': 'BIGINT', 544 'val': 'DOUBLE PRECISION', 545 } 546 >>> 547 """ 548 from meerschaum.connectors import get_connector_plugin 549 from meerschaum.utils.dtypes import get_current_timestamp 550 551 now = get_current_timestamp('ms', as_int=True) / 1000 552 cache_seconds = ( 553 mrsm.get_config('pipes', 'static', 'static_schema_cache_seconds') 554 if self.static 555 else mrsm.get_config('pipes', 'dtypes', 'columns_types_cache_seconds') 556 ) 557 if refresh: 558 self._clear_cache_key('_columns_types_timestamp', debug=debug) 559 self._clear_cache_key('_columns_types', debug=debug) 560 561 _columns_types = self._get_cached_value('_columns_types', debug=debug) 562 if _columns_types: 563 columns_types_timestamp = self._get_cached_value('_columns_types_timestamp', debug=debug) 564 if columns_types_timestamp is not None: 565 delta = now - columns_types_timestamp 566 if delta < cache_seconds: 567 if debug: 568 dprint( 569 f"Returning cached `columns_types` for {self} " 570 f"({round(delta, 2)} seconds old)." 
571 ) 572 return _columns_types 573 574 with mrsm.Venv(get_connector_plugin(self.instance_connector)): 575 _columns_types = ( 576 self.instance_connector.get_pipe_columns_types(self, debug=debug) 577 if hasattr(self.instance_connector, 'get_pipe_columns_types') 578 else None 579 ) 580 581 self._cache_value('_columns_types', _columns_types, debug=debug) 582 self._cache_value('_columns_types_timestamp', now, debug=debug) 583 return _columns_types or {}
Get a dictionary of a pipe's column names and their types.
Parameters
- refresh (bool, default False): If `True`, invalidate the cache and fetch directly from the instance connector.
- debug (bool, default False): Verbosity toggle.

Returns
- A dictionary of column names (`str`) to column types (`str`).
Examples
>>> pipe.get_columns_types()
{
'dt': 'TIMESTAMP WITH TIMEZONE',
'id': 'BIGINT',
'val': 'DOUBLE PRECISION',
}
>>>
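The timestamp-based cache used by `get_columns_types()` can be sketched in isolation. `TimedCache` is a hypothetical class illustrating the pattern, not part of meerschaum:

```python
import time

class TimedCache:
    """Sketch of the cache-timeout pattern in `Pipe.get_columns_types()` (illustrative)."""

    def __init__(self, cache_seconds):
        self.cache_seconds = cache_seconds
        self._value = None
        self._timestamp = None

    def get(self, fetch):
        now = time.time()
        if self._value is not None and self._timestamp is not None:
            if (now - self._timestamp) < self.cache_seconds:
                # Still fresh: skip the (potentially expensive) fetch.
                return self._value
        self._value = fetch()
        self._timestamp = now
        return self._value

calls = []
cache = TimedCache(cache_seconds=60)
result1 = cache.get(lambda: calls.append(1) or {'dt': 'TIMESTAMP'})
result2 = cache.get(lambda: calls.append(1) or {'dt': 'TIMESTAMP'})
# The second call is served from cache; `fetch` runs only once.
```

The real implementation also honors `refresh=True` by clearing the cached value and timestamp before the lookup.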
586def get_columns_indices( 587 self, 588 debug: bool = False, 589 refresh: bool = False, 590) -> Dict[str, List[Dict[str, str]]]: 591 """ 592 Return a dictionary mapping columns to index information. 593 """ 594 from meerschaum.connectors import get_connector_plugin 595 from meerschaum.utils.dtypes import get_current_timestamp 596 597 now = get_current_timestamp('ms', as_int=True) / 1000 598 cache_seconds = ( 599 mrsm.get_config('pipes', 'static', 'static_schema_cache_seconds') 600 if self.static 601 else mrsm.get_config('pipes', 'dtypes', 'columns_types_cache_seconds') 602 ) 603 if refresh: 604 self._clear_cache_key('_columns_indices_timestamp', debug=debug) 605 self._clear_cache_key('_columns_indices', debug=debug) 606 607 _columns_indices = self._get_cached_value('_columns_indices', debug=debug) 608 609 if _columns_indices: 610 columns_indices_timestamp = self._get_cached_value('_columns_indices_timestamp', debug=debug) 611 if columns_indices_timestamp is not None: 612 delta = now - columns_indices_timestamp 613 if delta < cache_seconds: 614 if debug: 615 dprint( 616 f"Returning cached `columns_indices` for {self} " 617 f"({round(delta, 2)} seconds old)." 618 ) 619 return _columns_indices 620 621 with mrsm.Venv(get_connector_plugin(self.instance_connector)): 622 _columns_indices = ( 623 self.instance_connector.get_pipe_columns_indices(self, debug=debug) 624 if hasattr(self.instance_connector, 'get_pipe_columns_indices') 625 else None 626 ) 627 628 self._cache_value('_columns_indices', _columns_indices, debug=debug) 629 self._cache_value('_columns_indices_timestamp', now, debug=debug) 630 return {k: v for k, v in _columns_indices.items() if k and v} or {}
Return a dictionary mapping columns to index information.
886def get_indices(self) -> Dict[str, str]: 887 """ 888 Return a dictionary mapping index keys to their names in the database. 889 890 Returns 891 ------- 892 A dictionary of index keys to index names. 893 """ 894 from meerschaum.connectors import get_connector_plugin 895 with mrsm.Venv(get_connector_plugin(self.instance_connector)): 896 if hasattr(self.instance_connector, 'get_pipe_index_names'): 897 result = self.instance_connector.get_pipe_index_names(self) 898 else: 899 result = {} 900 901 return result
Return a dictionary mapping index keys to their names in the database.
Returns
- A dictionary of index keys to index names.
57def get_parameters( 58 self, 59 apply_symlinks: bool = True, 60 refresh: bool = False, 61 debug: bool = False, 62 _visited: 'Optional[set[mrsm.Pipe]]' = None, 63) -> Dict[str, Any]: 64 """ 65 Return the `parameters` dictionary of the pipe. 66 67 Parameters 68 ---------- 69 apply_symlinks: bool, default True 70 If `True`, resolve references to parameters from other pipes. 71 72 refresh: bool, default False 73 If `True`, pull the latest attributes for the pipe. 74 75 Returns 76 ------- 77 The pipe's parameters dictionary. 78 """ 79 from meerschaum.config._patch import apply_patch_to_config 80 from meerschaum.config._read_config import search_and_substitute_config 81 82 if _visited is None: 83 _visited = {self} 84 85 if refresh: 86 _ = self._invalidate_cache(hard=True) 87 88 raw_parameters = self.attributes.get('parameters', {}) 89 ref_keys = raw_parameters.get('reference') 90 if not apply_symlinks: 91 return raw_parameters 92 93 if ref_keys: 94 try: 95 if debug: 96 dprint(f"Building reference pipe from keys: {ref_keys}") 97 ref_pipe = mrsm.Pipe(**ref_keys) 98 if ref_pipe in _visited: 99 warn(f"Circular reference detected in {self}: chain involves {ref_pipe}.") 100 return search_and_substitute_config(raw_parameters) 101 102 _visited.add(ref_pipe) 103 base_params = ref_pipe.get_parameters(_visited=_visited, debug=debug) 104 except Exception as e: 105 warn(f"Failed to resolve reference pipe for {self}: {e}") 106 base_params = {} 107 108 params_to_apply = {k: v for k, v in raw_parameters.items() if k != 'reference'} 109 parameters = apply_patch_to_config(base_params, params_to_apply) 110 else: 111 parameters = raw_parameters 112 113 from meerschaum.utils.pipes import replace_pipes_syntax 114 self._symlinks = {} 115 116 def recursive_replace(obj: Any, path: tuple) -> Any: 117 if isinstance(obj, dict): 118 return {k: recursive_replace(v, path + (k,)) for k, v in obj.items()} 119 if isinstance(obj, list): 120 return [recursive_replace(elem, path + (i,)) for i, elem in 
enumerate(obj)] 121 if isinstance(obj, str): 122 substituted_val = replace_pipes_syntax(obj) 123 if substituted_val != obj: 124 self._symlinks[path] = { 125 'original': obj, 126 'substituted': substituted_val, 127 } 128 return substituted_val 129 return obj 130 131 return search_and_substitute_config(recursive_replace(parameters, tuple()))
Return the `parameters` dictionary of the pipe.

Parameters
- apply_symlinks (bool, default True): If `True`, resolve references to parameters from other pipes.
- refresh (bool, default False): If `True`, pull the latest attributes for the pipe.
Returns
- The pipe's parameters dictionary.
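The reference resolution and circular-reference guard can be sketched with plain dictionaries. `sketch_resolve_reference`, `params_by_pipe`, and `pipe_key` are hypothetical names, and the patch here is shallow where the real code patches recursively:

```python
def sketch_resolve_reference(params_by_pipe, pipe_key, _visited=None):
    """Resolve `reference` chains between parameter dicts, guarding against cycles (illustrative)."""
    if _visited is None:
        _visited = {pipe_key}
    raw = params_by_pipe[pipe_key]
    ref_key = raw.get('reference')
    if not ref_key:
        return raw
    if ref_key in _visited:
        # Circular reference detected: fall back to the raw parameters.
        return {k: v for k, v in raw.items() if k != 'reference'}
    _visited.add(ref_key)
    base = sketch_resolve_reference(params_by_pipe, ref_key, _visited)
    patch = {k: v for k, v in raw.items() if k != 'reference'}
    return {**base, **patch}  # local values win over the referenced pipe's

params = {
    'base': {'columns': {'datetime': 'ts'}, 'upsert': True},
    'child': {'reference': 'base', 'upsert': False},
}
resolved = sketch_resolve_reference(params, 'child')
# → {'columns': {'datetime': 'ts'}, 'upsert': False}
```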
284def get_dtypes( 285 self, 286 infer: bool = True, 287 refresh: bool = False, 288 debug: bool = False, 289) -> Dict[str, Any]: 290 """ 291 If defined, return the `dtypes` dictionary defined in `meerschaum.Pipe.parameters`. 292 293 Parameters 294 ---------- 295 infer: bool, default True 296 If `True`, include the implicit existing dtypes. 297 Else only return the explicitly configured dtypes (e.g. `Pipe.parameters['dtypes']`). 298 299 refresh: bool, default False 300 If `True`, invalidate any cache and return the latest known dtypes. 301 302 Returns 303 ------- 304 A dictionary mapping column names to dtypes. 305 """ 306 from meerschaum.config._patch import apply_patch_to_config 307 from meerschaum.utils.dtypes import MRSM_ALIAS_DTYPES 308 parameters = self.get_parameters(refresh=refresh, debug=debug) 309 configured_dtypes = parameters.get('dtypes', {}) 310 if debug: 311 dprint(f"Configured dtypes for {self}:") 312 mrsm.pprint(configured_dtypes) 313 314 remote_dtypes = ( 315 self.infer_dtypes(persist=False, refresh=refresh, debug=debug) 316 if infer 317 else {} 318 ) 319 patched_dtypes = apply_patch_to_config((remote_dtypes or {}), (configured_dtypes or {})) 320 321 dt_col = parameters.get('columns', {}).get('datetime', None) 322 primary_col = parameters.get('columns', {}).get('primary', None) 323 _dtypes = { 324 col: MRSM_ALIAS_DTYPES.get(typ, typ) 325 for col, typ in patched_dtypes.items() 326 if col and typ 327 } 328 if dt_col and dt_col not in configured_dtypes: 329 _dtypes[dt_col] = 'datetime' 330 if primary_col and parameters.get('autoincrement', False) and primary_col not in _dtypes: 331 _dtypes[primary_col] = 'int' 332 333 return _dtypes
If defined, return the `dtypes` dictionary defined in `meerschaum.Pipe.parameters`.

Parameters
- infer (bool, default True): If `True`, include the implicit existing dtypes. Else only return the explicitly configured dtypes (e.g. `Pipe.parameters['dtypes']`).
- refresh (bool, default False): If `True`, invalidate any cache and return the latest known dtypes.
Returns
- A dictionary mapping column names to dtypes.
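The precedence rules can be sketched as a plain function; `sketch_get_dtypes` and its `alias_dtypes` table are illustrative stand-ins for the real `MRSM_ALIAS_DTYPES` mapping:

```python
def sketch_get_dtypes(inferred, configured, dt_col=None):
    """Patch inferred dtypes with configured overrides (illustrative).

    Mirrors the precedence in `Pipe.get_dtypes()`: configured dtypes win,
    and an unconfigured datetime column defaults to 'datetime'.
    """
    alias_dtypes = {'double': 'float64', 'str': 'string'}  # hypothetical subset
    patched = {**inferred, **configured}
    dtypes = {
        col: alias_dtypes.get(typ, typ)
        for col, typ in patched.items()
        if col and typ
    }
    if dt_col and dt_col not in configured:
        dtypes[dt_col] = 'datetime'
    return dtypes

dtypes = sketch_get_dtypes(
    inferred={'ts': 'datetime64[ns]', 'vl': 'int64'},
    configured={'vl': 'double'},
    dt_col='ts',
)
# → {'ts': 'datetime', 'vl': 'float64'}
```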
904def update_parameters( 905 self, 906 parameters_patch: Dict[str, Any], 907 persist: bool = True, 908 debug: bool = False, 909) -> mrsm.SuccessTuple: 910 """ 911 Apply a patch to a pipe's `parameters` dictionary. 912 913 Parameters 914 ---------- 915 parameters_patch: Dict[str, Any] 916 The patch to be applied to `Pipe.parameters`. 917 918 persist: bool, default True 919 If `True`, call `Pipe.edit()` to persist the new parameters. 920 """ 921 from meerschaum.config import apply_patch_to_config 922 if 'parameters' not in self._attributes: 923 self._attributes['parameters'] = {} 924 925 self._attributes['parameters'] = apply_patch_to_config( 926 self._attributes['parameters'], 927 parameters_patch, 928 ) 929 930 if self.temporary: 931 persist = False 932 933 if not persist: 934 return True, "Success" 935 936 return self.edit(debug=debug)
Apply a patch to a pipe's `parameters` dictionary.

Parameters
- parameters_patch (Dict[str, Any]): The patch to be applied to `Pipe.parameters`.
- persist (bool, default True): If `True`, call `Pipe.edit()` to persist the new parameters.
633def get_id(self, **kw: Any) -> Union[int, str, None]: 634 """ 635 Fetch a pipe's ID from its instance connector. 636 If the pipe is not registered, return `None`. 637 """ 638 if self.temporary: 639 return None 640 641 from meerschaum.utils.venv import Venv 642 from meerschaum.connectors import get_connector_plugin 643 644 with Venv(get_connector_plugin(self.instance_connector)): 645 if hasattr(self.instance_connector, 'get_pipe_id'): 646 return self.instance_connector.get_pipe_id(self, **kw) 647 648 return None
Fetch a pipe's ID from its instance connector.
If the pipe is not registered, return `None`.
651@property 652def id(self) -> Union[int, str, uuid.UUID, None]: 653 """ 654 Fetch and cache a pipe's ID. 655 """ 656 _id = self._get_cached_value('_id', debug=self.debug) 657 if not _id: 658 _id = self.get_id(debug=self.debug) 659 if _id is not None: 660 self._cache_value('_id', _id, debug=self.debug) 661 return _id
Fetch and cache a pipe's ID.
664def get_val_column(self, debug: bool = False) -> Union[str, None]:
665    """
666    Return the name of the value column if it's defined, otherwise make an educated guess.
667    If not set in the `columns` dictionary, return the first numeric column that is not
668    an ID or datetime column.
669    If none may be found, return `None`.
670
671    Parameters
672    ----------
673    debug: bool, default False
674        Verbosity toggle.
675
676    Returns
677    -------
678    Either a string or `None`.
679    """
680    if debug:
681        dprint('Attempting to determine the value column...')
682    try:
683        val_name = self.get_columns('value')
684    except Exception:
685        val_name = None
686    if val_name is not None:
687        if debug:
688            dprint(f"Value column: {val_name}")
689        return val_name
690
691    cols = self.columns
692    if cols is None:
693        if debug:
694            dprint('No columns could be determined. Returning...')
695        return None
696    try:
697        dt_name = self.get_columns('datetime', error=False)
698    except Exception:
699        dt_name = None
700    try:
701        id_name = self.get_columns('id', error=False)
702    except Exception:
703        id_name = None
704
705    if debug:
706        dprint(f"dt_name: {dt_name}")
707        dprint(f"id_name: {id_name}")
708
709    cols_types = self.get_columns_types(debug=debug)
710    if cols_types is None:
711        return None
712    if debug:
713        dprint(f"cols_types: {cols_types}")
714    if dt_name is not None:
715        cols_types.pop(dt_name, None)
716    if id_name is not None:
717        cols_types.pop(id_name, None)
718
719    candidates = []
720    candidate_keywords = {'float', 'double', 'precision', 'int', 'numeric'}
721    for search_term in candidate_keywords:
722        for col, typ in cols_types.items():
723            if search_term in typ.lower():
724                candidates.append(col)
725                break
726    if not candidates:
727        if debug:
728            dprint("No value column could be determined.")
729        return None
730
731    return candidates[0]
Return the name of the value column if it's defined, otherwise make an educated guess.
If not set in the `columns` dictionary, return the first numeric column that is not an ID or datetime column. If none may be found, return `None`.

Parameters
- debug (bool, default False): Verbosity toggle.

Returns
- Either a string or `None`.
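The guessing heuristic can be sketched as follows. The real method iterates keyword-first over an unordered set, so its tie-breaking may differ; `sketch_get_val_column` is a hypothetical name:

```python
def sketch_get_val_column(cols_types, dt_name=None, id_name=None):
    """Guess a value column: the first column with a numeric type that is
    neither the datetime nor the id column (illustrative)."""
    numeric_keywords = ('float', 'double', 'precision', 'int', 'numeric')
    for col, typ in cols_types.items():
        if col in (dt_name, id_name):
            continue
        if any(keyword in typ.lower() for keyword in numeric_keywords):
            return col
    return None

val_col = sketch_get_val_column(
    {'dt': 'TIMESTAMP', 'id': 'BIGINT', 'temperature': 'DOUBLE PRECISION'},
    dt_name='dt',
    id_name='id',
)
# → 'temperature'
```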
734@property 735def parents(self) -> List[mrsm.Pipe]: 736 """ 737 Return a list of `meerschaum.Pipe` objects to be designated as parents. 738 """ 739 if 'parents' not in self.parameters: 740 return [] 741 742 from meerschaum.utils.warnings import warn 743 _parents_keys = self.parameters['parents'] 744 if not isinstance(_parents_keys, list): 745 warn( 746 f"Please ensure the parents for {self} are defined as a list of keys.", 747 stacklevel = 4 748 ) 749 return [] 750 from meerschaum import Pipe 751 _parents = [] 752 for keys in _parents_keys: 753 try: 754 p = Pipe(**keys) 755 except Exception as e: 756 warn(f"Unable to build parent with keys '{keys}' for {self}:\n{e}") 757 continue 758 _parents.append(p) 759 return _parents
Return a list of `meerschaum.Pipe` objects to be designated as parents.
762@property 763def parent(self) -> Union[mrsm.Pipe, None]: 764 """ 765 Return the first pipe in `self.parents` or `None`. 766 """ 767 parents = self.parents 768 if not parents: 769 return None 770 return parents[0]
Return the first pipe in `self.parents` or `None`.
773@property 774def children(self) -> List[mrsm.Pipe]: 775 """ 776 Return a list of `meerschaum.Pipe` objects to be designated as children. 777 """ 778 if 'children' not in self.parameters: 779 return [] 780 781 from meerschaum.utils.warnings import warn 782 _children_keys = self.parameters['children'] 783 if not isinstance(_children_keys, list): 784 warn( 785 f"Please ensure the children for {self} are defined as a list of keys.", 786 stacklevel = 4 787 ) 788 return [] 789 from meerschaum import Pipe 790 _children = [] 791 for keys in _children_keys: 792 try: 793 p = Pipe(**keys) 794 except Exception as e: 795 warn(f"Unable to build parent with keys '{keys}' for {self}:\n{e}") 796 continue 797 _children.append(p) 798 return _children
Return a list of `meerschaum.Pipe` objects to be designated as children.
801@property 802def target(self) -> str: 803 """ 804 The target table name. 805 You can set the target name under on of the following keys 806 (checked in this order): 807 - `target` 808 - `target_name` 809 - `target_table` 810 - `target_table_name` 811 """ 812 if 'target' not in self.parameters: 813 default_target = self._target_legacy() 814 default_targets = {default_target} 815 potential_keys = ('target_name', 'target_table', 'target_table_name') 816 _target = None 817 for k in potential_keys: 818 if k in self.parameters: 819 _target = self.parameters[k] 820 break 821 822 _target = _target or default_target 823 824 if self.instance_connector.type == 'sql': 825 from meerschaum.utils.sql import truncate_item_name 826 truncated_target = truncate_item_name(_target, self.instance_connector.flavor) 827 default_targets.add(truncated_target) 828 warned_target = self.__dict__.get('_warned_target', False) 829 if truncated_target != _target and not warned_target: 830 if not warned_target: 831 warn( 832 f"The target '{_target}' is too long for '{self.instance_connector.flavor}', " 833 + f"will use {truncated_target} instead." 834 ) 835 self.__dict__['_warned_target'] = True 836 _target = truncated_target 837 838 if _target in default_targets: 839 return _target 840 self.target = _target 841 return self.parameters['target']
The target table name. You can set the target name under one of the following keys (checked in this order):
- `target`
- `target_name`
- `target_table`
- `target_table_name`
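The key-priority lookup can be sketched as a small function; `sketch_resolve_target` is a hypothetical stand-in that omits the name-truncation handling for SQL flavors:

```python
def sketch_resolve_target(parameters, default_target):
    """Resolve the target table name by checking keys in priority order (illustrative)."""
    for key in ('target', 'target_name', 'target_table', 'target_table_name'):
        if key in parameters:
            return parameters[key]
    return default_target

target = sketch_resolve_target({'target_table': 'weather_prod'}, 'plugin_weather')
# → 'weather_prod'
```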
864def guess_datetime(self) -> Union[str, None]:
865    """
866    Try to determine a pipe's datetime column.
867    """
868    _dtypes = self.dtypes
869
870    ### Abort if the user explicitly disallows a datetime index.
871    if 'datetime' in _dtypes:
872        if _dtypes['datetime'] is None:
873            return None
874
875    from meerschaum.utils.dtypes import are_dtypes_equal
876    dt_cols = [
877        col
878        for col, typ in _dtypes.items()
879        if are_dtypes_equal(typ, 'datetime')
880    ]
881    if not dt_cols:
882        return None
883    return dt_cols[0]
Try to determine a pipe's datetime column.
@property
def precision(self) -> Dict[str, Union[str, int]]:
    """
    Return the configured or detected precision.
    """
    return self.get_precision(debug=self.debug)
Return the configured or detected precision.
def get_precision(self, debug: bool = False) -> Dict[str, Union[str, int]]:
    """
    Return the timestamp precision unit and interval for the `datetime` axis.
    """
    from meerschaum.utils.dtypes import (
        MRSM_PRECISION_UNITS_SCALARS,
        MRSM_PRECISION_UNITS_ALIASES,
        MRSM_PD_DTYPES,
        are_dtypes_equal,
    )
    from meerschaum._internal.static import STATIC_CONFIG

    _precision = self._get_cached_value('precision', debug=debug)
    if _precision:
        if debug:
            dprint(f"Returning cached precision: {_precision}")
        return _precision

    parameters = self.parameters
    _precision = parameters.get('precision', {})
    if isinstance(_precision, str):
        _precision = {'unit': _precision}
    default_precision_unit = STATIC_CONFIG['dtypes']['datetime']['default_precision_unit']

    if not _precision:
        dt_col = parameters.get('columns', {}).get('datetime', None)
        if not dt_col and self.autotime:
            dt_col = mrsm.get_config('pipes', 'autotime', 'column_name_if_datetime_missing')
        if not dt_col:
            if debug:
                dprint(f"No datetime axis, returning default precision '{default_precision_unit}'.")
            return {'unit': default_precision_unit}

        dt_typ = self.dtypes.get(dt_col, 'datetime')
        if are_dtypes_equal(dt_typ, 'datetime'):
            if dt_typ == 'datetime':
                dt_typ = MRSM_PD_DTYPES['datetime']
                if debug:
                    dprint(f"Datetime type is `datetime`, assuming {dt_typ} precision.")

            _precision = {
                'unit': (
                    dt_typ
                    .split('[', maxsplit=1)[-1]
                    .split(',', maxsplit=1)[0]
                    .split(' ', maxsplit=1)[0]
                ).rstrip(']')
            }

            if debug:
                dprint(f"Extracted precision '{_precision['unit']}' from type '{dt_typ}'.")

        elif are_dtypes_equal(dt_typ, 'int'):
            _precision = {
                'unit': (
                    'second'
                    if '32' in dt_typ
                    else default_precision_unit
                )
            }
        elif are_dtypes_equal(dt_typ, 'date'):
            if debug:
                dprint("Datetime axis is 'date', falling back to 'day' precision.")
            _precision = {'unit': 'day'}

    precision_unit = _precision.get('unit', default_precision_unit)
    precision_interval = _precision.get('interval', None)
    true_precision_unit = MRSM_PRECISION_UNITS_ALIASES.get(precision_unit, precision_unit)
    if true_precision_unit is None:
        if debug:
            dprint(f"No precision could be determined, falling back to '{default_precision_unit}'.")
        true_precision_unit = default_precision_unit

    if true_precision_unit not in MRSM_PRECISION_UNITS_SCALARS:
        from meerschaum.utils.misc import items_str
        raise ValueError(
            f"Invalid precision unit '{true_precision_unit}'.\n"
            "Accepted values are "
            f"{items_str(list(MRSM_PRECISION_UNITS_SCALARS) + list(MRSM_PRECISION_UNITS_ALIASES))}."
        )

    _precision = {'unit': true_precision_unit}
    if precision_interval:
        _precision['interval'] = precision_interval
    self._cache_value('precision', _precision, debug=debug)
    return _precision
Return the timestamp precision unit and interval for the `datetime` axis.
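When no precision is configured, the unit is read off the datetime column's pandas-style dtype string. The extraction mirrors the `split` chain in `get_precision()` above; this standalone sketch (with a hypothetical `extract_unit` helper) shows the idea:

```python
def extract_unit(dt_typ: str) -> str:
    """Pull the precision unit out of a dtype string like 'datetime64[us, UTC]'."""
    return (
        dt_typ
        .split('[', maxsplit=1)[-1]   # drop the 'datetime64[' prefix
        .split(',', maxsplit=1)[0]    # drop a timezone suffix, if present
        .split(' ', maxsplit=1)[0]
    ).rstrip(']')

print(extract_unit('datetime64[ns]'))       # ns
print(extract_unit('datetime64[us, UTC]'))  # us
```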
def show(
    self,
    nopretty: bool = False,
    debug: bool = False,
    **kw
) -> SuccessTuple:
    """
    Show attributes of a Pipe.

    Parameters
    ----------
    nopretty: bool, default False
        If `True`, simply print the JSON of the pipe's attributes.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.
    """
    import json
    from meerschaum.utils.formatting import (
        pprint, make_header, ANSI, highlight_pipes, fill_ansi, get_console,
    )
    from meerschaum.utils.packages import import_rich, attempt_import
    from meerschaum.utils.warnings import info
    attributes_json = json.dumps(self.attributes)
    if not nopretty:
        _to_print = f"Attributes for {self}:"
        if ANSI:
            _to_print = fill_ansi(highlight_pipes(make_header(_to_print)), 'magenta')
            print(_to_print)
            rich = import_rich()
            rich_json = attempt_import('rich.json')
            get_console().print(rich_json.JSON(attributes_json))
        else:
            print(_to_print)
    else:
        print(attributes_json)

    return True, "Success"
Show attributes of a Pipe.
Parameters
- nopretty (bool, default False): If `True`, simply print the JSON of the pipe's attributes.
- debug (bool, default False): Verbosity toggle.

Returns
- A `SuccessTuple` of success, message.
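The `nopretty` toggle essentially chooses between rendered output and raw JSON. A standalone sketch of that choice, using plain `json` in place of `rich` (illustrative only; the real method prints and returns a `SuccessTuple`):

```python
import json

def show_attributes(attributes: dict, nopretty: bool = False) -> str:
    """Return either compact or human-readable JSON for a pipe's attributes."""
    if nopretty:
        # Raw JSON, suitable for piping into other tools.
        return json.dumps(attributes)
    # A stand-in for the rich-rendered, indented output.
    return json.dumps(attributes, indent=2)

print(show_attributes({'connector': 'sql:temp', 'metric': 'demo'}, nopretty=True))
```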
21def edit( 22 self, 23 patch: bool = False, 24 interactive: bool = False, 25 debug: bool = False, 26 **kw: Any 27) -> SuccessTuple: 28 """ 29 Edit a Pipe's configuration. 30 31 Parameters 32 ---------- 33 patch: bool, default False 34 If `patch` is True, update parameters by cascading rather than overwriting. 35 interactive: bool, default False 36 If `True`, open an editor for the user to make changes to the pipe's YAML file. 37 debug: bool, default False 38 Verbosity toggle. 39 40 Returns 41 ------- 42 A `SuccessTuple` of success, message. 43 44 """ 45 from meerschaum.utils.venv import Venv 46 from meerschaum.connectors import get_connector_plugin 47 48 if self.temporary: 49 return False, "Cannot edit pipes created with `temporary=True` (read-only)." 50 51 self._invalidate_cache(hard=True, debug=debug) 52 53 if hasattr(self, '_symlinks'): 54 from meerschaum.utils.misc import get_val_from_dict_path, set_val_in_dict_path 55 for path, vals in self._symlinks.items(): 56 current_val = get_val_from_dict_path(self.parameters, path) 57 if current_val == vals['substituted']: 58 set_val_in_dict_path(self.parameters, path, vals['original']) 59 60 if not interactive: 61 with Venv(get_connector_plugin(self.instance_connector)): 62 return self.instance_connector.edit_pipe(self, patch=patch, debug=debug, **kw) 63 64 from meerschaum.config._paths import PIPES_CACHE_RESOURCES_PATH 65 from meerschaum.utils.misc import edit_file 66 parameters_filename = str(self) + '.yaml' 67 parameters_path = PIPES_CACHE_RESOURCES_PATH / parameters_filename 68 69 from meerschaum.utils.yaml import yaml 70 71 edit_text = f"Edit the parameters for {self}" 72 edit_top = '#' * (len(edit_text) + 4) 73 edit_header = edit_top + f'\n# {edit_text} #\n' + edit_top + '\n\n' 74 75 from meerschaum.config import get_config 76 parameters = dict(get_config('pipes', 'parameters', patch=True)) 77 from meerschaum.config._patch import apply_patch_to_config 78 raw_parameters = self.attributes.get('parameters', {}) 79 
parameters = apply_patch_to_config(parameters, raw_parameters) 80 81 ### write parameters to yaml file 82 with open(parameters_path, 'w+') as f: 83 f.write(edit_header) 84 yaml.dump(parameters, stream=f, sort_keys=False) 85 86 ### only quit editing if yaml is valid 87 editing = True 88 while editing: 89 edit_file(parameters_path) 90 try: 91 with open(parameters_path, 'r') as f: 92 file_parameters = yaml.load(f.read()) 93 except Exception as e: 94 from meerschaum.utils.warnings import warn 95 warn(f"Invalid format defined for '{self}':\n\n{e}") 96 input(f"Press [Enter] to correct the configuration for '{self}': ") 97 else: 98 editing = False 99 100 self.parameters = file_parameters 101 102 if debug: 103 from meerschaum.utils.formatting import pprint 104 pprint(self.parameters) 105 106 with Venv(get_connector_plugin(self.instance_connector)): 107 return self.instance_connector.edit_pipe(self, patch=patch, debug=debug, **kw)
Edit a Pipe's configuration.
Parameters
- patch (bool, default False): If `patch` is `True`, update parameters by cascading rather than overwriting.
- interactive (bool, default False): If `True`, open an editor for the user to make changes to the pipe's YAML file.
- debug (bool, default False): Verbosity toggle.

Returns
- A `SuccessTuple` of success, message.
110def edit_definition( 111 self, 112 yes: bool = False, 113 noask: bool = False, 114 force: bool = False, 115 debug : bool = False, 116 **kw : Any 117) -> SuccessTuple: 118 """ 119 Edit a pipe's definition file and update its configuration. 120 **NOTE:** This function is interactive and should not be used in automated scripts! 121 122 Returns 123 ------- 124 A `SuccessTuple` of success, message. 125 126 """ 127 if self.temporary: 128 return False, "Cannot edit pipes created with `temporary=True` (read-only)." 129 130 from meerschaum.connectors import instance_types 131 if (self.connector is None) or self.connector.type not in instance_types: 132 return self.edit(interactive=True, debug=debug, **kw) 133 134 import json 135 from meerschaum.utils.warnings import info, warn 136 from meerschaum.utils.debug import dprint 137 from meerschaum.config._patch import apply_patch_to_config 138 from meerschaum.utils.misc import edit_file 139 140 _parameters = self.parameters 141 if 'fetch' not in _parameters: 142 _parameters['fetch'] = {} 143 144 def _edit_api(): 145 from meerschaum.utils.prompt import prompt, yes_no 146 info( 147 f"Please enter the keys of the source pipe from '{self.connector}'.\n" + 148 "Type 'None' for None, or empty when there is no default. Press [CTRL+C] to skip." 
149 ) 150 151 _keys = { 'connector_keys' : None, 'metric_key' : None, 'location_key' : None } 152 for k in _keys: 153 _keys[k] = _parameters['fetch'].get(k, None) 154 155 for k, v in _keys.items(): 156 try: 157 _keys[k] = prompt(k.capitalize().replace('_', ' ') + ':', icon=True, default=v) 158 except KeyboardInterrupt: 159 continue 160 if _keys[k] in ('', 'None', '\'None\'', '[None]'): 161 _keys[k] = None 162 163 _parameters['fetch'] = apply_patch_to_config(_parameters['fetch'], _keys) 164 165 info("You may optionally specify additional filter parameters as JSON.") 166 print(" Parameters are translated into a 'WHERE x AND y' clause, and lists are IN clauses.") 167 print(" For example, the following JSON would correspond to 'WHERE x = 1 AND y IN (2, 3)':") 168 print(json.dumps({'x': 1, 'y': [2, 3]}, indent=2, separators=(',', ': '))) 169 if force or yes_no( 170 "Would you like to add additional filter parameters?", 171 yes=yes, noask=noask 172 ): 173 from meerschaum.config._paths import PIPES_CACHE_RESOURCES_PATH 174 definition_filename = str(self) + '.json' 175 definition_path = PIPES_CACHE_RESOURCES_PATH / definition_filename 176 try: 177 definition_path.touch() 178 with open(definition_path, 'w+') as f: 179 json.dump(_parameters.get('fetch', {}).get('params', {}), f, indent=2) 180 except Exception as e: 181 return False, f"Failed writing file '{definition_path}':\n" + str(e) 182 183 _params = None 184 while True: 185 edit_file(definition_path) 186 try: 187 with open(definition_path, 'r') as f: 188 _params = json.load(f) 189 except Exception as e: 190 warn(f'Failed to read parameters JSON:\n{e}', stack=False) 191 if force or yes_no( 192 "Would you like to try again?\n " 193 + "If not, the parameters JSON file will be ignored.", 194 noask=noask, yes=yes 195 ): 196 continue 197 _params = None 198 break 199 if _params is not None: 200 if 'fetch' not in _parameters: 201 _parameters['fetch'] = {} 202 _parameters['fetch']['params'] = _params 203 204 self.parameters = 
_parameters 205 return True, "Success" 206 207 def _edit_sql(): 208 import textwrap 209 from meerschaum.config._paths import PIPES_CACHE_RESOURCES_PATH 210 from meerschaum.utils.misc import edit_file 211 definition_filename = str(self) + '.sql' 212 definition_path = PIPES_CACHE_RESOURCES_PATH / definition_filename 213 214 sql_definition = _parameters['fetch'].get('definition', None) 215 if sql_definition is None: 216 sql_definition = '' 217 sql_definition = textwrap.dedent(sql_definition).lstrip() 218 219 try: 220 definition_path.touch() 221 with open(definition_path, 'w+') as f: 222 f.write(sql_definition) 223 except Exception as e: 224 return False, f"Failed writing file '{definition_path}':\n" + str(e) 225 226 edit_file(definition_path) 227 try: 228 with open(definition_path, 'r', encoding='utf-8') as f: 229 file_definition = f.read() 230 except Exception as e: 231 return False, f"Failed reading file '{definition_path}':\n" + str(e) 232 233 if sql_definition == file_definition: 234 return False, f"No changes made to definition for {self}." 235 236 if ' ' not in file_definition: 237 return False, f"Invalid SQL definition for {self}." 238 239 if debug: 240 dprint("Read SQL definition:\n\n" + file_definition) 241 _parameters['fetch']['definition'] = file_definition 242 self.parameters = _parameters 243 return True, "Success" 244 245 locals()['_edit_' + str(self.connector.type)]() 246 return self.edit(interactive=False, debug=debug, **kw)
Edit a pipe's definition file and update its configuration. NOTE: This function is interactive and should not be used in automated scripts!
Returns
- A `SuccessTuple` of success, message.
def update(self, *args, **kw) -> SuccessTuple:
    """
    Update a pipe's parameters in its instance.
    """
    kw['interactive'] = False
    return self.edit(*args, **kw)
Update a pipe's parameters in its instance.
41def sync( 42 self, 43 df: Union[ 44 pd.DataFrame, 45 Dict[str, List[Any]], 46 List[Dict[str, Any]], 47 str, 48 InferFetch 49 ] = InferFetch, 50 begin: Union[datetime, int, str, None] = '', 51 end: Union[datetime, int, None] = None, 52 force: bool = False, 53 retries: int = 10, 54 min_seconds: int = 1, 55 check_existing: bool = True, 56 enforce_dtypes: bool = True, 57 blocking: bool = True, 58 workers: Optional[int] = None, 59 callback: Optional[Callable[[Tuple[bool, str]], Any]] = None, 60 error_callback: Optional[Callable[[Exception], Any]] = None, 61 chunksize: Optional[int] = -1, 62 sync_chunks: bool = True, 63 debug: bool = False, 64 _inplace: bool = True, 65 **kw: Any 66) -> SuccessTuple: 67 """ 68 Fetch new data from the source and update the pipe's table with new data. 69 70 Get new remote data via fetch, get existing data in the same time period, 71 and merge the two, only keeping the unseen data. 72 73 Parameters 74 ---------- 75 df: Union[None, pd.DataFrame, Dict[str, List[Any]]], default None 76 An optional DataFrame to sync into the pipe. Defaults to `None`. 77 If `df` is a string, it will be parsed via `meerschaum.utils.dataframe.parse_simple_lines()`. 78 79 begin: Union[datetime, int, str, None], default '' 80 Optionally specify the earliest datetime to search for data. 81 82 end: Union[datetime, int, str, None], default None 83 Optionally specify the latest datetime to search for data. 84 85 force: bool, default False 86 If `True`, keep trying to sync untul `retries` attempts. 87 88 retries: int, default 10 89 If `force`, how many attempts to try syncing before declaring failure. 90 91 min_seconds: Union[int, float], default 1 92 If `force`, how many seconds to sleep between retries. Defaults to `1`. 93 94 check_existing: bool, default True 95 If `True`, pull and diff with existing data from the pipe. 96 97 enforce_dtypes: bool, default True 98 If `True`, enforce dtypes on incoming data. 
99 Set this to `False` if the incoming rows are expected to be of the correct dtypes. 100 101 blocking: bool, default True 102 If `True`, wait for sync to finish and return its result, otherwise 103 asyncronously sync (oxymoron?) and return success. Defaults to `True`. 104 Only intended for specific scenarios. 105 106 workers: Optional[int], default None 107 If provided and the instance connector is thread-safe 108 (`pipe.instance_connector.IS_THREAD_SAFE is True`), 109 limit concurrent sync to this many threads. 110 111 callback: Optional[Callable[[Tuple[bool, str]], Any]], default None 112 Callback function which expects a SuccessTuple as input. 113 Only applies when `blocking=False`. 114 115 error_callback: Optional[Callable[[Exception], Any]], default None 116 Callback function which expects an Exception as input. 117 Only applies when `blocking=False`. 118 119 chunksize: int, default -1 120 Specify the number of rows to sync per chunk. 121 If `-1`, resort to system configuration (default is `900`). 122 A `chunksize` of `None` will sync all rows in one transaction. 123 124 sync_chunks: bool, default True 125 If possible, sync chunks while fetching them into memory. 126 127 debug: bool, default False 128 Verbosity toggle. Defaults to False. 129 130 Returns 131 ------- 132 A `SuccessTuple` of success (`bool`) and message (`str`). 133 """ 134 from meerschaum.utils.debug import dprint, _checkpoint 135 from meerschaum.utils.formatting import get_console 136 from meerschaum.utils.venv import Venv 137 from meerschaum.connectors import get_connector_plugin 138 from meerschaum.utils.misc import df_is_chunk_generator, filter_keywords, filter_arguments 139 from meerschaum.utils.pool import get_pool 140 from meerschaum.config import get_config 141 from meerschaum.utils.dtypes import are_dtypes_equal, get_current_timestamp 142 143 if (callback is not None or error_callback is not None) and blocking: 144 warn("Callback functions are only executed when blocking = False. 
Ignoring...") 145 146 _checkpoint(_total=2, **kw) 147 148 if chunksize == 0: 149 chunksize = None 150 sync_chunks = False 151 152 begin, end = self.parse_date_bounds(begin, end) 153 kw.update({ 154 'begin': begin, 155 'end': end, 156 'force': force, 157 'retries': retries, 158 'min_seconds': min_seconds, 159 'check_existing': check_existing, 160 'blocking': blocking, 161 'workers': workers, 162 'callback': callback, 163 'error_callback': error_callback, 164 'sync_chunks': sync_chunks, 165 'chunksize': chunksize, 166 'safe_copy': True, 167 }) 168 169 self._invalidate_cache(debug=debug) 170 self._cache_value('sync_ts', get_current_timestamp('ms'), debug=debug) 171 172 def _sync( 173 p: mrsm.Pipe, 174 df: Union[ 175 'pd.DataFrame', 176 Dict[str, List[Any]], 177 List[Dict[str, Any]], 178 str, 179 InferFetch 180 ] = InferFetch, 181 ) -> SuccessTuple: 182 if df is None: 183 p._invalidate_cache(debug=debug) 184 return ( 185 False, 186 f"You passed `None` instead of data into `sync()` for {p}.\n" 187 + "Omit the DataFrame to infer fetching.", 188 ) 189 ### Ensure that Pipe is registered. 190 if not p.temporary and p.get_id(debug=debug) is None: 191 ### NOTE: This may trigger an interactive session for plugins! 192 register_success, register_msg = p.register(debug=debug) 193 if not register_success: 194 if 'already' not in register_msg: 195 p._invalidate_cache(debug=debug) 196 return register_success, register_msg 197 198 if isinstance(df, str): 199 from meerschaum.utils.dataframe import parse_simple_lines 200 df = parse_simple_lines(df) 201 202 ### If connector is a plugin with a `sync()` method, return that instead. 203 ### If the plugin does not have a `sync()` method but does have a `fetch()` method, 204 ### use that instead. 205 ### NOTE: The DataFrame must be omitted for the plugin sync method to apply. 206 ### If a DataFrame is provided, continue as expected. 
207 if hasattr(df, 'MRSM_INFER_FETCH'): 208 try: 209 if p.connector is None: 210 if ':' not in p.connector_keys: 211 return True, f"{p} does not support fetching; nothing to do." 212 213 msg = f"{p} does not have a valid connector." 214 if p.connector_keys.startswith('plugin:'): 215 msg += f"\n Perhaps {p.connector_keys} has a syntax error?" 216 p._invalidate_cache(debug=debug) 217 return False, msg 218 except Exception: 219 p._invalidate_cache(debug=debug) 220 return False, f"Unable to create the connector for {p}." 221 222 ### Sync in place if possible. 223 if ( 224 str(self.connector) == str(self.instance_connector) 225 and 226 hasattr(self.instance_connector, 'sync_pipe_inplace') 227 and 228 _inplace 229 and 230 get_config('system', 'experimental', 'inplace_sync') 231 ): 232 with Venv(get_connector_plugin(self.instance_connector)): 233 p._invalidate_cache(debug=debug) 234 _args, _kwargs = filter_arguments( 235 p.instance_connector.sync_pipe_inplace, 236 p, 237 debug=debug, 238 **kw 239 ) 240 return self.instance_connector.sync_pipe_inplace( 241 *_args, 242 **_kwargs 243 ) 244 245 ### Activate and invoke `sync(pipe)` for plugin connectors with `sync` methods. 246 try: 247 if getattr(p.connector, 'sync', None) is not None: 248 with Venv(get_connector_plugin(p.connector), debug=debug): 249 _args, _kwargs = filter_arguments( 250 p.connector.sync, 251 p, 252 debug=debug, 253 **kw 254 ) 255 return_tuple = p.connector.sync(*_args, **_kwargs) 256 p._invalidate_cache(debug=debug) 257 if not isinstance(return_tuple, tuple): 258 return_tuple = ( 259 False, 260 f"Plugin '{p.connector.label}' returned non-tuple value: {return_tuple}" 261 ) 262 return return_tuple 263 264 except Exception as e: 265 get_console().print_exception() 266 msg = f"Failed to sync {p} with exception: '" + str(e) + "'" 267 if debug: 268 error(msg, silent=False) 269 p._invalidate_cache(debug=debug) 270 return False, msg 271 272 ### Fetch the dataframe from the connector's `fetch()` method. 
273 try: 274 with Venv(get_connector_plugin(p.connector), debug=debug): 275 df = p.fetch( 276 **filter_keywords( 277 p.fetch, 278 debug=debug, 279 **kw 280 ) 281 ) 282 kw['safe_copy'] = False 283 except Exception as e: 284 get_console().print_exception( 285 suppress=[ 286 'meerschaum/core/Pipe/_sync.py', 287 'meerschaum/core/Pipe/_fetch.py', 288 ] 289 ) 290 msg = f"Failed to fetch data from {p.connector}:\n {e}" 291 df = None 292 293 if df is None: 294 p._invalidate_cache(debug=debug) 295 return False, f"No data were fetched for {p}." 296 297 if isinstance(df, list): 298 if len(df) == 0: 299 return True, f"No new rows were returned for {p}." 300 301 ### May be a chunk hook results list. 302 if isinstance(df[0], tuple): 303 success = all([_success for _success, _ in df]) 304 message = '\n'.join([_message for _, _message in df]) 305 return success, message 306 307 if df is True: 308 p._invalidate_cache(debug=debug) 309 return True, f"{p} is being synced in parallel." 310 311 ### CHECKPOINT: Retrieved the DataFrame. 312 _checkpoint(**kw) 313 314 ### Allow for dataframe generators or iterables. 315 if df_is_chunk_generator(df): 316 kw['workers'] = p.get_num_workers(kw.get('workers', None)) 317 dt_col = p.columns.get('datetime', None) 318 pool = get_pool(workers=kw.get('workers', 1)) 319 if debug: 320 dprint(f"Received {type(df)}. Attempting to sync first chunk...") 321 322 try: 323 chunk = next(df) 324 except StopIteration: 325 return True, "Received an empty generator; nothing to do." 
326 327 chunk_success, chunk_msg = _sync(p, chunk) 328 chunk_msg = '\n' + self._get_chunk_label(chunk, dt_col) + '\n' + chunk_msg 329 if not chunk_success: 330 return chunk_success, f"Unable to sync initial chunk for {p}:\n{chunk_msg}" 331 if debug: 332 dprint("Successfully synced the first chunk, attemping the rest...") 333 334 def _process_chunk(_chunk): 335 _chunk_attempts = 0 336 _max_chunk_attempts = 3 337 while _chunk_attempts < _max_chunk_attempts: 338 try: 339 _chunk_success, _chunk_msg = _sync(p, _chunk) 340 except Exception as e: 341 _chunk_success, _chunk_msg = False, str(e) 342 if _chunk_success: 343 break 344 _chunk_attempts += 1 345 _sleep_seconds = _chunk_attempts ** 2 346 warn( 347 ( 348 f"Failed to sync chunk to {self} " 349 + f"(attempt {_chunk_attempts} / {_max_chunk_attempts}).\n" 350 + f"Sleeping for {_sleep_seconds} second" 351 + ('s' if _sleep_seconds != 1 else '') 352 + f":\n{_chunk_msg}" 353 ), 354 stack=False, 355 ) 356 time.sleep(_sleep_seconds) 357 358 num_rows_str = ( 359 f"{num_rows:,} rows" 360 if (num_rows := len(_chunk)) != 1 361 else f"{num_rows} row" 362 ) 363 _chunk_msg = ( 364 ( 365 "Synced" 366 if _chunk_success 367 else "Failed to sync" 368 ) + f" a chunk ({num_rows_str}) to {p}:\n" 369 + self._get_chunk_label(_chunk, dt_col) 370 + '\n' 371 + _chunk_msg 372 ) 373 374 mrsm.pprint((_chunk_success, _chunk_msg), calm=True) 375 return _chunk_success, _chunk_msg 376 377 results = sorted( 378 [(chunk_success, chunk_msg)] + ( 379 list(pool.imap(_process_chunk, df)) 380 if ( 381 not df_is_chunk_generator(chunk) # Handle nested generators. 
382 and kw.get('workers', 1) != 1 383 ) 384 else list( 385 _process_chunk(_child_chunks) 386 for _child_chunks in df 387 ) 388 ) 389 ) 390 chunk_messages = [chunk_msg for _, chunk_msg in results] 391 success_bools = [chunk_success for chunk_success, _ in results] 392 num_successes = len([chunk_success for chunk_success, _ in results if chunk_success]) 393 num_failures = len([chunk_success for chunk_success, _ in results if not chunk_success]) 394 success = all(success_bools) 395 msg = ( 396 'Synced ' 397 + f'{len(chunk_messages):,} chunk' 398 + ('s' if len(chunk_messages) != 1 else '') 399 + f' to {p}\n({num_successes} succeeded, {num_failures} failed):\n\n' 400 + '\n\n'.join(chunk_messages).lstrip().rstrip() 401 ).lstrip().rstrip() 402 return success, msg 403 404 ### Cast to a dataframe and ensure datatypes are what we expect. 405 dtypes = p.get_dtypes(debug=debug) 406 df = p.enforce_dtypes( 407 df, 408 chunksize=chunksize, 409 enforce=enforce_dtypes, 410 dtypes=dtypes, 411 debug=debug, 412 ) 413 if p.autotime: 414 dt_col = p.columns.get('datetime', None) 415 ts_col = dt_col or mrsm.get_config( 416 'pipes', 'autotime', 'column_name_if_datetime_missing' 417 ) 418 ts_typ = dtypes.get(ts_col, 'datetime') if ts_col else 'datetime' 419 if ts_col and hasattr(df, 'columns') and ts_col not in df.columns: 420 precision = p.get_precision(debug=debug) 421 now = get_current_timestamp( 422 precision_unit=precision.get( 423 'unit', 424 STATIC_CONFIG['dtypes']['datetime']['default_precision_unit'] 425 ), 426 precision_interval=precision.get('interval', 1), 427 round_to=(precision.get('round_to', 'down')), 428 as_int=(are_dtypes_equal(ts_typ, 'int')), 429 ) 430 if debug: 431 dprint(f"Adding current timestamp to dataframe synced to {p}: {now}") 432 433 df[ts_col] = now 434 kw['check_existing'] = dt_col is not None 435 436 ### Capture special columns. 
437 capture_success, capture_msg = self._persist_new_special_columns( 438 df, 439 dtypes=dtypes, 440 debug=debug, 441 ) 442 if not capture_success: 443 warn(f"Failed to capture new special columns for {self}:\n{capture_msg}") 444 445 if debug: 446 dprint( 447 "DataFrame to sync:\n" 448 + ( 449 str(df)[:255] 450 + '...' 451 if len(str(df)) >= 256 452 else str(df) 453 ), 454 **kw 455 ) 456 457 ### if force, continue to sync until success 458 return_tuple = False, f"Did not sync {p}." 459 run = True 460 _retries = 1 461 while run: 462 with Venv(get_connector_plugin(self.instance_connector)): 463 return_tuple = p.instance_connector.sync_pipe( 464 pipe=p, 465 df=df, 466 debug=debug, 467 **kw 468 ) 469 _retries += 1 470 run = (not return_tuple[0]) and force and _retries <= retries 471 if run and debug: 472 dprint(f"Syncing failed for {p}. Attempt ( {_retries} / {retries} )", **kw) 473 dprint(f"Sleeping for {min_seconds} seconds...", **kw) 474 time.sleep(min_seconds) 475 if _retries > retries: 476 warn( 477 f"Unable to sync {p} within {retries} attempt" + 478 ("s" if retries != 1 else "") + "!" 479 ) 480 481 ### CHECKPOINT: Finished syncing. 
482 _checkpoint(**kw) 483 p._invalidate_cache(debug=debug) 484 return return_tuple 485 486 if blocking: 487 return _sync(self, df=df) 488 489 from meerschaum.utils.threading import Thread 490 def default_callback(result_tuple: SuccessTuple): 491 dprint(f"Asynchronous result from {self}: {result_tuple}", **kw) 492 493 def default_error_callback(x: Exception): 494 dprint(f"Error received for {self}: {x}", **kw) 495 496 if callback is None and debug: 497 callback = default_callback 498 if error_callback is None and debug: 499 error_callback = default_error_callback 500 try: 501 thread = Thread( 502 target=_sync, 503 args=(self,), 504 kwargs={'df': df}, 505 daemon=False, 506 callback=callback, 507 error_callback=error_callback, 508 ) 509 thread.start() 510 except Exception as e: 511 self._invalidate_cache(debug=debug) 512 return False, str(e) 513 514 self._invalidate_cache(debug=debug) 515 return True, f"Spawned asyncronous sync for {self}."
Fetch new data from the source and update the pipe's table with new data.
Get new remote data via fetch, get existing data in the same time period, and merge the two, only keeping the unseen data.
Parameters
- df (Union[pd.DataFrame, Dict[str, List[Any]], List[Dict[str, Any]], str, InferFetch], default InferFetch): An optional DataFrame to sync into the pipe. Omit to infer fetching. If `df` is a string, it will be parsed via `meerschaum.utils.dataframe.parse_simple_lines()`.
- begin (Union[datetime, int, str, None], default ''): Optionally specify the earliest datetime to search for data.
- end (Union[datetime, int, str, None], default None): Optionally specify the latest datetime to search for data.
- force (bool, default False): If `True`, keep trying to sync until `retries` attempts.
- retries (int, default 10): If `force`, how many attempts to try syncing before declaring failure.
- min_seconds (Union[int, float], default 1): If `force`, how many seconds to sleep between retries. Defaults to `1`.
- check_existing (bool, default True): If `True`, pull and diff with existing data from the pipe.
- enforce_dtypes (bool, default True): If `True`, enforce dtypes on incoming data. Set this to `False` if the incoming rows are expected to be of the correct dtypes.
- blocking (bool, default True): If `True`, wait for the sync to finish and return its result; otherwise sync asynchronously and return success. Defaults to `True`. Only intended for specific scenarios.
- workers (Optional[int], default None): If provided and the instance connector is thread-safe (`pipe.instance_connector.IS_THREAD_SAFE is True`), limit concurrent syncs to this many threads.
- callback (Optional[Callable[[Tuple[bool, str]], Any]], default None): Callback function which expects a `SuccessTuple` as input. Only applies when `blocking=False`.
- error_callback (Optional[Callable[[Exception], Any]], default None): Callback function which expects an `Exception` as input. Only applies when `blocking=False`.
- chunksize (int, default -1): Specify the number of rows to sync per chunk. If `-1`, resort to system configuration (default is `900`). A `chunksize` of `None` will sync all rows in one transaction.
- sync_chunks (bool, default True): If possible, sync chunks while fetching them into memory.
- debug (bool, default False): Verbosity toggle. Defaults to `False`.

Returns
- A `SuccessTuple` of success (`bool`) and message (`str`).
def get_sync_time(
    self,
    params: Optional[Dict[str, Any]] = None,
    newest: bool = True,
    apply_backtrack_interval: bool = False,
    remote: bool = False,
    round_down: bool = False,
    debug: bool = False
) -> Union['datetime', int, None]:
    """
    Get the most recent datetime value for a Pipe.

    Parameters
    ----------
    params: Optional[Dict[str, Any]], default None
        Dictionary to build a WHERE clause for a specific column.
        See `meerschaum.utils.sql.build_where`.

    newest: bool, default True
        If `True`, get the most recent datetime (honoring `params`).
        If `False`, get the oldest datetime (`ASC` instead of `DESC`).

    apply_backtrack_interval: bool, default False
        If `True`, subtract the backtrack interval from the sync time.

    remote: bool, default False
        If `True` and the instance connector supports it, return the sync time
        for the remote table definition.

    round_down: bool, default False
        If `True`, round down the datetime value to the nearest minute.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `datetime` or int, if the pipe exists, otherwise `None`.
    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.misc import filter_keywords
    from meerschaum.utils.dtypes import round_time
    from meerschaum.utils.warnings import warn

    if not self.columns.get('datetime', None):
        return None

    connector = self.instance_connector if not remote else self.connector
    with Venv(get_connector_plugin(connector)):
        if not hasattr(connector, 'get_sync_time'):
            warn(
                f"Connectors of type '{connector.type}' "
                "do not implement `get_sync_time()`.",
                stack=False,
            )
            return None
        sync_time = connector.get_sync_time(
            self,
            **filter_keywords(
                connector.get_sync_time,
                params=params,
                newest=newest,
                remote=remote,
                debug=debug,
            )
        )

    if round_down and isinstance(sync_time, datetime):
        sync_time = round_time(sync_time, timedelta(minutes=1))

    if apply_backtrack_interval and sync_time is not None:
        backtrack_interval = self.get_backtrack_interval(debug=debug)
        try:
            sync_time -= backtrack_interval
        except Exception as e:
            warn(f"Failed to apply backtrack interval:\n{e}")

    return self.parse_date_bounds(sync_time)
Get the most recent datetime value for a Pipe.

Parameters
- params (Optional[Dict[str, Any]], default None): Dictionary to build a WHERE clause for a specific column. See `meerschaum.utils.sql.build_where`.
- newest (bool, default True): If `True`, get the most recent datetime (honoring `params`). If `False`, get the oldest datetime (`ASC` instead of `DESC`).
- apply_backtrack_interval (bool, default False): If `True`, subtract the backtrack interval from the sync time.
- remote (bool, default False): If `True` and the instance connector supports it, return the sync time for the remote table definition.
- round_down (bool, default False): If `True`, round down the datetime value to the nearest minute.
- debug (bool, default False): Verbosity toggle.

Returns
- A `datetime` or `int`, if the pipe exists, otherwise `None`.
def exists(
    self,
    debug: bool = False
) -> bool:
    """
    See if a Pipe's table exists.

    Parameters
    ----------
    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `bool` corresponding to whether a pipe's underlying table exists.
    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.debug import dprint
    from meerschaum.utils.dtypes import get_current_timestamp
    now = get_current_timestamp('ms', as_int=True) / 1000
    cache_seconds = mrsm.get_config('pipes', 'sync', 'exists_cache_seconds')

    _exists = self._get_cached_value('_exists', debug=debug)
    if _exists:
        exists_timestamp = self._get_cached_value('_exists_timestamp', debug=debug)
        if exists_timestamp is not None:
            delta = now - exists_timestamp
            if delta < cache_seconds:
                if debug:
                    dprint(f"Returning cached `exists` for {self} ({round(delta, 2)} seconds old).")
                return _exists

    with Venv(get_connector_plugin(self.instance_connector)):
        _exists = (
            self.instance_connector.pipe_exists(pipe=self, debug=debug)
            if hasattr(self.instance_connector, 'pipe_exists')
            else False
        )

    self._cache_value('_exists', _exists, debug=debug)
    self._cache_value('_exists_timestamp', now, debug=debug)
    return _exists
See if a Pipe's table exists.

Parameters
- debug (bool, default False): Verbosity toggle.

Returns
- A `bool` corresponding to whether a pipe's underlying table exists.
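The caching strategy in `exists()` — a timestamped value honored only while it is younger than a TTL — can be sketched in isolation. `CACHE_SECONDS` stands in for the `pipes:sync:exists_cache_seconds` config value, and `cached_exists` is an illustrative helper, not Meerschaum code:

```python
import time

CACHE_SECONDS = 5.0  # stand-in for the configured `exists_cache_seconds`
_cache: dict = {}

def cached_exists(check_table) -> bool:
    """Return the cached result while fresh; otherwise re-check and re-cache."""
    now = time.time()
    # Like the original, only a truthy cached value short-circuits the check.
    if _cache.get('exists') and (now - _cache.get('ts', 0.0)) < CACHE_SECONDS:
        return _cache['exists']
    result = check_table()
    _cache['exists'] = result
    _cache['ts'] = now
    return result

calls = []
def fake_check():
    calls.append(1)
    return True

cached_exists(fake_check)
cached_exists(fake_check)  # served from cache; fake_check only ran once
print(len(calls))
# 1
```

This avoids hitting the instance connector for every `exists()` call during a tight sync loop.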
def filter_existing(
    self,
    df: 'pd.DataFrame',
    safe_copy: bool = True,
    date_bound_only: bool = False,
    include_unchanged_columns: bool = False,
    enforce_dtypes: bool = False,
    chunksize: Optional[int] = -1,
    debug: bool = False,
    **kw
) -> Tuple['pd.DataFrame', 'pd.DataFrame', 'pd.DataFrame']:
    """
    Inspect a dataframe and filter out rows which already exist in the pipe.

    Parameters
    ----------
    df: 'pd.DataFrame'
        The dataframe to inspect and filter.

    safe_copy: bool, default True
        If `True`, create a copy before comparing and modifying the dataframes.
        Setting to `False` may mutate the DataFrames.
        See `meerschaum.utils.dataframe.filter_unseen_df`.

    date_bound_only: bool, default False
        If `True`, only use the datetime index to fetch the sample dataframe.

    include_unchanged_columns: bool, default False
        If `True`, include the backtrack columns which haven't changed in the update dataframe.
        This is useful if you can't update individual keys.

    enforce_dtypes: bool, default False
        If `True`, ensure the given and intermediate dataframes are enforced to the correct dtypes.
        Setting `enforce_dtypes=True` may impact performance.

    chunksize: Optional[int], default -1
        The `chunksize` used when fetching existing data.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A tuple of three pandas DataFrames: unseen, update, and delta.
    """
    from meerschaum.utils.warnings import warn
    from meerschaum.utils.debug import dprint
    from meerschaum.utils.packages import attempt_import, import_pandas
    from meerschaum.utils.dataframe import (
        filter_unseen_df,
        add_missing_cols_to_df,
        get_unhashable_cols,
    )
    from meerschaum.utils.dtypes import (
        to_pandas_dtype,
        none_if_null,
        to_datetime,
        are_dtypes_equal,
        value_is_null,
        round_time,
    )
    from meerschaum.config import get_config
    pd = import_pandas()
    pandas = attempt_import('pandas')
    if enforce_dtypes or 'dataframe' not in str(type(df)).lower():
        df = self.enforce_dtypes(df, chunksize=chunksize, debug=debug)
    is_dask = hasattr(df, '__module__') and 'dask' in df.__module__
    if is_dask:
        dd = attempt_import('dask.dataframe')
        merge = dd.merge
        NA = pandas.NA
    else:
        merge = pd.merge
        NA = pd.NA

    parameters = self.parameters
    pipe_columns = parameters.get('columns', {})
    primary_key = pipe_columns.get('primary', None)
    dt_col = pipe_columns.get('datetime', None)
    dt_type = parameters.get('dtypes', {}).get(dt_col, 'datetime') if dt_col else None
    autoincrement = parameters.get('autoincrement', False)
    autotime = parameters.get('autotime', False)

    if primary_key and autoincrement and df is not None and primary_key in df.columns:
        if safe_copy:
            df = df.copy()
            safe_copy = False
        if df[primary_key].isnull().all():
            del df[primary_key]
            _ = self.columns.pop(primary_key, None)

    if dt_col and autotime and df is not None and dt_col in df.columns:
        if safe_copy:
            df = df.copy()
            safe_copy = False
        if df[dt_col].isnull().all():
            del df[dt_col]
            _ = self.columns.pop(dt_col, None)

    def get_empty_df():
        empty_df = pd.DataFrame([])
        dtypes = dict(df.dtypes) if df is not None else {}
        if self.enforce:
            dtypes.update(self.dtypes)
        pd_dtypes = {
            col: to_pandas_dtype(str(typ))
            for col, typ in dtypes.items()
        }
        return add_missing_cols_to_df(empty_df, pd_dtypes)

    if df is None:
        empty_df = get_empty_df()
        return empty_df, empty_df, empty_df

    if (df.empty if not is_dask else len(df) == 0):
        return df, df, df

    ### begin is the oldest data in the new dataframe
    begin, end = None, None

    if autoincrement and primary_key == dt_col and dt_col not in df.columns:
        if enforce_dtypes:
            df = self.enforce_dtypes(df, chunksize=chunksize, debug=debug)
        return df, get_empty_df(), df

    if autotime and dt_col and dt_col not in df.columns:
        if enforce_dtypes:
            df = self.enforce_dtypes(df, chunksize=chunksize, debug=debug)
        return df, get_empty_df(), df

    try:
        min_dt_val = df[dt_col].min(skipna=True) if dt_col and dt_col in df.columns else None
        if is_dask and min_dt_val is not None:
            min_dt_val = min_dt_val.compute()
        min_dt = (
            to_datetime(min_dt_val, as_pydatetime=True)
            if min_dt_val is not None and are_dtypes_equal(dt_type, 'datetime')
            else min_dt_val
        )
    except Exception:
        min_dt = None

    if not are_dtypes_equal('datetime', str(type(min_dt))) or value_is_null(min_dt):
        if not are_dtypes_equal('int', str(type(min_dt))):
            min_dt = None

    if isinstance(min_dt, datetime):
        rounded_min_dt = round_time(min_dt, to='down')
        try:
            begin = rounded_min_dt - timedelta(minutes=1)
        except OverflowError:
            begin = rounded_min_dt
    elif dt_type and 'int' in dt_type.lower():
        begin = min_dt
    elif dt_col is None:
        begin = None

    ### end is the newest data in the new dataframe
    try:
        max_dt_val = df[dt_col].max(skipna=True) if dt_col and dt_col in df.columns else None
        if is_dask and max_dt_val is not None:
            max_dt_val = max_dt_val.compute()
        max_dt = (
            to_datetime(max_dt_val, as_pydatetime=True)
            if max_dt_val is not None and 'datetime' in str(dt_type)
            else max_dt_val
        )
    except Exception:
        import traceback
        traceback.print_exc()
        max_dt = None

    if not are_dtypes_equal('datetime', str(type(max_dt))) or value_is_null(max_dt):
        if not are_dtypes_equal('int', str(type(max_dt))):
            max_dt = None

    if isinstance(max_dt, datetime):
        end = (
            round_time(
                max_dt,
                to='down'
            ) + timedelta(minutes=1)
        )
    elif dt_type and 'int' in dt_type.lower() and max_dt is not None:
        end = max_dt + 1

    if max_dt is not None and min_dt is not None and min_dt > max_dt:
        warn("Detected minimum datetime greater than maximum datetime.")

    if begin is not None and end is not None and begin > end:
        if isinstance(begin, datetime):
            begin = end - timedelta(minutes=1)
        ### We might be using integers for the datetime axis.
        else:
            begin = end - 1

    unique_index_vals = {
        col: df[col].unique()
        for col in (pipe_columns if not primary_key else [primary_key])
        if col in df.columns and col != dt_col
    } if not date_bound_only else {}
    filter_params_index_limit = get_config('pipes', 'sync', 'filter_params_index_limit')
    _ = kw.pop('params', None)
    params = {
        col: [
            none_if_null(val)
            for val in unique_vals
        ]
        for col, unique_vals in unique_index_vals.items()
        if len(unique_vals) <= filter_params_index_limit
    } if not date_bound_only else {}

    if debug:
        dprint(f"Looking at data between '{begin}' and '{end}':", **kw)

    backtrack_df = self.get_data(
        begin=begin,
        end=end,
        chunksize=chunksize,
        params=params,
        debug=debug,
        **kw
    )
    if backtrack_df is None:
        if debug:
            dprint(f"No backtrack data was found for {self}.")
        return df, get_empty_df(), df

    if enforce_dtypes:
        backtrack_df = self.enforce_dtypes(backtrack_df, chunksize=chunksize, debug=debug)

    if debug:
        dprint(f"Existing data for {self}:\n" + str(backtrack_df), **kw)
        dprint(f"Existing dtypes for {self}:\n" + str(backtrack_df.dtypes))

    ### Separate new rows from changed ones.
    on_cols = [
        col
        for col_key, col in pipe_columns.items()
        if (
            col
            and col_key != 'value'
            and col in backtrack_df.columns
        )
    ] if not primary_key else [primary_key]

    self_dtypes = self.get_dtypes(debug=debug) if self.enforce else {}
    on_cols_dtypes = {
        col: to_pandas_dtype(typ)
        for col, typ in self_dtypes.items()
        if col in on_cols
    }

    ### Detect changes between the old target and new source dataframes.
    delta_df = add_missing_cols_to_df(
        filter_unseen_df(
            backtrack_df,
            df,
            dtypes={
                col: to_pandas_dtype(typ)
                for col, typ in self_dtypes.items()
            },
            safe_copy=safe_copy,
            coerce_mixed_numerics=(not self.static),
            debug=debug
        ),
        on_cols_dtypes,
    )
    if enforce_dtypes:
        delta_df = self.enforce_dtypes(delta_df, chunksize=chunksize, debug=debug)

    ### Cast dicts or lists to strings so we can merge.
    serializer = functools.partial(json.dumps, sort_keys=True, separators=(',', ':'), default=str)

    def deserializer(x):
        return json.loads(x) if isinstance(x, str) else x

    unhashable_delta_cols = get_unhashable_cols(delta_df)
    unhashable_backtrack_cols = get_unhashable_cols(backtrack_df)
    for col in unhashable_delta_cols:
        delta_df[col] = delta_df[col].apply(serializer)
    for col in unhashable_backtrack_cols:
        backtrack_df[col] = backtrack_df[col].apply(serializer)
    casted_cols = set(unhashable_delta_cols + unhashable_backtrack_cols)

    joined_df = merge(
        delta_df.infer_objects(copy=False).fillna(NA),
        backtrack_df.infer_objects(copy=False).fillna(NA),
        how='left',
        on=on_cols,
        indicator=True,
        suffixes=('', '_old'),
    ) if on_cols else delta_df
    for col in casted_cols:
        if col in joined_df.columns:
            joined_df[col] = joined_df[col].apply(deserializer)
        if col in delta_df.columns:
            delta_df[col] = delta_df[col].apply(deserializer)

    ### Determine which rows are completely new.
    new_rows_mask = (joined_df['_merge'] == 'left_only') if on_cols else None
    cols = list(delta_df.columns)

    unseen_df = (
        joined_df
        .where(new_rows_mask)
        .dropna(how='all')[cols]
        .reset_index(drop=True)
    ) if on_cols else delta_df

    ### Rows that have already been inserted but values have changed.
    update_df = (
        joined_df
        .where(~new_rows_mask)
        .dropna(how='all')[cols]
        .reset_index(drop=True)
    ) if on_cols else get_empty_df()

    if include_unchanged_columns and on_cols:
        unchanged_backtrack_cols = [
            col
            for col in backtrack_df.columns
            if col in on_cols or col not in update_df.columns
        ]
        if enforce_dtypes:
            update_df = self.enforce_dtypes(update_df, chunksize=chunksize, debug=debug)
        update_df = merge(
            backtrack_df[unchanged_backtrack_cols],
            update_df,
            how='inner',
            on=on_cols,
        )

    return unseen_df, update_df, delta_df
Inspect a dataframe and filter out rows which already exist in the pipe.

Parameters
- df ('pd.DataFrame'): The dataframe to inspect and filter.
- safe_copy (bool, default True): If `True`, create a copy before comparing and modifying the dataframes. Setting to `False` may mutate the DataFrames. See `meerschaum.utils.dataframe.filter_unseen_df`.
- date_bound_only (bool, default False): If `True`, only use the datetime index to fetch the sample dataframe.
- include_unchanged_columns (bool, default False): If `True`, include the backtrack columns which haven't changed in the update dataframe. This is useful if you can't update individual keys.
- enforce_dtypes (bool, default False): If `True`, ensure the given and intermediate dataframes are enforced to the correct dtypes. Setting `enforce_dtypes=True` may impact performance.
- chunksize (Optional[int], default -1): The `chunksize` used when fetching existing data.
- debug (bool, default False): Verbosity toggle.

Returns
- A tuple of three pandas DataFrames: unseen, update, and delta.
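The three-way split can be sketched without pandas: compare incoming rows against existing rows on the key columns, mirroring the roles of `unseen_df`, `update_df`, and `delta_df`. This is a simplified illustration (plain dicts instead of DataFrames; the real implementation merges on the index columns and handles dtypes, dask, and unhashable values):

```python
def filter_existing_rows(existing, incoming, on_cols):
    """Split incoming rows into (unseen, update, delta), like `Pipe.filter_existing()`."""
    existing_by_key = {tuple(row[c] for c in on_cols): row for row in existing}
    unseen, update, delta = [], [], []
    for row in incoming:
        key = tuple(row[c] for c in on_cols)
        old = existing_by_key.get(key)
        if old == row:
            continue               # unchanged row: not part of the delta
        delta.append(row)          # delta = everything new or changed
        if old is None:
            unseen.append(row)     # brand-new key: insert
        else:
            update.append(row)     # existing key with changed values: update
    return unseen, update, delta

existing = [{'id': 1, 'vl': 10}, {'id': 2, 'vl': 20}]
incoming = [{'id': 1, 'vl': 10}, {'id': 2, 'vl': 99}, {'id': 3, 'vl': 30}]
unseen, update, delta = filter_existing_rows(existing, incoming, ['id'])
print(len(unseen), len(update), len(delta))
# 1 1 2
```

Unseen rows can be appended outright, while update rows require an upsert on the key columns; that distinction is why the method returns them separately.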
def get_num_workers(self, workers: Optional[int] = None) -> int:
    """
    Get the number of workers to use for concurrent syncs.

    Parameters
    ----------
    workers: Optional[int], default None
        The number of workers passed via `--workers`.

    Returns
    -------
    The number of workers, capped for safety.
    """
    is_thread_safe = getattr(self.instance_connector, 'IS_THREAD_SAFE', False)
    if not is_thread_safe:
        return 1

    engine_pool_size = (
        self.instance_connector.engine.pool.size()
        if self.instance_connector.type == 'sql'
        else None
    )
    current_num_threads = threading.active_count()
    current_num_connections = (
        self.instance_connector.engine.pool.checkedout()
        if engine_pool_size is not None
        else current_num_threads
    )
    desired_workers = (
        min(workers or engine_pool_size, engine_pool_size)
        if engine_pool_size is not None
        else workers
    )
    if desired_workers is None:
        desired_workers = (multiprocessing.cpu_count() if is_thread_safe else 1)

    return max(
        (desired_workers - current_num_connections),
        1,
    )
Get the number of workers to use for concurrent syncs.

Parameters
- workers (Optional[int], default None): The number of workers passed via `--workers`.

Returns
- The number of workers, capped for safety.
def verify(
    self,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    chunk_interval: Union[timedelta, int, None] = None,
    bounded: Optional[bool] = None,
    deduplicate: bool = False,
    workers: Optional[int] = None,
    batchsize: Optional[int] = None,
    skip_chunks_with_greater_rowcounts: bool = False,
    check_rowcounts_only: bool = False,
    debug: bool = False,
    **kwargs: Any
) -> SuccessTuple:
    """
    Verify the contents of the pipe by resyncing its interval.

    Parameters
    ----------
    begin: Union[datetime, int, None], default None
        If specified, only verify rows greater than or equal to this value.

    end: Union[datetime, int, None], default None
        If specified, only verify rows less than this value.

    chunk_interval: Union[timedelta, int, None], default None
        If provided, use this as the size of the chunk boundaries.
        Defaults to the value set in `pipe.parameters['chunk_minutes']` (1440).

    bounded: Optional[bool], default None
        If `True`, do not verify older than the oldest sync time or newer than the newest.
        If `False`, verify unbounded syncs outside of the new and old sync times.
        The default behavior (`None`) is to bound only if a bound interval is set
        (e.g. `pipe.parameters['verify']['bound_days']`).

    deduplicate: bool, default False
        If `True`, deduplicate the pipe's table after the verification syncs.

    workers: Optional[int], default None
        If provided, limit the verification to this many threads.
        Use a value of `1` to sync chunks in series.

    batchsize: Optional[int], default None
        If provided, sync this many chunks in parallel.
        Defaults to `Pipe.get_num_workers()`.

    skip_chunks_with_greater_rowcounts: bool, default False
        If `True`, compare the rowcounts for a chunk and skip syncing if the pipe's
        chunk rowcount equals or exceeds the remote's rowcount.

    check_rowcounts_only: bool, default False
        If `True`, only compare rowcounts and print chunks which are out-of-sync.

    debug: bool, default False
        Verbosity toggle.

    kwargs: Any
        All keyword arguments are passed to `pipe.sync()`.

    Returns
    -------
    A SuccessTuple indicating whether the pipe was successfully resynced.
    """
    from meerschaum.utils.pool import get_pool
    from meerschaum.utils.formatting import make_header
    from meerschaum.utils.misc import interval_str
    workers = self.get_num_workers(workers)
    check_rowcounts = skip_chunks_with_greater_rowcounts or check_rowcounts_only

    ### Skip configured bounding in parameters
    ### if `bounded` is explicitly `False`.
    bound_time = (
        self.get_bound_time(debug=debug)
        if bounded is not False
        else None
    )
    if bounded is None:
        bounded = bound_time is not None

    if bounded and begin is None:
        begin = (
            bound_time
            if bound_time is not None
            else self.get_sync_time(newest=False, debug=debug)
        )
        if begin is None:
            remote_oldest_sync_time = self.get_sync_time(newest=False, remote=True, debug=debug)
            begin = remote_oldest_sync_time
    if bounded and end is None:
        end = self.get_sync_time(newest=True, debug=debug)
        if end is None:
            remote_newest_sync_time = self.get_sync_time(newest=True, remote=True, debug=debug)
            end = remote_newest_sync_time
        if end is not None:
            end += (
                timedelta(minutes=1)
                if hasattr(end, 'tzinfo')
                else 1
            )

    begin, end = self.parse_date_bounds(begin, end)
    cannot_determine_bounds = bounded and begin is None and end is None

    if cannot_determine_bounds and not check_rowcounts_only:
        warn(f"Cannot determine sync bounds for {self}. Syncing instead...", stack=False)
        sync_success, sync_msg = self.sync(
            begin=begin,
            end=end,
            params=params,
            workers=workers,
            debug=debug,
            **kwargs
        )
        if not sync_success:
            return sync_success, sync_msg

        if deduplicate:
            return self.deduplicate(
                begin=begin,
                end=end,
                params=params,
                workers=workers,
                debug=debug,
                **kwargs
            )
        return sync_success, sync_msg

    chunk_interval = self.get_chunk_interval(chunk_interval, debug=debug)
    chunk_bounds = self.get_chunk_bounds(
        begin=begin,
        end=end,
        chunk_interval=chunk_interval,
        bounded=bounded,
        debug=debug,
    )

    ### Consider it a success if no chunks need to be verified.
    if not chunk_bounds:
        if deduplicate:
            return self.deduplicate(
                begin=begin,
                end=end,
                params=params,
                workers=workers,
                debug=debug,
                **kwargs
            )
        return True, f"Could not determine chunks between '{begin}' and '{end}'; nothing to do."

    begin_to_print = (
        begin
        if begin is not None
        else (
            chunk_bounds[0][0]
            if bounded
            else chunk_bounds[0][1]
        )
    )
    end_to_print = (
        end
        if end is not None
        else (
            chunk_bounds[-1][1]
            if bounded
            else chunk_bounds[-1][0]
        )
    )
    message_header = f"{begin_to_print} - {end_to_print}"
    max_chunks_syncs = mrsm.get_config('pipes', 'verify', 'max_chunks_syncs')

    info(
        f"Verifying {self}:\n    "
        + ("Syncing" if not check_rowcounts_only else "Checking")
        + f" {len(chunk_bounds)} chunk"
        + ('s' if len(chunk_bounds) != 1 else '')
        + f" ({'un' if not bounded else ''}bounded)"
        + f" of size '{interval_str(chunk_interval)}'"
        + f" between '{begin_to_print}' and '{end_to_print}'.\n"
    )

    ### Dictionary of the form bounds -> success_tuple, e.g.:
    ### {
    ###     (2023-01-01, 2023-01-02): (True, "Success")
    ### }
    bounds_success_tuples = {}
    def process_chunk_bounds(
        chunk_begin_and_end: Tuple[
            Union[int, datetime],
            Union[int, datetime]
        ],
        _workers: Optional[int] = 1,
    ):
        if chunk_begin_and_end in bounds_success_tuples:
            return chunk_begin_and_end, bounds_success_tuples[chunk_begin_and_end]

        chunk_begin, chunk_end = chunk_begin_and_end
        do_sync = True
        chunk_success, chunk_msg = False, "Did not sync chunk."
        if check_rowcounts:
            existing_rowcount = self.get_rowcount(begin=chunk_begin, end=chunk_end, debug=debug)
            remote_rowcount = self.get_rowcount(
                begin=chunk_begin,
                end=chunk_end,
                remote=True,
                debug=debug,
            )
            checked_rows_str = (
                f"checked {existing_rowcount:,} row"
                + ("s" if existing_rowcount != 1 else '')
                + f" vs {remote_rowcount:,} remote"
            )
            if (
                existing_rowcount is not None
                and remote_rowcount is not None
                and existing_rowcount >= remote_rowcount
            ):
                do_sync = False
                chunk_success, chunk_msg = True, (
                    "Row-count is up-to-date "
                    f"({checked_rows_str})."
                )
            elif check_rowcounts_only:
                do_sync = False
                chunk_success, chunk_msg = True, (
                    f"Row-counts are out-of-sync ({checked_rows_str})."
                )

        num_syncs = 0
        while num_syncs < max_chunks_syncs:
            chunk_success, chunk_msg = self.sync(
                begin=chunk_begin,
                end=chunk_end,
                params=params,
                workers=_workers,
                debug=debug,
                **kwargs
            ) if do_sync else (chunk_success, chunk_msg)
            if chunk_success:
                break
            num_syncs += 1
            time.sleep(num_syncs**2)
        chunk_msg = chunk_msg.strip()
        if ' - ' not in chunk_msg:
            chunk_label = f"{chunk_begin} - {chunk_end}"
            chunk_msg = f'Verified chunk for {self}:\n{chunk_label}\n{chunk_msg}'
        mrsm.pprint((chunk_success, chunk_msg))

        return chunk_begin_and_end, (chunk_success, chunk_msg)

    ### If we have more than one chunk, attempt to sync the first one and return if it fails.
    if len(chunk_bounds) > 1:
        first_chunk_bounds = chunk_bounds[0]
        first_label = f"{first_chunk_bounds[0]} - {first_chunk_bounds[1]}"
        info(f"Verifying first chunk for {self}:\n    {first_label}")
        (
            (first_begin, first_end),
            (first_success, first_msg)
        ) = process_chunk_bounds(first_chunk_bounds, _workers=workers)
        if not first_success:
            return (
                first_success,
                f"\n{first_label}\n"
                + f"Failed to sync first chunk:\n{first_msg}"
            )
        bounds_success_tuples[first_chunk_bounds] = (first_success, first_msg)
        info(f"Completed first chunk for {self}:\n    {first_label}\n")
        chunk_bounds = chunk_bounds[1:]

    pool = get_pool(workers=workers)
    batches = self.get_chunk_bounds_batches(chunk_bounds, batchsize=batchsize, workers=workers)

    def process_batch(
        batch_chunk_bounds: Tuple[
            Tuple[Union[datetime, int, None], Union[datetime, int, None]],
            ...
        ]
    ):
        _batch_begin = batch_chunk_bounds[0][0]
        _batch_end = batch_chunk_bounds[-1][-1]
        batch_message_header = f"{_batch_begin} - {_batch_end}"

        if check_rowcounts_only:
            info(f"Checking row-counts for batch bounds:\n    {batch_message_header}")
            _, (batch_init_success, batch_init_msg) = process_chunk_bounds(
                (_batch_begin, _batch_end)
            )
            mrsm.pprint((batch_init_success, batch_init_msg))
            if batch_init_success and 'up-to-date' in batch_init_msg:
                info("Entire batch is up-to-date.")
                return batch_init_success, batch_init_msg

        batch_bounds_success_tuples = dict(pool.map(process_chunk_bounds, batch_chunk_bounds))
        bounds_success_tuples.update(batch_bounds_success_tuples)
        batch_bounds_success_bools = {
            bounds: tup[0]
            for bounds, tup in batch_bounds_success_tuples.items()
        }

        if all(batch_bounds_success_bools.values()):
            msg = get_chunks_success_message(
                batch_bounds_success_tuples,
                header=batch_message_header,
                check_rowcounts_only=check_rowcounts_only,
            )
            if deduplicate:
                deduplicate_success, deduplicate_msg = self.deduplicate(
                    begin=_batch_begin,
                    end=_batch_end,
                    params=params,
                    workers=workers,
                    debug=debug,
                    **kwargs
                )
                return deduplicate_success, msg + '\n\n' + deduplicate_msg
            return True, msg

        batch_chunk_bounds_to_resync = [
            bounds
            for bounds, success in zip(batch_chunk_bounds, batch_bounds_success_bools)
            if not success
        ]
        batch_bounds_to_print = [
            f"{bounds[0]} - {bounds[1]}"
            for bounds in batch_chunk_bounds_to_resync
        ]
        if batch_bounds_to_print:
            warn(
                "Will resync the following failed chunks:\n    "
                + '\n    '.join(batch_bounds_to_print),
                stack=False,
            )

        retry_bounds_success_tuples = dict(pool.map(
            process_chunk_bounds,
            batch_chunk_bounds_to_resync
        ))
        batch_bounds_success_tuples.update(retry_bounds_success_tuples)
        bounds_success_tuples.update(retry_bounds_success_tuples)
        retry_bounds_success_bools = {
            bounds: tup[0]
            for bounds, tup in retry_bounds_success_tuples.items()
        }

        if all(retry_bounds_success_bools.values()):
            chunks_message = (
                get_chunks_success_message(
                    batch_bounds_success_tuples,
                    header=batch_message_header,
                    check_rowcounts_only=check_rowcounts_only,
                ) + f"\nRetried {len(batch_chunk_bounds_to_resync)} chunk" + (
                    's'
                    if len(batch_chunk_bounds_to_resync) != 1
                    else ''
                ) + "."
            )
            if deduplicate:
                deduplicate_success, deduplicate_msg = self.deduplicate(
                    begin=_batch_begin,
                    end=_batch_end,
                    params=params,
                    workers=workers,
                    debug=debug,
                    **kwargs
                )
                return deduplicate_success, chunks_message + '\n\n' + deduplicate_msg
            return True, chunks_message

        batch_chunks_message = get_chunks_success_message(
            batch_bounds_success_tuples,
            header=batch_message_header,
            check_rowcounts_only=check_rowcounts_only,
        )
        if deduplicate:
            deduplicate_success, deduplicate_msg = self.deduplicate(
                begin=begin,
                end=end,
                params=params,
                workers=workers,
                debug=debug,
                **kwargs
            )
            return deduplicate_success, batch_chunks_message + '\n\n' + deduplicate_msg
        return False, batch_chunks_message

    num_batches = len(batches)
    for batch_i, batch in enumerate(batches):
        batch_begin = batch[0][0]
        batch_end = batch[-1][-1]
        batch_counter_str = f"({(batch_i + 1):,}/{num_batches:,})"
        batch_label = f"batch {batch_counter_str}:\n{batch_begin} - {batch_end}"
        retry_failed_batch = True
        try:
            for_self = 'for ' + str(self)
            batch_label_str = batch_label.replace(':\n', ' ' + for_self + '...\n    ')
            info(f"Verifying {batch_label_str}\n")
            batch_success, batch_msg = process_batch(batch)
        except (KeyboardInterrupt, Exception) as e:
            batch_success = False
            batch_msg = str(e)
            retry_failed_batch = False

        batch_msg_to_print = (
            f"{make_header('Completed batch ' + batch_counter_str + ':', left_pad=0)}\n{batch_msg}"
        )
        mrsm.pprint((batch_success, batch_msg_to_print))

        if not batch_success and retry_failed_batch:
            info(f"Retrying batch {batch_counter_str}...")
            retry_batch_success, retry_batch_msg = process_batch(batch)
            retry_batch_msg_to_print = (
                f"Retried {make_header('batch ' + batch_label, left_pad=0)}\n{retry_batch_msg}"
            )
            mrsm.pprint((retry_batch_success, retry_batch_msg_to_print))

            batch_success = retry_batch_success
            batch_msg = retry_batch_msg

        if not batch_success:
            return False, f"Failed to verify {batch_label}:\n\n{batch_msg}"

    chunks_message = get_chunks_success_message(
        bounds_success_tuples,
        header=message_header,
        check_rowcounts_only=check_rowcounts_only,
    )
    return True, chunks_message
Verify the contents of the pipe by resyncing its interval.

Parameters
- begin (Union[datetime, int, None], default None): If specified, only verify rows greater than or equal to this value.
- end (Union[datetime, int, None], default None): If specified, only verify rows less than this value.
- chunk_interval (Union[timedelta, int, None], default None): If provided, use this as the size of the chunk boundaries. Defaults to the value set in `pipe.parameters['chunk_minutes']` (1440).
- bounded (Optional[bool], default None): If `True`, do not verify older than the oldest sync time or newer than the newest. If `False`, verify unbounded syncs outside of the new and old sync times. The default behavior (`None`) is to bound only if a bound interval is set (e.g. `pipe.parameters['verify']['bound_days']`).
- deduplicate (bool, default False): If `True`, deduplicate the pipe's table after the verification syncs.
- workers (Optional[int], default None): If provided, limit the verification to this many threads. Use a value of `1` to sync chunks in series.
- batchsize (Optional[int], default None): If provided, sync this many chunks in parallel. Defaults to `Pipe.get_num_workers()`.
- skip_chunks_with_greater_rowcounts (bool, default False): If `True`, compare the rowcounts for a chunk and skip syncing if the pipe's chunk rowcount equals or exceeds the remote's rowcount.
- check_rowcounts_only (bool, default False): If `True`, only compare rowcounts and print chunks which are out-of-sync.
- debug (bool, default False): Verbosity toggle.
- kwargs (Any): All keyword arguments are passed to `pipe.sync()`.

Returns
- A `SuccessTuple` indicating whether the pipe was successfully resynced.
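The chunking strategy behind `verify()` can be sketched in isolation: divide the interval `[begin, end)` into fixed-size chunks, each of which is then resynced (and retried) independently. This is a simplified stand-in for `Pipe.get_chunk_bounds()`:

```python
from datetime import datetime, timedelta

def get_chunk_bounds(begin, end, chunk_interval):
    """Return (chunk_begin, chunk_end) pairs covering [begin, end)."""
    bounds = []
    chunk_begin = begin
    while chunk_begin < end:
        # The final chunk is clipped so it never extends past `end`.
        chunk_end = min(chunk_begin + chunk_interval, end)
        bounds.append((chunk_begin, chunk_end))
        chunk_begin = chunk_end
    return bounds

bounds = get_chunk_bounds(
    datetime(2024, 1, 1),
    datetime(2024, 1, 4),
    timedelta(days=1),  # the documented default chunk size is 1440 minutes
)
print(len(bounds))
# 3
```

Resyncing chunk by chunk keeps each sync bounded in memory and lets failed intervals be retried (and row-count-checked) individually rather than restarting the whole range.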
def get_bound_interval(self, debug: bool = False) -> Union[timedelta, int, None]:
    """
    Return the interval used to determine the bound time (limit for verification syncs).
    If the datetime axis is an integer, just return its value.

    Below are the supported keys for the bound interval:

    - `pipe.parameters['verify']['bound_minutes']`
    - `pipe.parameters['verify']['bound_hours']`
    - `pipe.parameters['verify']['bound_days']`
    - `pipe.parameters['verify']['bound_weeks']`
    - `pipe.parameters['verify']['bound_years']`
    - `pipe.parameters['verify']['bound_seconds']`

    If multiple keys are present, the first on this priority list will be used.

    Returns
    -------
    A `timedelta` or `int` value to be used to determine the bound time.
    """
    verify_params = self.parameters.get('verify', {})
    prefix = 'bound_'
    suffixes_to_check = ('minutes', 'hours', 'days', 'weeks', 'years', 'seconds')
    keys_to_search = {
        key: val
        for key, val in verify_params.items()
        if key.startswith(prefix)
    }
    bound_time_key, bound_time_value = None, None
    for key, value in keys_to_search.items():
        for suffix in suffixes_to_check:
            if key == prefix + suffix:
                bound_time_key = key
                bound_time_value = value
                break
        if bound_time_key is not None:
            break

    if bound_time_value is None:
        return bound_time_value

    dt_col = self.columns.get('datetime', None)
    if not dt_col:
        return bound_time_value

    dt_typ = self.dtypes.get(dt_col, 'datetime')
    if 'int' in dt_typ.lower():
        return int(bound_time_value)

    interval_type = bound_time_key.replace(prefix, '')
    return timedelta(**{interval_type: bound_time_value})
Return the interval used to determine the bound time (limit for verification syncs). If the datetime axis is an integer, just return its value.
Below are the supported keys for the bound interval:
- `pipe.parameters['verify']['bound_minutes']`
- `pipe.parameters['verify']['bound_hours']`
- `pipe.parameters['verify']['bound_days']`
- `pipe.parameters['verify']['bound_weeks']`
- `pipe.parameters['verify']['bound_years']`
- `pipe.parameters['verify']['bound_seconds']`
If multiple keys are present, the first on this priority list will be used.
Returns
- A `timedelta` or `int` value to be used to determine the bound time.
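The priority logic above can be sketched with plain standard-library code (an illustrative approximation, not the actual method; `bound_years` is omitted because `timedelta` has no `years` keyword):

```python
from datetime import timedelta

# Priority order documented above: the first matching key wins.
SUFFIX_PRIORITY = ('minutes', 'hours', 'days', 'weeks', 'seconds')

def sketch_bound_interval(verify_params):
    """Return a `timedelta` built from the first matching `bound_*` key."""
    for suffix in SUFFIX_PRIORITY:
        key = 'bound_' + suffix
        if key in verify_params:
            return timedelta(**{suffix: verify_params[key]})
    return None

print(sketch_bound_interval({'bound_days': 366, 'bound_seconds': 10}))
# -> 366 days, 0:00:00
```

Note that `bound_days` wins here even though `bound_seconds` is also set, matching the documented priority list.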
599def get_bound_time(self, debug: bool = False) -> Union[datetime, int, None]: 600 """ 601 The bound time is the limit at which long-running verification syncs should stop. 602 A value of `None` means verification syncs should be unbounded. 603 604 Like deriving a backtrack time from `pipe.get_sync_time()`, 605 the bound time is the sync time minus a large window (e.g. 366 days). 606 607 Unbound verification syncs (i.e. `bound_time is None`) 608 if the oldest sync time is less than the bound interval. 609 610 Returns 611 ------- 612 A `datetime` or `int` corresponding to the 613 `begin` bound for verification and deduplication syncs. 614 """ 615 bound_interval = self.get_bound_interval(debug=debug) 616 if bound_interval is None: 617 return None 618 619 sync_time = self.get_sync_time(debug=debug) 620 if sync_time is None: 621 return None 622 623 bound_time = sync_time - bound_interval 624 oldest_sync_time = self.get_sync_time(newest=False, debug=debug) 625 max_bound_time_days = STATIC_CONFIG['pipes']['max_bound_time_days'] 626 627 extreme_sync_times_delta = ( 628 hasattr(oldest_sync_time, 'tzinfo') 629 and (sync_time - oldest_sync_time) >= timedelta(days=max_bound_time_days) 630 ) 631 632 return ( 633 bound_time 634 if bound_time > oldest_sync_time or extreme_sync_times_delta 635 else None 636 )
The bound time is the limit at which long-running verification syncs should stop.
A value of `None` means verification syncs should be unbounded.
Like deriving a backtrack time from `pipe.get_sync_time()`,
the bound time is the sync time minus a large window (e.g. 366 days).
Verification syncs are unbounded (i.e. `bound_time is None`) if the oldest sync time falls within the bound interval.
Returns
- A `datetime` or `int` corresponding to the `begin` bound for verification and deduplication syncs.
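The subtraction described above reduces to a small standard-library sketch (an approximation of the documented behavior, not the actual method):

```python
from datetime import datetime, timedelta

def sketch_bound_time(sync_time, oldest_sync_time, bound_interval):
    """Return sync_time - bound_interval, or None when the pipe's history
    is shorter than the bound interval (i.e. the sync is unbounded)."""
    if bound_interval is None or sync_time is None:
        return None
    bound_time = sync_time - bound_interval
    return bound_time if bound_time > oldest_sync_time else None

newest = datetime(2024, 6, 1)
oldest = datetime(2020, 1, 1)
print(sketch_bound_time(newest, oldest, timedelta(days=366)))
# -> 2023-06-01 00:00:00
```

With only a few months of history, the same call would return `None` and verification syncs would run unbounded.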
12def delete( 13 self, 14 drop: bool = True, 15 debug: bool = False, 16 **kw 17) -> SuccessTuple: 18 """ 19 Call the Pipe's instance connector's `delete_pipe()` method. 20 21 Parameters 22 ---------- 23 drop: bool, default True 24 If `True`, drop the pipes' target table. 25 26 debug : bool, default False 27 Verbosity toggle. 28 29 Returns 30 ------- 31 A `SuccessTuple` of success (`bool`), message (`str`). 32 33 """ 34 from meerschaum.utils.warnings import warn 35 from meerschaum.utils.venv import Venv 36 from meerschaum.connectors import get_connector_plugin 37 38 if self.temporary: 39 if self.cache: 40 invalidate_success, invalidate_msg = self._invalidate_cache(hard=True, debug=debug) 41 if not invalidate_success: 42 return invalidate_success, invalidate_msg 43 44 return ( 45 False, 46 "Cannot delete pipes created with `temporary=True` (read-only). " 47 + "You may want to call `pipe.drop()` instead." 48 ) 49 50 if drop: 51 drop_success, drop_msg = self.drop(debug=debug) 52 if not drop_success: 53 warn(f"Failed to drop {self}:\n{drop_msg}") 54 55 with Venv(get_connector_plugin(self.instance_connector)): 56 result = self.instance_connector.delete_pipe(self, debug=debug, **kw) 57 58 if not isinstance(result, tuple): 59 return False, f"Received an unexpected result from '{self.instance_connector}': {result}" 60 61 if result[0]: 62 self._invalidate_cache(hard=True, debug=debug) 63 64 return result
Call the Pipe's instance connector's `delete_pipe()` method.
Parameters
- drop (bool, default True): If `True`, drop the pipe's target table.
- debug (bool, default False): Verbosity toggle.
Returns
- A `SuccessTuple` of success (`bool`), message (`str`).
14def drop( 15 self, 16 debug: bool = False, 17 **kw: Any 18) -> SuccessTuple: 19 """ 20 Call the Pipe's instance connector's `drop_pipe()` method. 21 22 Parameters 23 ---------- 24 debug: bool, default False: 25 Verbosity toggle. 26 27 Returns 28 ------- 29 A `SuccessTuple` of success, message. 30 31 """ 32 from meerschaum.utils.venv import Venv 33 from meerschaum.connectors import get_connector_plugin 34 35 self._clear_cache_key('_exists', debug=debug) 36 37 with Venv(get_connector_plugin(self.instance_connector)): 38 if hasattr(self.instance_connector, 'drop_pipe'): 39 result = self.instance_connector.drop_pipe(self, debug=debug, **kw) 40 else: 41 result = ( 42 False, 43 ( 44 "Cannot drop pipes for instance connectors of type " 45 f"'{self.instance_connector.type}'." 46 ) 47 ) 48 49 self._clear_cache_key('_exists', debug=debug) 50 self._clear_cache_key('_exists_timestamp', debug=debug) 51 52 return result
Call the Pipe's instance connector's `drop_pipe()` method.
Parameters
- debug (bool, default False): Verbosity toggle.
Returns
- A `SuccessTuple` of success, message.
55def drop_indices( 56 self, 57 columns: Optional[List[str]] = None, 58 debug: bool = False, 59 **kw: Any 60) -> SuccessTuple: 61 """ 62 Call the Pipe's instance connector's `drop_indices()` method. 63 64 Parameters 65 ---------- 66 columns: Optional[List[str]] = None 67 If provided, only drop indices in the given list. 68 69 debug: bool, default False: 70 Verbosity toggle. 71 72 Returns 73 ------- 74 A `SuccessTuple` of success, message. 75 76 """ 77 from meerschaum.utils.venv import Venv 78 from meerschaum.connectors import get_connector_plugin 79 80 self._clear_cache_key('_columns_indices', debug=debug) 81 self._clear_cache_key('_columns_indices_timestamp', debug=debug) 82 self._clear_cache_key('_columns_types', debug=debug) 83 self._clear_cache_key('_columns_types_timestamp', debug=debug) 84 85 with Venv(get_connector_plugin(self.instance_connector)): 86 if hasattr(self.instance_connector, 'drop_pipe_indices'): 87 result = self.instance_connector.drop_pipe_indices( 88 self, 89 columns=columns, 90 debug=debug, 91 **kw 92 ) 93 else: 94 result = ( 95 False, 96 ( 97 "Cannot drop indices for instance connectors of type " 98 f"'{self.instance_connector.type}'." 99 ) 100 ) 101 102 self._clear_cache_key('_columns_indices', debug=debug) 103 self._clear_cache_key('_columns_indices_timestamp', debug=debug) 104 self._clear_cache_key('_columns_types', debug=debug) 105 self._clear_cache_key('_columns_types_timestamp', debug=debug) 106 107 return result
Call the Pipe's instance connector's `drop_pipe_indices()` method.
Parameters
- columns (Optional[List[str]], default None): If provided, only drop indices in the given list.
- debug (bool, default False): Verbosity toggle.
Returns
- A `SuccessTuple` of success, message.
14def create_indices( 15 self, 16 columns: Optional[List[str]] = None, 17 debug: bool = False, 18 **kw: Any 19) -> SuccessTuple: 20 """ 21 Call the Pipe's instance connector's `create_pipe_indices()` method. 22 23 Parameters 24 ---------- 25 debug: bool, default False: 26 Verbosity toggle. 27 28 Returns 29 ------- 30 A `SuccessTuple` of success, message. 31 32 """ 33 from meerschaum.utils.venv import Venv 34 from meerschaum.connectors import get_connector_plugin 35 36 self._clear_cache_key('_columns_indices', debug=debug) 37 self._clear_cache_key('_columns_indices_timestamp', debug=debug) 38 self._clear_cache_key('_columns_types', debug=debug) 39 self._clear_cache_key('_columns_types_timestamp', debug=debug) 40 41 with Venv(get_connector_plugin(self.instance_connector)): 42 if hasattr(self.instance_connector, 'create_pipe_indices'): 43 result = self.instance_connector.create_pipe_indices( 44 self, 45 columns=columns, 46 debug=debug, 47 **kw 48 ) 49 else: 50 result = ( 51 False, 52 ( 53 "Cannot create indices for instance connectors of type " 54 f"'{self.instance_connector.type}'." 55 ) 56 ) 57 58 self._clear_cache_key('_columns_indices', debug=debug) 59 self._clear_cache_key('_columns_indices_timestamp', debug=debug) 60 self._clear_cache_key('_columns_types', debug=debug) 61 self._clear_cache_key('_columns_types_timestamp', debug=debug) 62 63 return result
Call the Pipe's instance connector's `create_pipe_indices()` method.
Parameters
- columns (Optional[List[str]], default None): If provided, only create indices in the given list.
- debug (bool, default False): Verbosity toggle.
Returns
- A `SuccessTuple` of success, message.
16def clear( 17 self, 18 begin: Optional[datetime] = None, 19 end: Optional[datetime] = None, 20 params: Optional[Dict[str, Any]] = None, 21 debug: bool = False, 22 **kwargs: Any 23) -> SuccessTuple: 24 """ 25 Call the Pipe's instance connector's `clear_pipe` method. 26 27 Parameters 28 ---------- 29 begin: Optional[datetime], default None: 30 If provided, only remove rows newer than this datetime value. 31 32 end: Optional[datetime], default None: 33 If provided, only remove rows older than this datetime column (not including end). 34 35 params: Optional[Dict[str, Any]], default None 36 See `meerschaum.utils.sql.build_where`. 37 38 debug: bool, default False: 39 Verbositity toggle. 40 41 Returns 42 ------- 43 A `SuccessTuple` corresponding to whether this procedure completed successfully. 44 45 Examples 46 -------- 47 >>> pipe = mrsm.Pipe('test', 'test', columns={'datetime': 'dt'}, instance='sql:local') 48 >>> pipe.sync({'dt': [datetime(2020, 1, 1, 0, 0)]}) 49 >>> pipe.sync({'dt': [datetime(2021, 1, 1, 0, 0)]}) 50 >>> pipe.sync({'dt': [datetime(2022, 1, 1, 0, 0)]}) 51 >>> 52 >>> pipe.clear(begin=datetime(2021, 1, 1, 0, 0)) 53 >>> pipe.get_data() 54 dt 55 0 2020-01-01 56 57 """ 58 from meerschaum.utils.warnings import warn 59 from meerschaum.utils.venv import Venv 60 from meerschaum.connectors import get_connector_plugin 61 62 begin, end = self.parse_date_bounds(begin, end) 63 64 with Venv(get_connector_plugin(self.instance_connector)): 65 return self.instance_connector.clear_pipe( 66 self, 67 begin=begin, 68 end=end, 69 params=params, 70 debug=debug, 71 **kwargs 72 )
Call the Pipe's instance connector's `clear_pipe()` method.
Parameters
- begin (Optional[datetime], default None): If provided, only remove rows newer than this datetime value.
- end (Optional[datetime], default None): If provided, only remove rows older than this datetime value (not including `end`).
- params (Optional[Dict[str, Any]], default None): See `meerschaum.utils.sql.build_where`.
- debug (bool, default False): Verbosity toggle.
Returns
- A `SuccessTuple` corresponding to whether this procedure completed successfully.
Examples
>>> pipe = mrsm.Pipe('test', 'test', columns={'datetime': 'dt'}, instance='sql:local')
>>> pipe.sync({'dt': [datetime(2020, 1, 1, 0, 0)]})
>>> pipe.sync({'dt': [datetime(2021, 1, 1, 0, 0)]})
>>> pipe.sync({'dt': [datetime(2022, 1, 1, 0, 0)]})
>>>
>>> pipe.clear(begin=datetime(2021, 1, 1, 0, 0))
>>> pipe.get_data()
dt
0 2020-01-01
15def deduplicate( 16 self, 17 begin: Union[datetime, int, None] = None, 18 end: Union[datetime, int, None] = None, 19 params: Optional[Dict[str, Any]] = None, 20 chunk_interval: Union[datetime, int, None] = None, 21 bounded: Optional[bool] = None, 22 workers: Optional[int] = None, 23 debug: bool = False, 24 _use_instance_method: bool = True, 25 **kwargs: Any 26) -> SuccessTuple: 27 """ 28 Call the Pipe's instance connector's `delete_duplicates` method to delete duplicate rows. 29 30 Parameters 31 ---------- 32 begin: Union[datetime, int, None], default None: 33 If provided, only deduplicate rows newer than this datetime value. 34 35 end: Union[datetime, int, None], default None: 36 If provided, only deduplicate rows older than this datetime column (not including end). 37 38 params: Optional[Dict[str, Any]], default None 39 Restrict deduplication to this filter (for multiplexed data streams). 40 See `meerschaum.utils.sql.build_where`. 41 42 chunk_interval: Union[timedelta, int, None], default None 43 If provided, use this for the chunk bounds. 44 Defaults to the value set in `pipe.parameters['chunk_minutes']` (1440). 45 46 bounded: Optional[bool], default None 47 Only check outside the oldest and newest sync times if bounded is explicitly `False`. 48 49 workers: Optional[int], default None 50 If the instance connector is thread-safe, limit concurrenct syncs to this many threads. 51 52 debug: bool, default False: 53 Verbositity toggle. 54 55 kwargs: Any 56 All other keyword arguments are passed to 57 `pipe.sync()`, `pipe.clear()`, and `pipe.get_data(). 58 59 Returns 60 ------- 61 A `SuccessTuple` corresponding to whether all of the chunks were successfully deduplicated. 
62 """ 63 from meerschaum.utils.warnings import warn, info 64 from meerschaum.utils.misc import interval_str, items_str 65 from meerschaum.utils.venv import Venv 66 from meerschaum.connectors import get_connector_plugin 67 from meerschaum.utils.pool import get_pool 68 69 begin, end = self.parse_date_bounds(begin, end) 70 71 workers = self.get_num_workers(workers=workers) 72 pool = get_pool(workers=workers) 73 74 if _use_instance_method: 75 with Venv(get_connector_plugin(self.instance_connector)): 76 if hasattr(self.instance_connector, 'deduplicate_pipe'): 77 return self.instance_connector.deduplicate_pipe( 78 self, 79 begin=begin, 80 end=end, 81 params=params, 82 bounded=bounded, 83 debug=debug, 84 **kwargs 85 ) 86 87 ### Only unbound if explicitly False. 88 if bounded is None: 89 bounded = True 90 chunk_interval = self.get_chunk_interval(chunk_interval, debug=debug) 91 92 bound_time = self.get_bound_time(debug=debug) 93 if bounded and begin is None: 94 begin = ( 95 bound_time 96 if bound_time is not None 97 else self.get_sync_time(newest=False, debug=debug) 98 ) 99 if bounded and end is None: 100 end = self.get_sync_time(newest=True, debug=debug) 101 if end is not None: 102 end += ( 103 timedelta(minutes=1) 104 if hasattr(end, 'tzinfo') 105 else 1 106 ) 107 108 chunk_bounds = self.get_chunk_bounds( 109 bounded=bounded, 110 begin=begin, 111 end=end, 112 chunk_interval=chunk_interval, 113 debug=debug, 114 ) 115 116 indices = [col for col in self.columns.values() if col] 117 if not indices: 118 return False, "Cannot deduplicate without index columns." 119 120 def process_chunk_bounds(bounds) -> Tuple[ 121 Tuple[ 122 Union[datetime, int, None], 123 Union[datetime, int, None] 124 ], 125 SuccessTuple 126 ]: 127 ### Only selecting the index values here to keep bandwidth down. 
128 chunk_begin, chunk_end = bounds 129 chunk_df = self.get_data( 130 select_columns=indices, 131 begin=chunk_begin, 132 end=chunk_end, 133 params=params, 134 debug=debug, 135 ) 136 if chunk_df is None: 137 return bounds, (True, "") 138 existing_chunk_len = len(chunk_df) 139 deduped_chunk_df = chunk_df.drop_duplicates(keep='last') 140 deduped_chunk_len = len(deduped_chunk_df) 141 142 if existing_chunk_len == deduped_chunk_len: 143 return bounds, (True, "") 144 145 chunk_msg_header = f"\n{chunk_begin} - {chunk_end}" 146 chunk_msg_body = "" 147 148 full_chunk = self.get_data( 149 begin=chunk_begin, 150 end=chunk_end, 151 params=params, 152 debug=debug, 153 ) 154 if full_chunk is None or len(full_chunk) == 0: 155 return bounds, (True, f"{chunk_msg_header}\nChunk is empty, skipping...") 156 157 chunk_indices = [ix for ix in indices if ix in full_chunk.columns] 158 if not chunk_indices: 159 return bounds, (False, f"None of {items_str(indices)} were present in chunk.") 160 try: 161 full_chunk = full_chunk.drop_duplicates( 162 subset=chunk_indices, 163 keep='last' 164 ).reset_index( 165 drop=True, 166 ) 167 except Exception as e: 168 return ( 169 bounds, 170 (False, f"Failed to deduplicate chunk on {items_str(chunk_indices)}:\n({e})") 171 ) 172 173 clear_success, clear_msg = self.clear( 174 begin=chunk_begin, 175 end=chunk_end, 176 params=params, 177 debug=debug, 178 ) 179 if not clear_success: 180 chunk_msg_body += f"Failed to clear chunk while deduplicating:\n{clear_msg}\n" 181 warn(chunk_msg_body) 182 183 sync_success, sync_msg = self.sync(full_chunk, debug=debug) 184 if not sync_success: 185 chunk_msg_body += f"Failed to sync chunk while deduplicating:\n{sync_msg}\n" 186 187 ### Finally check if the deduplication worked. 
188 chunk_rowcount = self.get_rowcount( 189 begin=chunk_begin, 190 end=chunk_end, 191 params=params, 192 debug=debug, 193 ) 194 if chunk_rowcount != deduped_chunk_len: 195 return bounds, ( 196 False, ( 197 chunk_msg_header + "\n" 198 + chunk_msg_body + ("\n" if chunk_msg_body else '') 199 + "Chunk rowcounts still differ (" 200 + f"{chunk_rowcount} rowcount vs {deduped_chunk_len} chunk length)." 201 ) 202 ) 203 204 return bounds, ( 205 True, ( 206 chunk_msg_header + "\n" 207 + chunk_msg_body + ("\n" if chunk_msg_body else '') 208 + f"Deduplicated chunk from {existing_chunk_len} to {chunk_rowcount} rows." 209 ) 210 ) 211 212 info( 213 f"Deduplicating {len(chunk_bounds)} chunk" 214 + ('s' if len(chunk_bounds) != 1 else '') 215 + f" ({'un' if not bounded else ''}bounded)" 216 + f" of size '{interval_str(chunk_interval)}'" 217 + f" on {self}." 218 ) 219 bounds_success_tuples = dict(pool.map(process_chunk_bounds, chunk_bounds)) 220 bounds_successes = { 221 bounds: success_tuple 222 for bounds, success_tuple in bounds_success_tuples.items() 223 if success_tuple[0] 224 } 225 bounds_failures = { 226 bounds: success_tuple 227 for bounds, success_tuple in bounds_success_tuples.items() 228 if not success_tuple[0] 229 } 230 231 ### No need to retry if everything failed. 
232 if len(bounds_failures) > 0 and len(bounds_successes) == 0: 233 return ( 234 False, 235 ( 236 f"Failed to deduplicate {len(bounds_failures)} chunk" 237 + ('s' if len(bounds_failures) != 1 else '') 238 + ".\n" 239 + "\n".join([msg for _, (_, msg) in bounds_failures.items() if msg]) 240 ) 241 ) 242 243 retry_bounds = [bounds for bounds in bounds_failures] 244 if not retry_bounds: 245 return ( 246 True, 247 ( 248 f"Successfully deduplicated {len(bounds_successes)} chunk" 249 + ('s' if len(bounds_successes) != 1 else '') 250 + ".\n" 251 + "\n".join([msg for _, (_, msg) in bounds_successes.items() if msg]) 252 ).rstrip('\n') 253 ) 254 255 info(f"Retrying {len(retry_bounds)} chunks for {self}...") 256 retry_bounds_success_tuples = dict(pool.map(process_chunk_bounds, retry_bounds)) 257 retry_bounds_successes = { 258 bounds: success_tuple 259 for bounds, success_tuple in bounds_success_tuples.items() 260 if success_tuple[0] 261 } 262 retry_bounds_failures = { 263 bounds: success_tuple 264 for bounds, success_tuple in bounds_success_tuples.items() 265 if not success_tuple[0] 266 } 267 268 bounds_successes.update(retry_bounds_successes) 269 if not retry_bounds_failures: 270 return ( 271 True, 272 ( 273 f"Successfully deduplicated {len(bounds_successes)} chunk" 274 + ('s' if len(bounds_successes) != 1 else '') 275 + f"({len(retry_bounds_successes)} retried):\n" 276 + "\n".join([msg for _, (_, msg) in bounds_successes.items() if msg]) 277 ).rstrip('\n') 278 ) 279 280 return ( 281 False, 282 ( 283 f"Failed to deduplicate {len(bounds_failures)} chunk" 284 + ('s' if len(retry_bounds_failures) != 1 else '') 285 + ".\n" 286 + "\n".join([msg for _, (_, msg) in retry_bounds_failures.items() if msg]) 287 ).rstrip('\n') 288 )
Call the Pipe's instance connector's `deduplicate_pipe()` method to delete duplicate rows.
Parameters
- begin (Union[datetime, int, None], default None): If provided, only deduplicate rows newer than this datetime value.
- end (Union[datetime, int, None], default None): If provided, only deduplicate rows older than this datetime value (not including `end`).
- params (Optional[Dict[str, Any]], default None): Restrict deduplication to this filter (for multiplexed data streams). See `meerschaum.utils.sql.build_where`.
- chunk_interval (Union[timedelta, int, None], default None): If provided, use this for the chunk bounds. Defaults to the value set in `pipe.parameters['chunk_minutes']` (1440).
- bounded (Optional[bool], default None): Only check outside the oldest and newest sync times if bounded is explicitly `False`.
- workers (Optional[int], default None): If the instance connector is thread-safe, limit concurrent syncs to this many threads.
- debug (bool, default False): Verbosity toggle.
- kwargs (Any): All other keyword arguments are passed to `pipe.sync()`, `pipe.clear()`, and `pipe.get_data()`.
Returns
- A `SuccessTuple` corresponding to whether all of the chunks were successfully deduplicated.
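The per-chunk deduplication in the source keeps the last row per index combination (pandas `drop_duplicates(keep='last')` on the pipe's index columns). A dependency-free sketch of that rule (illustrative only, not the actual implementation):

```python
def dedupe_keep_last(rows, index_cols):
    """Keep only the last row for each combination of index column values."""
    latest = {}
    for row in rows:
        key = tuple(row[col] for col in index_cols)
        latest[key] = row  # later rows overwrite earlier duplicates
    return list(latest.values())

rows = [
    {'ts': '2024-01-01', 'id': 1, 'vl': 10},
    {'ts': '2024-01-01', 'id': 1, 'vl': 20},  # duplicate index, newer value
    {'ts': '2024-01-01', 'id': 2, 'vl': 30},
]
print(dedupe_keep_last(rows, ['ts', 'id']))
# -> [{'ts': '2024-01-01', 'id': 1, 'vl': 20}, {'ts': '2024-01-01', 'id': 2, 'vl': 30}]
```

The real method applies this per chunk, then clears and re-syncs only chunks whose rowcount actually shrank.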
16def bootstrap( 17 self, 18 debug: bool = False, 19 yes: bool = False, 20 force: bool = False, 21 noask: bool = False, 22 shell: bool = False, 23 **kw 24) -> SuccessTuple: 25 """ 26 Prompt the user to create a pipe's requirements all from one method. 27 This method shouldn't be used in any automated scripts because it interactively 28 prompts the user and therefore may hang. 29 30 Parameters 31 ---------- 32 debug: bool, default False: 33 Verbosity toggle. 34 35 yes: bool, default False: 36 Print the questions and automatically agree. 37 38 force: bool, default False: 39 Skip the questions and agree anyway. 40 41 noask: bool, default False: 42 Print the questions but go with the default answer. 43 44 shell: bool, default False: 45 Used to determine if we are in the interactive shell. 46 47 Returns 48 ------- 49 A `SuccessTuple` corresponding to the success of this procedure. 50 51 """ 52 53 from meerschaum.utils.warnings import info 54 from meerschaum.utils.prompt import prompt, yes_no 55 from meerschaum.utils.formatting import pprint 56 from meerschaum.config import get_config 57 from meerschaum.utils.formatting._shell import clear_screen 58 from meerschaum.utils.formatting import print_tuple 59 from meerschaum.actions import actions 60 from meerschaum.utils.venv import Venv 61 from meerschaum.connectors import get_connector_plugin 62 63 _clear = get_config('shell', 'clear_screen', patch=True) 64 65 if self.get_id(debug=debug) is not None: 66 delete_tuple = self.delete(debug=debug) 67 if not delete_tuple[0]: 68 return delete_tuple 69 70 if _clear: 71 clear_screen(debug=debug) 72 73 _parameters = _get_parameters(self, debug=debug) 74 self.parameters = _parameters 75 pprint(self.parameters) 76 try: 77 prompt( 78 f"\n Press [Enter] to register {self} with the above configuration:", 79 icon = False 80 ) 81 except KeyboardInterrupt: 82 return False, f"Aborted bootstrapping {self}." 
83 84 with Venv(get_connector_plugin(self.instance_connector)): 85 register_tuple = self.instance_connector.register_pipe(self, debug=debug) 86 87 if not register_tuple[0]: 88 return register_tuple 89 90 if _clear: 91 clear_screen(debug=debug) 92 93 try: 94 if yes_no( 95 f"Would you like to edit the definition for {self}?", 96 yes=yes, 97 noask=noask, 98 default='n', 99 ): 100 edit_tuple = self.edit_definition(debug=debug) 101 if not edit_tuple[0]: 102 return edit_tuple 103 104 if yes_no( 105 f"Would you like to try syncing {self} now?", 106 yes=yes, 107 noask=noask, 108 default='n', 109 ): 110 sync_tuple = actions['sync']( 111 ['pipes'], 112 connector_keys=[self.connector_keys], 113 metric_keys=[self.metric_key], 114 location_keys=[self.location_key], 115 mrsm_instance=str(self.instance_connector), 116 debug=debug, 117 shell=shell, 118 ) 119 if not sync_tuple[0]: 120 return sync_tuple 121 except Exception as e: 122 return False, f"Failed to bootstrap {self}:\n" + str(e) 123 124 print_tuple((True, f"Finished bootstrapping {self}!")) 125 info( 126 "You can edit this pipe later with `edit pipes` " 127 + "or set the definition with `edit pipes definition`.\n" 128 + " To sync data into your pipe, run `sync pipes`." 129 ) 130 131 return True, "Success"
Prompt the user to create a pipe's requirements all from one method. This method shouldn't be used in any automated scripts because it interactively prompts the user and therefore may hang.
Parameters
- debug (bool, default False): Verbosity toggle.
- yes (bool, default False): Print the questions and automatically agree.
- force (bool, default False): Skip the questions and agree anyway.
- noask (bool, default False): Print the questions but go with the default answer.
- shell (bool, default False): Used to determine if we are in the interactive shell.
Returns
- A `SuccessTuple` corresponding to the success of this procedure.
20def enforce_dtypes( 21 self, 22 df: 'pd.DataFrame', 23 chunksize: Optional[int] = -1, 24 enforce: bool = True, 25 safe_copy: bool = True, 26 dtypes: Optional[Dict[str, str]] = None, 27 debug: bool = False, 28) -> 'pd.DataFrame': 29 """ 30 Cast the input dataframe to the pipe's registered data types. 31 If the pipe does not exist and dtypes are not set, return the dataframe. 32 """ 33 import traceback 34 from meerschaum.utils.warnings import warn 35 from meerschaum.utils.debug import dprint 36 from meerschaum.utils.dataframe import ( 37 parse_df_datetimes, 38 enforce_dtypes as _enforce_dtypes, 39 parse_simple_lines, 40 ) 41 from meerschaum.utils.dtypes import are_dtypes_equal 42 from meerschaum.utils.packages import import_pandas 43 pd = import_pandas(debug=debug) 44 if df is None: 45 if debug: 46 dprint( 47 "Received None instead of a DataFrame.\n" 48 + " Skipping dtype enforcement..." 49 ) 50 return df 51 52 if not self.enforce: 53 enforce = False 54 55 explicit_dtypes = self.get_dtypes(infer=False, debug=debug) if enforce else {} 56 pipe_dtypes = self.get_dtypes(infer=True, debug=debug) if not dtypes else dtypes 57 58 try: 59 if isinstance(df, str): 60 if df.strip() and df.strip()[0] not in ('{', '['): 61 df = parse_df_datetimes( 62 parse_simple_lines(df), 63 ignore_cols=[ 64 col 65 for col, dtype in pipe_dtypes.items() 66 if (not enforce or not are_dtypes_equal(dtype, 'datetime')) 67 ], 68 ) 69 else: 70 df = parse_df_datetimes( 71 pd.read_json(StringIO(df)), 72 ignore_cols=[ 73 col 74 for col, dtype in pipe_dtypes.items() 75 if (not enforce or not are_dtypes_equal(dtype, 'datetime')) 76 ], 77 ignore_all=(not enforce), 78 strip_timezone=(self.tzinfo is None), 79 chunksize=chunksize, 80 debug=debug, 81 ) 82 elif isinstance(df, (dict, list, tuple)): 83 df = parse_df_datetimes( 84 df, 85 ignore_cols=[ 86 col 87 for col, dtype in pipe_dtypes.items() 88 if (not enforce or not are_dtypes_equal(str(dtype), 'datetime')) 89 ], 90 strip_timezone=(self.tzinfo is None), 91 
chunksize=chunksize, 92 debug=debug, 93 ) 94 except Exception as e: 95 warn(f"Unable to cast incoming data as a DataFrame...:\n{e}\n\n{traceback.format_exc()}") 96 return None 97 98 if not pipe_dtypes: 99 if debug: 100 dprint( 101 f"Could not find dtypes for {self}.\n" 102 + "Skipping dtype enforcement..." 103 ) 104 return df 105 106 return _enforce_dtypes( 107 df, 108 pipe_dtypes, 109 explicit_dtypes=explicit_dtypes, 110 safe_copy=safe_copy, 111 strip_timezone=(self.tzinfo is None), 112 coerce_numeric=self.mixed_numerics, 113 coerce_timezone=enforce, 114 debug=debug, 115 )
Cast the input dataframe to the pipe's registered data types. If the pipe does not exist and dtypes are not set, return the dataframe.
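A toy illustration of dtype enforcement (the real method delegates to pandas via `meerschaum.utils.dataframe.enforce_dtypes`; the casting map below is a deliberate simplification):

```python
# Hypothetical mapping from registered dtype strings to Python casters.
CASTERS = {'int': int, 'float': float, 'str': str}

def sketch_enforce_dtypes(rows, dtypes):
    """Cast each row's values to the registered data types,
    leaving unregistered columns untouched."""
    return [
        {
            col: CASTERS.get(dtypes.get(col), lambda v: v)(val)
            for col, val in row.items()
        }
        for row in rows
    ]

rows = [{'id': '1', 'vl': '3.5', 'note': 'ok'}]
print(sketch_enforce_dtypes(rows, {'id': 'int', 'vl': 'float'}))
# -> [{'id': 1, 'vl': 3.5, 'note': 'ok'}]
```

As in the real method, columns with no registered dtype pass through unchanged.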
118def infer_dtypes( 119 self, 120 persist: bool = False, 121 refresh: bool = False, 122 debug: bool = False, 123) -> Dict[str, Any]: 124 """ 125 If `dtypes` is not set in `meerschaum.Pipe.parameters`, 126 infer the data types from the underlying table if it exists. 127 128 Parameters 129 ---------- 130 persist: bool, default False 131 If `True`, persist the inferred data types to `meerschaum.Pipe.parameters`. 132 NOTE: Use with caution! Generally `dtypes` is meant to be user-configurable only. 133 134 refresh: bool, default False 135 If `True`, retrieve the latest columns-types for the pipe. 136 See `Pipe.get_columns.types()`. 137 138 Returns 139 ------- 140 A dictionary of strings containing the pandas data types for this Pipe. 141 """ 142 if not self.exists(debug=debug): 143 return {} 144 145 from meerschaum.utils.dtypes.sql import get_pd_type_from_db_type 146 from meerschaum.utils.dtypes import to_pandas_dtype 147 148 ### NOTE: get_columns_types() may return either the types as 149 ### PostgreSQL- or Pandas-style. 150 columns_types = self.get_columns_types(refresh=refresh, debug=debug) 151 152 remote_pd_dtypes = { 153 c: ( 154 get_pd_type_from_db_type(t, allow_custom_dtypes=True) 155 if str(t).isupper() 156 else to_pandas_dtype(t) 157 ) 158 for c, t in columns_types.items() 159 } if columns_types else {} 160 if not persist: 161 return remote_pd_dtypes 162 163 parameters = self.get_parameters(refresh=refresh, debug=debug) 164 dtypes = parameters.get('dtypes', {}) 165 dtypes.update({ 166 col: typ 167 for col, typ in remote_pd_dtypes.items() 168 if col not in dtypes 169 }) 170 self.dtypes = dtypes 171 self.edit(interactive=False, debug=debug) 172 return remote_pd_dtypes
If `dtypes` is not set in `meerschaum.Pipe.parameters`, infer the data types from the underlying table if it exists.
Parameters
- persist (bool, default False): If `True`, persist the inferred data types to `meerschaum.Pipe.parameters`. NOTE: Use with caution! Generally `dtypes` is meant to be user-configurable only.
- refresh (bool, default False): If `True`, retrieve the latest columns-types for the pipe. See `Pipe.get_columns_types()`.
Returns
- A dictionary of strings containing the pandas data types for this Pipe.
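The uppercase check in the source distinguishes database-style types (e.g. `BIGINT`) from pandas-style strings. A simplified sketch of that branch (the three-entry mapping here is hypothetical; the real lookup lives in `meerschaum.utils.dtypes.sql.get_pd_type_from_db_type`):

```python
# Hypothetical subset of the DB-type -> pandas-dtype lookup.
DB_TO_PD = {'BIGINT': 'int64', 'DOUBLE PRECISION': 'float64', 'TEXT': 'object'}

def sketch_infer_dtypes(columns_types):
    """Map each column's type to a pandas dtype string.
    Uppercase types are treated as database types; others pass through."""
    return {
        col: (DB_TO_PD.get(typ, 'object') if str(typ).isupper() else typ)
        for col, typ in columns_types.items()
    }

print(sketch_infer_dtypes({'id': 'BIGINT', 'vl': 'float64'}))
# -> {'id': 'int64', 'vl': 'float64'}
```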
15def copy_to( 16 self, 17 instance_keys: str, 18 sync: bool = True, 19 begin: Union[datetime, int, None] = None, 20 end: Union[datetime, int, None] = None, 21 params: Optional[Dict[str, Any]] = None, 22 chunk_interval: Union[timedelta, int, None] = None, 23 debug: bool = False, 24 **kwargs: Any 25) -> SuccessTuple: 26 """ 27 Copy a pipe to another instance. 28 29 Parameters 30 ---------- 31 instance_keys: str 32 The instance to which to copy this pipe. 33 34 sync: bool, default True 35 If `True`, sync the source pipe's documents 36 37 begin: Union[datetime, int, None], default None 38 Beginning datetime value to pass to `Pipe.get_data()`. 39 40 end: Union[datetime, int, None], default None 41 End datetime value to pass to `Pipe.get_data()`. 42 43 params: Optional[Dict[str, Any]], default None 44 Parameters filter to pass to `Pipe.get_data()`. 45 46 chunk_interval: Union[timedelta, int, None], default None 47 The size of chunks to retrieve from `Pipe.get_data()` for syncing. 48 49 kwargs: Any 50 Additional flags to pass to `Pipe.get_data()` and `Pipe.sync()`, e.g. `workers`. 51 52 Returns 53 ------- 54 A SuccessTuple indicating success. 55 """ 56 if str(instance_keys) == self.instance_keys: 57 return False, f"Cannot copy {self} to instance '{instance_keys}'." 58 59 begin, end = self.parse_date_bounds(begin, end) 60 61 new_pipe = mrsm.Pipe( 62 self.connector_keys, 63 self.metric_key, 64 self.location_key, 65 parameters=self.parameters.copy(), 66 instance=instance_keys, 67 ) 68 69 new_pipe_is_registered = new_pipe.get_id() is not None 70 71 metadata_method = new_pipe.edit if new_pipe_is_registered else new_pipe.register 72 metadata_success, metadata_msg = metadata_method(debug=debug) 73 if not metadata_success: 74 return metadata_success, metadata_msg 75 76 if not self.exists(debug=debug): 77 return True, f"{self} does not exist; nothing to sync." 
78 79 original_as_iterator = kwargs.get('as_iterator', None) 80 kwargs['as_iterator'] = True 81 82 chunk_generator = self.get_data( 83 begin=begin, 84 end=end, 85 params=params, 86 chunk_interval=chunk_interval, 87 debug=debug, 88 **kwargs 89 ) 90 91 if original_as_iterator is None: 92 _ = kwargs.pop('as_iterator', None) 93 else: 94 kwargs['as_iterator'] = original_as_iterator 95 96 sync_success, sync_msg = new_pipe.sync( 97 chunk_generator, 98 begin=begin, 99 end=end, 100 params=params, 101 debug=debug, 102 **kwargs 103 ) 104 msg = ( 105 f"Successfully synced {new_pipe}:\n{sync_msg}" 106 if sync_success 107 else f"Failed to sync {new_pipe}:\n{sync_msg}" 108 ) 109 return sync_success, msg
Copy a pipe to another instance.
Parameters
- instance_keys (str): The instance to which to copy this pipe.
- sync (bool, default True): If `True`, sync the source pipe's documents.
- begin (Union[datetime, int, None], default None): Beginning datetime value to pass to `Pipe.get_data()`.
- end (Union[datetime, int, None], default None): End datetime value to pass to `Pipe.get_data()`.
- params (Optional[Dict[str, Any]], default None): Parameters filter to pass to `Pipe.get_data()`.
- chunk_interval (Union[timedelta, int, None], default None): The size of chunks to retrieve from `Pipe.get_data()` for syncing.
- kwargs (Any): Additional flags to pass to `Pipe.get_data()` and `Pipe.sync()`, e.g. `workers`.
Returns
- A SuccessTuple indicating success.
30class Plugin: 31 """Handle packaging of Meerschaum plugins.""" 32 33 def __init__( 34 self, 35 name: str, 36 version: Optional[str] = None, 37 user_id: Optional[int] = None, 38 required: Optional[List[str]] = None, 39 attributes: Optional[Dict[str, Any]] = None, 40 archive_path: Optional[pathlib.Path] = None, 41 venv_path: Optional[pathlib.Path] = None, 42 repo_connector: Optional['mrsm.connectors.api.APIConnector'] = None, 43 repo: Union['mrsm.connectors.api.APIConnector', str, None] = None, 44 ): 45 from meerschaum._internal.static import STATIC_CONFIG 46 from meerschaum.config.paths import PLUGINS_ARCHIVES_RESOURCES_PATH, VIRTENV_RESOURCES_PATH 47 sep = STATIC_CONFIG['plugins']['repo_separator'] 48 _repo = None 49 if sep in name: 50 try: 51 name, _repo = name.split(sep) 52 except Exception as e: 53 error(f"Invalid plugin name: '{name}'") 54 self._repo_in_name = _repo 55 56 if attributes is None: 57 attributes = {} 58 self.name = name 59 self.attributes = attributes 60 self.user_id = user_id 61 self._version = version 62 if required: 63 self._required = required 64 self.archive_path = ( 65 archive_path if archive_path is not None 66 else PLUGINS_ARCHIVES_RESOURCES_PATH / f"{self.name}.tar.gz" 67 ) 68 self.venv_path = ( 69 venv_path if venv_path is not None 70 else VIRTENV_RESOURCES_PATH / self.name 71 ) 72 self._repo_connector = repo_connector 73 self._repo_keys = repo 74 75 76 @property 77 def repo_connector(self): 78 """ 79 Return the repository connector for this plugin. 80 NOTE: This imports the `connectors` module, which imports certain plugin modules. 81 """ 82 if self._repo_connector is None: 83 from meerschaum.connectors.parse import parse_repo_keys 84 85 repo_keys = self._repo_keys or self._repo_in_name 86 if self._repo_in_name and self._repo_keys and self._repo_keys != self._repo_in_name: 87 error( 88 f"Received inconsistent repos: '{self._repo_in_name}' and '{self._repo_keys}'." 
89 ) 90 repo_connector = parse_repo_keys(repo_keys) 91 self._repo_connector = repo_connector 92 return self._repo_connector 93 94 95 @property 96 def version(self): 97 """ 98 Return the plugin's module version (`__version__`) if it's defined. 99 """ 100 if self._version is None: 101 try: 102 self._version = self.module.__version__ 103 except Exception as e: 104 self._version = None 105 return self._version 106 107 108 @property 109 def module(self): 110 """ 111 Return the Python module of the underlying plugin. 112 """ 113 if '_module' not in self.__dict__ or self.__dict__.get('_module', None) is None: 114 if self.__file__ is None: 115 return None 116 117 from meerschaum.plugins import import_plugins 118 self._module = import_plugins(str(self), warn=False) 119 120 return self._module 121 122 123 @property 124 def __file__(self) -> Union[str, None]: 125 """ 126 Return the file path (str) of the plugin if it exists, otherwise `None`. 127 """ 128 if self.__dict__.get('_module', None) is not None: 129 return self.module.__file__ 130 131 from meerschaum.config.paths import PLUGINS_RESOURCES_PATH 132 133 potential_dir = PLUGINS_RESOURCES_PATH / self.name 134 if ( 135 potential_dir.exists() 136 and potential_dir.is_dir() 137 and (potential_dir / '__init__.py').exists() 138 ): 139 return str((potential_dir / '__init__.py').as_posix()) 140 141 potential_file = PLUGINS_RESOURCES_PATH / (self.name + '.py') 142 if potential_file.exists() and not potential_file.is_dir(): 143 return str(potential_file.as_posix()) 144 145 return None 146 147 148 @property 149 def requirements_file_path(self) -> Union[pathlib.Path, None]: 150 """ 151 If a file named `requirements.txt` exists, return its path. 
152 """ 153 if self.__file__ is None: 154 return None 155 path = pathlib.Path(self.__file__).parent / 'requirements.txt' 156 if not path.exists(): 157 return None 158 return path 159 160 161 def is_installed(self, **kw) -> bool: 162 """ 163 Check whether a plugin is correctly installed. 164 165 Returns 166 ------- 167 A `bool` indicating whether a plugin exists and is successfully imported. 168 """ 169 return self.__file__ is not None 170 171 172 def make_tar(self, debug: bool = False) -> pathlib.Path: 173 """ 174 Compress the plugin's source files into a `.tar.gz` archive and return the archive's path. 175 176 Parameters 177 ---------- 178 debug: bool, default False 179 Verbosity toggle. 180 181 Returns 182 ------- 183 A `pathlib.Path` to the archive file's path. 184 185 """ 186 import tarfile, pathlib, subprocess, fnmatch 187 from meerschaum.utils.debug import dprint 188 from meerschaum.utils.packages import attempt_import 189 pathspec = attempt_import('pathspec', debug=debug) 190 191 if not self.__file__: 192 from meerschaum.utils.warnings import error 193 error(f"Could not find file for plugin '{self}'.") 194 if '__init__.py' in self.__file__ or os.path.isdir(self.__file__): 195 path = self.__file__.replace('__init__.py', '') 196 is_dir = True 197 else: 198 path = self.__file__ 199 is_dir = False 200 201 old_cwd = os.getcwd() 202 real_parent_path = pathlib.Path(os.path.realpath(path)).parent 203 os.chdir(real_parent_path) 204 205 default_patterns_to_ignore = [ 206 '.pyc', 207 '__pycache__/', 208 'eggs/', 209 '__pypackages__/', 210 '.git', 211 ] 212 213 def parse_gitignore() -> 'Set[str]': 214 gitignore_path = pathlib.Path(path) / '.gitignore' 215 if not gitignore_path.exists(): 216 return set(default_patterns_to_ignore) 217 with open(gitignore_path, 'r', encoding='utf-8') as f: 218 gitignore_text = f.read() 219 return set(pathspec.PathSpec.from_lines( 220 pathspec.patterns.GitWildMatchPattern, 221 default_patterns_to_ignore + gitignore_text.splitlines() 222 
).match_tree(path)) 223 224 patterns_to_ignore = parse_gitignore() if is_dir else set() 225 226 if debug: 227 dprint(f"Patterns to ignore:\n{patterns_to_ignore}") 228 229 with tarfile.open(self.archive_path, 'w:gz') as tarf: 230 if not is_dir: 231 tarf.add(f"{self.name}.py") 232 else: 233 for root, dirs, files in os.walk(self.name): 234 for f in files: 235 good_file = True 236 fp = os.path.join(root, f) 237 for pattern in patterns_to_ignore: 238 if pattern in str(fp) or f.startswith('.'): 239 good_file = False 240 break 241 if good_file: 242 if debug: 243 dprint(f"Adding '{fp}'...") 244 tarf.add(fp) 245 246 ### clean up and change back to old directory 247 os.chdir(old_cwd) 248 249 ### change to 775 to avoid permissions issues with the API in a Docker container 250 self.archive_path.chmod(0o775) 251 252 if debug: 253 dprint(f"Created archive '{self.archive_path}'.") 254 return self.archive_path 255 256 257 def install( 258 self, 259 skip_deps: bool = False, 260 force: bool = False, 261 debug: bool = False, 262 ) -> SuccessTuple: 263 """ 264 Extract a plugin's tar archive to the plugins directory. 265 266 This function checks if the plugin is already installed and if the version is equal or 267 greater than the existing installation. 268 269 Parameters 270 ---------- 271 skip_deps: bool, default False 272 If `True`, do not install dependencies. 273 274 force: bool, default False 275 If `True`, continue with installation, even if required packages fail to install. 276 277 debug: bool, default False 278 Verbosity toggle. 279 280 Returns 281 ------- 282 A `SuccessTuple` of success (bool) and a message (str). 283 284 """ 285 if self.full_name in _ongoing_installations: 286 return True, f"Already installing plugin '{self}'." 
287 _ongoing_installations.add(self.full_name) 288 from meerschaum.utils.warnings import warn, error 289 if debug: 290 from meerschaum.utils.debug import dprint 291 import tarfile 292 import re 293 import ast 294 from meerschaum.plugins import sync_plugins_symlinks 295 from meerschaum.utils.packages import attempt_import, determine_version, reload_meerschaum 296 from meerschaum.utils.venv import init_venv 297 from meerschaum.utils.misc import safely_extract_tar 298 from meerschaum.config.paths import PLUGINS_TEMP_RESOURCES_PATH, PLUGINS_DIR_PATHS 299 old_cwd = os.getcwd() 300 old_version = '' 301 new_version = '' 302 temp_dir = PLUGINS_TEMP_RESOURCES_PATH / self.name 303 temp_dir.mkdir(exist_ok=True) 304 305 if not self.archive_path.exists(): 306 return False, f"Missing archive file for plugin '{self}'." 307 if self.version is not None: 308 old_version = self.version 309 if debug: 310 dprint(f"Found existing version '{old_version}' for plugin '{self}'.") 311 312 if debug: 313 dprint(f"Extracting '{self.archive_path}' to '{temp_dir}'...") 314 315 try: 316 with tarfile.open(self.archive_path, 'r:gz') as tarf: 317 safely_extract_tar(tarf, temp_dir) 318 except Exception as e: 319 warn(e) 320 return False, f"Failed to extract plugin '{self.name}'." 
321 322 ### search for version information 323 files = os.listdir(temp_dir) 324 325 if str(files[0]) == self.name: 326 is_dir = True 327 elif str(files[0]) == self.name + '.py': 328 is_dir = False 329 else: 330 error(f"Unknown format encountered for plugin '{self}'.") 331 332 fpath = temp_dir / files[0] 333 if is_dir: 334 fpath = fpath / '__init__.py' 335 336 init_venv(self.name, debug=debug) 337 with open(fpath, 'r', encoding='utf-8') as f: 338 init_lines = f.readlines() 339 new_version = None 340 for line in init_lines: 341 if '__version__' not in line: 342 continue 343 version_match = re.search(r'__version__(\s?)=', line.lstrip().rstrip()) 344 if not version_match: 345 continue 346 new_version = ast.literal_eval(line.split('=')[1].lstrip().rstrip()) 347 break 348 if not new_version: 349 warn( 350 f"No `__version__` defined for plugin '{self}'. " 351 + "Assuming new version...", 352 stack = False, 353 ) 354 355 packaging_version = attempt_import('packaging.version') 356 try: 357 is_new_version = (not new_version and not old_version) or ( 358 packaging_version.parse(old_version) < packaging_version.parse(new_version) 359 ) 360 is_same_version = new_version and old_version and ( 361 packaging_version.parse(old_version) == packaging_version.parse(new_version) 362 ) 363 except Exception: 364 is_new_version, is_same_version = True, False 365 366 ### Determine where to permanently store the new plugin. 367 plugin_installation_dir_path = PLUGINS_DIR_PATHS[0] 368 for path in PLUGINS_DIR_PATHS: 369 if not path.exists(): 370 warn(f"Plugins path does not exist: {path}", stack=False) 371 continue 372 373 files_in_plugins_dir = os.listdir(path) 374 if ( 375 self.name in files_in_plugins_dir 376 or 377 (self.name + '.py') in files_in_plugins_dir 378 ): 379 plugin_installation_dir_path = path 380 break 381 382 success_msg = ( 383 f"Successfully installed plugin '{self}'" 384 + ("\n (skipped dependencies)" if skip_deps else "") 385 + "." 
386 ) 387 success, abort = None, None 388 389 if is_same_version and not force: 390 success, msg = True, ( 391 f"Plugin '{self}' is up-to-date (version {old_version}).\n" + 392 " Install again with `-f` or `--force` to reinstall." 393 ) 394 abort = True 395 elif is_new_version or force: 396 for src_dir, dirs, files in os.walk(temp_dir): 397 if success is not None: 398 break 399 dst_dir = str(src_dir).replace(str(temp_dir), str(plugin_installation_dir_path)) 400 if not os.path.exists(dst_dir): 401 os.mkdir(dst_dir) 402 for f in files: 403 src_file = os.path.join(src_dir, f) 404 dst_file = os.path.join(dst_dir, f) 405 if os.path.exists(dst_file): 406 os.remove(dst_file) 407 408 if debug: 409 dprint(f"Moving '{src_file}' to '{dst_dir}'...") 410 try: 411 shutil.move(src_file, dst_dir) 412 except Exception: 413 success, msg = False, ( 414 f"Failed to install plugin '{self}': " + 415 f"Could not move file '{src_file}' to '{dst_dir}'" 416 ) 417 print(msg) 418 break 419 if success is None: 420 success, msg = True, success_msg 421 else: 422 success, msg = False, ( 423 f"Your installed version of plugin '{self}' ({old_version}) is higher than " 424 + f"attempted version {new_version}." 425 ) 426 427 shutil.rmtree(temp_dir) 428 os.chdir(old_cwd) 429 430 ### Reload the plugin's module. 431 sync_plugins_symlinks(debug=debug) 432 if '_module' in self.__dict__: 433 del self.__dict__['_module'] 434 init_venv(venv=self.name, force=True, debug=debug) 435 reload_meerschaum(debug=debug) 436 437 ### if we've already failed, return here 438 if not success or abort: 439 _ongoing_installations.remove(self.full_name) 440 return success, msg 441 442 ### attempt to install dependencies 443 dependencies_installed = skip_deps or self.install_dependencies(force=force, debug=debug) 444 if not dependencies_installed: 445 _ongoing_installations.remove(self.full_name) 446 return False, f"Failed to install dependencies for plugin '{self}'." 
447 448 ### handling success tuple, bool, or other (typically None) 449 setup_tuple = self.setup(debug=debug) 450 if isinstance(setup_tuple, tuple): 451 if not setup_tuple[0]: 452 success, msg = setup_tuple 453 elif isinstance(setup_tuple, bool): 454 if not setup_tuple: 455 success, msg = False, ( 456 f"Failed to run post-install setup for plugin '{self}'." + '\n' + 457 f"Check `setup()` in '{self.__file__}' for more information " + 458 "(no error message provided)." 459 ) 460 else: 461 success, msg = True, success_msg 462 elif setup_tuple is None: 463 success = True 464 msg = ( 465 f"Post-install for plugin '{self}' returned None. " + 466 "Assuming plugin successfully installed." 467 ) 468 warn(msg) 469 else: 470 success = False 471 msg = ( 472 f"Post-install for plugin '{self}' returned unexpected value " + 473 f"of type '{type(setup_tuple)}': {setup_tuple}" 474 ) 475 476 _ongoing_installations.remove(self.full_name) 477 _ = self.module 478 return success, msg 479 480 481 def remove_archive( 482 self, 483 debug: bool = False 484 ) -> SuccessTuple: 485 """Remove a plugin's archive file.""" 486 if not self.archive_path.exists(): 487 return True, f"Archive file for plugin '{self}' does not exist." 488 try: 489 self.archive_path.unlink() 490 except Exception as e: 491 return False, f"Failed to remove archive for plugin '{self}':\n{e}" 492 return True, "Success" 493 494 495 def remove_venv( 496 self, 497 debug: bool = False 498 ) -> SuccessTuple: 499 """Remove a plugin's virtual environment.""" 500 if not self.venv_path.exists(): 501 return True, f"Virtual environment for plugin '{self}' does not exist." 502 try: 503 shutil.rmtree(self.venv_path) 504 except Exception as e: 505 return False, f"Failed to remove virtual environment for plugin '{self}':\n{e}" 506 return True, "Success" 507 508 509 def uninstall(self, debug: bool = False) -> SuccessTuple: 510 """ 511 Remove a plugin, its virtual environment, and archive file. 
512 """ 513 from meerschaum.utils.packages import reload_meerschaum 514 from meerschaum.plugins import sync_plugins_symlinks 515 from meerschaum.utils.warnings import warn, info 516 warnings_thrown_count: int = 0 517 max_warnings: int = 3 518 519 if not self.is_installed(): 520 info( 521 f"Plugin '{self.name}' doesn't seem to be installed.\n " 522 + "Checking for artifacts...", 523 stack = False, 524 ) 525 else: 526 real_path = pathlib.Path(os.path.realpath(self.__file__)) 527 try: 528 if real_path.name == '__init__.py': 529 shutil.rmtree(real_path.parent) 530 else: 531 real_path.unlink() 532 except Exception as e: 533 warn(f"Could not remove source files for plugin '{self.name}':\n{e}", stack=False) 534 warnings_thrown_count += 1 535 else: 536 info(f"Removed source files for plugin '{self.name}'.") 537 538 if self.venv_path.exists(): 539 success, msg = self.remove_venv(debug=debug) 540 if not success: 541 warn(msg, stack=False) 542 warnings_thrown_count += 1 543 else: 544 info(f"Removed virtual environment from plugin '{self.name}'.") 545 546 success = warnings_thrown_count < max_warnings 547 sync_plugins_symlinks(debug=debug) 548 self.deactivate_venv(force=True, debug=debug) 549 reload_meerschaum(debug=debug) 550 return success, ( 551 f"Successfully uninstalled plugin '{self}'." if success 552 else f"Failed to uninstall plugin '{self}'." 553 ) 554 555 556 def setup(self, *args: str, debug: bool = False, **kw: Any) -> Union[SuccessTuple, bool]: 557 """ 558 If exists, run the plugin's `setup()` function. 559 560 Parameters 561 ---------- 562 *args: str 563 The positional arguments passed to the `setup()` function. 564 565 debug: bool, default False 566 Verbosity toggle. 567 568 **kw: Any 569 The keyword arguments passed to the `setup()` function. 570 571 Returns 572 ------- 573 A `SuccessTuple` or `bool` indicating success. 
574 575 """ 576 from meerschaum.utils.debug import dprint 577 import inspect 578 _setup = None 579 for name, fp in inspect.getmembers(self.module): 580 if name == 'setup' and inspect.isfunction(fp): 581 _setup = fp 582 break 583 584 ### assume success if no setup() is found (not necessary) 585 if _setup is None: 586 return True 587 588 sig = inspect.signature(_setup) 589 has_debug, has_kw = ('debug' in sig.parameters), False 590 for k, v in sig.parameters.items(): 591 if '**' in str(v): 592 has_kw = True 593 break 594 595 _kw = {} 596 if has_kw: 597 _kw.update(kw) 598 if has_debug: 599 _kw['debug'] = debug 600 601 if debug: 602 dprint(f"Running setup for plugin '{self}'...") 603 try: 604 self.activate_venv(debug=debug) 605 return_tuple = _setup(*args, **_kw) 606 self.deactivate_venv(debug=debug) 607 except Exception as e: 608 return False, str(e) 609 610 if isinstance(return_tuple, tuple): 611 return return_tuple 612 if isinstance(return_tuple, bool): 613 return return_tuple, f"Setup for Plugin '{self.name}' did not return a message." 614 if return_tuple is None: 615 return False, f"Setup for Plugin '{self.name}' returned None." 616 return False, f"Unknown return value from setup for Plugin '{self.name}': {return_tuple}" 617 618 619 def get_dependencies( 620 self, 621 debug: bool = False, 622 ) -> List[str]: 623 """ 624 If the Plugin has specified dependencies in a list called `required`, return the list. 625 626 **NOTE:** Dependencies which start with `'plugin:'` are Meerschaum plugins, not pip packages. 627 Meerschaum plugins may also specify connector keys for a repo after `'@'`. 628 629 Parameters 630 ---------- 631 debug: bool, default False 632 Verbosity toggle. 633 634 Returns 635 ------- 636 A list of required packages and plugins (str). 637 638 """ 639 if '_required' in self.__dict__: 640 return self._required 641 642 ### If the plugin has not yet been imported, 643 ### infer the dependencies from the source text. 
### This is not super robust, and it doesn't feel right 645 ### having multiple versions of the logic. 646 ### This is necessary when determining the activation order 647 ### without having imported the module. 648 ### For consistency's sake, the module-less method does not cache the requirements. 649 if self.__dict__.get('_module', None) is None: 650 file_path = self.__file__ 651 if file_path is None: 652 return [] 653 with open(file_path, 'r', encoding='utf-8') as f: 654 text = f.read() 655 656 if 'required' not in text: 657 return [] 658 659 ### This has some limitations: 660 ### It relies on `required` being manually declared. 661 ### We lose the ability to dynamically alter the `required` list, 662 ### which is why we've kept the module-reliant method below. 663 import ast, re 664 ### NOTE: This technically would break 665 ### if `required` was the very first line of the file. 666 req_start_match = re.search(r'\nrequired(:\s*)?.*=', text) 667 if not req_start_match: 668 return [] 669 req_start = req_start_match.start() 670 equals_sign = req_start + text[req_start:].find('=') 671 672 ### Dependencies may have brackets within the strings, so push back the index. 
673 first_opening_brace = equals_sign + 1 + text[equals_sign:].find('[') 674 if first_opening_brace == -1: 675 return [] 676 677 next_closing_brace = equals_sign + 1 + text[equals_sign:].find(']') 678 if next_closing_brace == -1: 679 return [] 680 681 start_ix = first_opening_brace + 1 682 end_ix = next_closing_brace 683 684 num_braces = 0 685 while True: 686 if '[' not in text[start_ix:end_ix]: 687 break 688 num_braces += 1 689 start_ix = end_ix 690 end_ix += text[end_ix + 1:].find(']') + 1 691 692 req_end = end_ix + 1 693 req_text = ( 694 text[(first_opening_brace-1):req_end] 695 .lstrip() 696 .replace('=', '', 1) 697 .lstrip() 698 .rstrip() 699 ) 700 try: 701 required = ast.literal_eval(req_text) 702 except Exception as e: 703 warn( 704 f"Unable to determine requirements for plugin '{self.name}' " 705 + "without importing the module.\n" 706 + " This may be due to dynamically setting the global `required` list.\n" 707 + f" {e}" 708 ) 709 return [] 710 return required 711 712 import inspect 713 self.activate_venv(dependencies=False, debug=debug) 714 required = [] 715 for name, val in inspect.getmembers(self.module): 716 if name == 'required': 717 required = val 718 break 719 self._required = required 720 self.deactivate_venv(dependencies=False, debug=debug) 721 return required 722 723 724 def get_required_plugins(self, debug: bool=False) -> List[mrsm.plugins.Plugin]: 725 """ 726 Return a list of required Plugin objects. 
727 """ 728 from meerschaum.utils.warnings import warn 729 from meerschaum.config import get_config 730 from meerschaum._internal.static import STATIC_CONFIG 731 from meerschaum.connectors.parse import is_valid_connector_keys 732 plugins = [] 733 _deps = self.get_dependencies(debug=debug) 734 sep = STATIC_CONFIG['plugins']['repo_separator'] 735 plugin_names = [ 736 _d[len('plugin:'):] for _d in _deps 737 if _d.startswith('plugin:') and len(_d) > len('plugin:') 738 ] 739 default_repo_keys = get_config('meerschaum', 'repository') 740 skipped_repo_keys = set() 741 742 for _plugin_name in plugin_names: 743 if sep in _plugin_name: 744 try: 745 _plugin_name, _repo_keys = _plugin_name.split(sep) 746 except Exception: 747 _repo_keys = default_repo_keys 748 warn( 749 f"Invalid repo keys for required plugin '{_plugin_name}'.\n " 750 + f"Will try to use '{_repo_keys}' instead.", 751 stack = False, 752 ) 753 else: 754 _repo_keys = default_repo_keys 755 756 if _repo_keys in skipped_repo_keys: 757 continue 758 759 if not is_valid_connector_keys(_repo_keys): 760 warn( 761 f"Invalid connector '{_repo_keys}'.\n" 762 f" Skipping required plugins from repository '{_repo_keys}'", 763 stack=False, 764 ) 765 continue 766 767 plugins.append(Plugin(_plugin_name, repo=_repo_keys)) 768 769 return plugins 770 771 772 def get_required_packages(self, debug: bool=False) -> List[str]: 773 """ 774 Return the required package names (excluding plugins). 775 """ 776 _deps = self.get_dependencies(debug=debug) 777 return [_d for _d in _deps if not _d.startswith('plugin:')] 778 779 780 def activate_venv( 781 self, 782 dependencies: bool = True, 783 init_if_not_exists: bool = True, 784 debug: bool = False, 785 **kw 786 ) -> bool: 787 """ 788 Activate the virtual environments for the plugin and its dependencies. 789 790 Parameters 791 ---------- 792 dependencies: bool, default True 793 If `True`, activate the virtual environments for required plugins. 
794 795 Returns 796 ------- 797 A bool indicating success. 798 """ 799 from meerschaum.utils.venv import venv_target_path 800 from meerschaum.utils.packages import activate_venv 801 from meerschaum.utils.misc import make_symlink, is_symlink 802 from meerschaum.config._paths import PACKAGE_ROOT_PATH 803 804 if dependencies: 805 for plugin in self.get_required_plugins(debug=debug): 806 plugin.activate_venv(debug=debug, init_if_not_exists=init_if_not_exists, **kw) 807 808 vtp = venv_target_path(self.name, debug=debug, allow_nonexistent=True) 809 venv_meerschaum_path = vtp / 'meerschaum' 810 811 try: 812 success, msg = True, "Success" 813 if is_symlink(venv_meerschaum_path): 814 if pathlib.Path(os.path.realpath(venv_meerschaum_path)) != PACKAGE_ROOT_PATH: 815 venv_meerschaum_path.unlink() 816 success, msg = make_symlink(venv_meerschaum_path, PACKAGE_ROOT_PATH) 817 except Exception as e: 818 success, msg = False, str(e) 819 if not success: 820 warn(f"Unable to create symlink {venv_meerschaum_path} to {PACKAGE_ROOT_PATH}:\n{msg}") 821 822 return activate_venv(self.name, init_if_not_exists=init_if_not_exists, debug=debug, **kw) 823 824 825 def deactivate_venv(self, dependencies: bool=True, debug: bool = False, **kw) -> bool: 826 """ 827 Deactivate the virtual environments for the plugin and its dependencies. 828 829 Parameters 830 ---------- 831 dependencies: bool, default True 832 If `True`, deactivate the virtual environments for required plugins. 833 834 Returns 835 ------- 836 A bool indicating success. 837 """ 838 from meerschaum.utils.packages import deactivate_venv 839 success = deactivate_venv(self.name, debug=debug, **kw) 840 if dependencies: 841 for plugin in self.get_required_plugins(debug=debug): 842 plugin.deactivate_venv(debug=debug, **kw) 843 return success 844 845 846 def install_dependencies( 847 self, 848 force: bool = False, 849 debug: bool = False, 850 ) -> bool: 851 """ 852 If specified, install dependencies. 
853 854 **NOTE:** Dependencies that start with `'plugin:'` will be installed as 855 Meerschaum plugins from the same repository as this Plugin. 856 To install from a different repository, add the repo keys after `'@'` 857 (e.g. `'plugin:foo@api:bar'`). 858 859 Parameters 860 ---------- 861 force: bool, default False 862 If `True`, continue with the installation, even if some 863 required packages fail to install. 864 865 debug: bool, default False 866 Verbosity toggle. 867 868 Returns 869 ------- 870 A bool indicating success. 871 """ 872 from meerschaum.utils.packages import pip_install, venv_contains_package 873 from meerschaum.utils.warnings import warn, info 874 _deps = self.get_dependencies(debug=debug) 875 if not _deps and self.requirements_file_path is None: 876 return True 877 878 plugins = self.get_required_plugins(debug=debug) 879 for _plugin in plugins: 880 if _plugin.name == self.name: 881 warn(f"Plugin '{self.name}' cannot depend on itself! Skipping...", stack=False) 882 continue 883 _success, _msg = _plugin.repo_connector.install_plugin( 884 _plugin.name, debug=debug, force=force 885 ) 886 if not _success: 887 warn( 888 f"Failed to install required plugin '{_plugin}' from '{_plugin.repo_connector}'" 889 + f" for plugin '{self.name}':\n" + _msg, 890 stack = False, 891 ) 892 if not force: 893 warn( 894 "Try installing with the `--force` flag to continue anyway.", 895 stack = False, 896 ) 897 return False 898 info( 899 "Continuing with installation despite the failure " 900 + "(careful, things might be broken!)...", 901 icon = False 902 ) 903 904 905 ### First step: parse `requirements.txt` if it exists. 
906 if self.requirements_file_path is not None: 907 if not pip_install( 908 requirements_file_path=self.requirements_file_path, 909 venv=self.name, debug=debug 910 ): 911 warn( 912 f"Failed to resolve 'requirements.txt' for plugin '{self.name}'.", 913 stack = False, 914 ) 915 if not force: 916 warn( 917 "Try installing with `--force` to continue anyway.", 918 stack = False, 919 ) 920 return False 921 info( 922 "Continuing with installation despite the failure " 923 + "(careful, things might be broken!)...", 924 icon = False 925 ) 926 927 928 ### Don't reinstall packages that are already included in required plugins. 929 packages = [] 930 _packages = self.get_required_packages(debug=debug) 931 accounted_for_packages = set() 932 for package_name in _packages: 933 for plugin in plugins: 934 if venv_contains_package(package_name, plugin.name): 935 accounted_for_packages.add(package_name) 936 break 937 packages = [pkg for pkg in _packages if pkg not in accounted_for_packages] 938 939 ### Attempt pip packages installation. 940 if packages: 941 for package in packages: 942 if not pip_install(package, venv=self.name, debug=debug): 943 warn( 944 f"Failed to install required package '{package}'" 945 + f" for plugin '{self.name}'.", 946 stack = False, 947 ) 948 if not force: 949 warn( 950 "Try installing with `--force` to continue anyway.", 951 stack = False, 952 ) 953 return False 954 info( 955 "Continuing with installation despite the failure " 956 + "(careful, things might be broken!)...", 957 icon = False 958 ) 959 return True 960 961 962 @property 963 def full_name(self) -> str: 964 """ 965 Include the repo keys with the plugin's name. 
966 """ 967 from meerschaum._internal.static import STATIC_CONFIG 968 sep = STATIC_CONFIG['plugins']['repo_separator'] 969 return self.name + sep + str(self.repo_connector) 970 971 972 def __str__(self): 973 return self.name 974 975 976 def __repr__(self): 977 return f"Plugin('{self.name}', repo='{self.repo_connector}')" 978 979 980 def __del__(self): 981 pass
Handle packaging of Meerschaum plugins.
33 def __init__( 34 self, 35 name: str, 36 version: Optional[str] = None, 37 user_id: Optional[int] = None, 38 required: Optional[List[str]] = None, 39 attributes: Optional[Dict[str, Any]] = None, 40 archive_path: Optional[pathlib.Path] = None, 41 venv_path: Optional[pathlib.Path] = None, 42 repo_connector: Optional['mrsm.connectors.api.APIConnector'] = None, 43 repo: Union['mrsm.connectors.api.APIConnector', str, None] = None, 44 ): 45 from meerschaum._internal.static import STATIC_CONFIG 46 from meerschaum.config.paths import PLUGINS_ARCHIVES_RESOURCES_PATH, VIRTENV_RESOURCES_PATH 47 sep = STATIC_CONFIG['plugins']['repo_separator'] 48 _repo = None 49 if sep in name: 50 try: 51 name, _repo = name.split(sep) 52 except Exception as e: 53 error(f"Invalid plugin name: '{name}'") 54 self._repo_in_name = _repo 55 56 if attributes is None: 57 attributes = {} 58 self.name = name 59 self.attributes = attributes 60 self.user_id = user_id 61 self._version = version 62 if required: 63 self._required = required 64 self.archive_path = ( 65 archive_path if archive_path is not None 66 else PLUGINS_ARCHIVES_RESOURCES_PATH / f"{self.name}.tar.gz" 67 ) 68 self.venv_path = ( 69 venv_path if venv_path is not None 70 else VIRTENV_RESOURCES_PATH / self.name 71 ) 72 self._repo_connector = repo_connector 73 self._repo_keys = repo
76 @property 77 def repo_connector(self): 78 """ 79 Return the repository connector for this plugin. 80 NOTE: This imports the `connectors` module, which imports certain plugin modules. 81 """ 82 if self._repo_connector is None: 83 from meerschaum.connectors.parse import parse_repo_keys 84 85 repo_keys = self._repo_keys or self._repo_in_name 86 if self._repo_in_name and self._repo_keys and self._repo_keys != self._repo_in_name: 87 error( 88 f"Received inconsistent repos: '{self._repo_in_name}' and '{self._repo_keys}'." 89 ) 90 repo_connector = parse_repo_keys(repo_keys) 91 self._repo_connector = repo_connector 92 return self._repo_connector
Return the repository connector for this plugin.

NOTE: This imports the `connectors` module, which imports certain plugin modules.
95 @property 96 def version(self): 97 """ 98 Return the plugin's module version (`__version__`) if it's defined. 99 """ 100 if self._version is None: 101 try: 102 self._version = self.module.__version__ 103 except Exception as e: 104 self._version = None 105 return self._version
Return the plugin's module version (`__version__`) if it's defined.
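When the module cannot (or should not) be imported, `install()` instead sniffs `__version__` out of the source text with `re` and `ast.literal_eval`. A self-contained sketch of that scan, using hypothetical source lines:

```python
import ast
import re

# Example lines from a hypothetical plugin's __init__.py.
init_lines = [
    "# example plugin\n",
    "__version__ = '1.2.3'\n",
    "required = ['requests']\n",
]

new_version = None
for line in init_lines:
    if '__version__' not in line:
        continue
    if not re.search(r'__version__(\s?)=', line.strip()):
        continue
    # Evaluate the right-hand side as a Python literal.
    new_version = ast.literal_eval(line.split('=')[1].strip())
    break

print(new_version)  # → 1.2.3
```

Because the right-hand side is parsed with `ast.literal_eval`, only literal version strings are recognized; a dynamically computed `__version__` falls through and is treated as a new version during installation.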
108 @property 109 def module(self): 110 """ 111 Return the Python module of the underlying plugin. 112 """ 113 if '_module' not in self.__dict__ or self.__dict__.get('_module', None) is None: 114 if self.__file__ is None: 115 return None 116 117 from meerschaum.plugins import import_plugins 118 self._module = import_plugins(str(self), warn=False) 119 120 return self._module
Return the Python module of the underlying plugin.
148 @property 149 def requirements_file_path(self) -> Union[pathlib.Path, None]: 150 """ 151 If a file named `requirements.txt` exists, return its path. 152 """ 153 if self.__file__ is None: 154 return None 155 path = pathlib.Path(self.__file__).parent / 'requirements.txt' 156 if not path.exists(): 157 return None 158 return path
If a file named requirements.txt exists, return its path.
def is_installed(self, **kw) -> bool:
    """
    Check whether a plugin is correctly installed.

    Returns
    -------
    A `bool` indicating whether a plugin exists and is successfully imported.
    """
    return self.__file__ is not None
Check whether a plugin is correctly installed.

Returns
- A bool indicating whether a plugin exists and is successfully imported.
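As a rough illustration of this check: a plugin counts as installed when its module file can be located. Here is a standalone sketch using only the standard library (the function name plugin_file_exists is hypothetical, not part of the meerschaum API):

```python
import importlib.util

def plugin_file_exists(module_name: str) -> bool:
    """Mimic is_installed(): the module counts as installed
    when an importable file can be located for it."""
    spec = importlib.util.find_spec(module_name)
    return spec is not None and spec.origin is not None

# Any importable module resolves to a file on disk:
print(plugin_file_exists('json'))
```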
def make_tar(self, debug: bool = False) -> pathlib.Path:
    """
    Compress the plugin's source files into a `.tar.gz` archive and return the archive's path.

    Parameters
    ----------
    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `pathlib.Path` to the archive file.
    """
    import tarfile, pathlib, subprocess, fnmatch
    from meerschaum.utils.debug import dprint
    from meerschaum.utils.packages import attempt_import
    pathspec = attempt_import('pathspec', debug=debug)

    if not self.__file__:
        from meerschaum.utils.warnings import error
        error(f"Could not find file for plugin '{self}'.")
    if '__init__.py' in self.__file__ or os.path.isdir(self.__file__):
        path = self.__file__.replace('__init__.py', '')
        is_dir = True
    else:
        path = self.__file__
        is_dir = False

    old_cwd = os.getcwd()
    real_parent_path = pathlib.Path(os.path.realpath(path)).parent
    os.chdir(real_parent_path)

    default_patterns_to_ignore = [
        '.pyc',
        '__pycache__/',
        'eggs/',
        '__pypackages__/',
        '.git',
    ]

    def parse_gitignore() -> 'Set[str]':
        gitignore_path = pathlib.Path(path) / '.gitignore'
        if not gitignore_path.exists():
            return set(default_patterns_to_ignore)
        with open(gitignore_path, 'r', encoding='utf-8') as f:
            gitignore_text = f.read()
        return set(pathspec.PathSpec.from_lines(
            pathspec.patterns.GitWildMatchPattern,
            default_patterns_to_ignore + gitignore_text.splitlines()
        ).match_tree(path))

    patterns_to_ignore = parse_gitignore() if is_dir else set()

    if debug:
        dprint(f"Patterns to ignore:\n{patterns_to_ignore}")

    with tarfile.open(self.archive_path, 'w:gz') as tarf:
        if not is_dir:
            tarf.add(f"{self.name}.py")
        else:
            for root, dirs, files in os.walk(self.name):
                for f in files:
                    good_file = True
                    fp = os.path.join(root, f)
                    for pattern in patterns_to_ignore:
                        if pattern in str(fp) or f.startswith('.'):
                            good_file = False
                            break
                    if good_file:
                        if debug:
                            dprint(f"Adding '{fp}'...")
                        tarf.add(fp)

    ### clean up and change back to old directory
    os.chdir(old_cwd)

    ### change to 775 to avoid permissions issues with the API in a Docker container
    self.archive_path.chmod(0o775)

    if debug:
        dprint(f"Created archive '{self.archive_path}'.")
    return self.archive_path
Compress the plugin's source files into a .tar.gz archive and return the archive's path.

Parameters
- debug (bool, default False): Verbosity toggle.

Returns
- A pathlib.Path to the archive file.
def install(
    self,
    skip_deps: bool = False,
    force: bool = False,
    debug: bool = False,
) -> SuccessTuple:
    """
    Extract a plugin's tar archive to the plugins directory.

    This function checks if the plugin is already installed and if the version is equal or
    greater than the existing installation.

    Parameters
    ----------
    skip_deps: bool, default False
        If `True`, do not install dependencies.

    force: bool, default False
        If `True`, continue with installation, even if required packages fail to install.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success (bool) and a message (str).
    """
    if self.full_name in _ongoing_installations:
        return True, f"Already installing plugin '{self}'."
    _ongoing_installations.add(self.full_name)
    from meerschaum.utils.warnings import warn, error
    if debug:
        from meerschaum.utils.debug import dprint
    import tarfile
    import re
    import ast
    from meerschaum.plugins import sync_plugins_symlinks
    from meerschaum.utils.packages import attempt_import, determine_version, reload_meerschaum
    from meerschaum.utils.venv import init_venv
    from meerschaum.utils.misc import safely_extract_tar
    from meerschaum.config.paths import PLUGINS_TEMP_RESOURCES_PATH, PLUGINS_DIR_PATHS
    old_cwd = os.getcwd()
    old_version = ''
    new_version = ''
    temp_dir = PLUGINS_TEMP_RESOURCES_PATH / self.name
    temp_dir.mkdir(exist_ok=True)

    if not self.archive_path.exists():
        return False, f"Missing archive file for plugin '{self}'."
    if self.version is not None:
        old_version = self.version
        if debug:
            dprint(f"Found existing version '{old_version}' for plugin '{self}'.")

    if debug:
        dprint(f"Extracting '{self.archive_path}' to '{temp_dir}'...")

    try:
        with tarfile.open(self.archive_path, 'r:gz') as tarf:
            safely_extract_tar(tarf, temp_dir)
    except Exception as e:
        warn(e)
        return False, f"Failed to extract plugin '{self.name}'."

    ### search for version information
    files = os.listdir(temp_dir)

    if str(files[0]) == self.name:
        is_dir = True
    elif str(files[0]) == self.name + '.py':
        is_dir = False
    else:
        error(f"Unknown format encountered for plugin '{self}'.")

    fpath = temp_dir / files[0]
    if is_dir:
        fpath = fpath / '__init__.py'

    init_venv(self.name, debug=debug)
    with open(fpath, 'r', encoding='utf-8') as f:
        init_lines = f.readlines()
    new_version = None
    for line in init_lines:
        if '__version__' not in line:
            continue
        version_match = re.search(r'__version__(\s?)=', line.lstrip().rstrip())
        if not version_match:
            continue
        new_version = ast.literal_eval(line.split('=')[1].lstrip().rstrip())
        break
    if not new_version:
        warn(
            f"No `__version__` defined for plugin '{self}'. "
            + "Assuming new version...",
            stack=False,
        )

    packaging_version = attempt_import('packaging.version')
    try:
        is_new_version = (not new_version and not old_version) or (
            packaging_version.parse(old_version) < packaging_version.parse(new_version)
        )
        is_same_version = new_version and old_version and (
            packaging_version.parse(old_version) == packaging_version.parse(new_version)
        )
    except Exception:
        is_new_version, is_same_version = True, False

    ### Determine where to permanently store the new plugin.
    plugin_installation_dir_path = PLUGINS_DIR_PATHS[0]
    for path in PLUGINS_DIR_PATHS:
        if not path.exists():
            warn(f"Plugins path does not exist: {path}", stack=False)
            continue

        files_in_plugins_dir = os.listdir(path)
        if (
            self.name in files_in_plugins_dir
            or
            (self.name + '.py') in files_in_plugins_dir
        ):
            plugin_installation_dir_path = path
            break

    success_msg = (
        f"Successfully installed plugin '{self}'"
        + ("\n    (skipped dependencies)" if skip_deps else "")
        + "."
    )
    success, abort = None, None

    if is_same_version and not force:
        success, msg = True, (
            f"Plugin '{self}' is up-to-date (version {old_version}).\n"
            + "    Install again with `-f` or `--force` to reinstall."
        )
        abort = True
    elif is_new_version or force:
        for src_dir, dirs, files in os.walk(temp_dir):
            if success is not None:
                break
            dst_dir = str(src_dir).replace(str(temp_dir), str(plugin_installation_dir_path))
            if not os.path.exists(dst_dir):
                os.mkdir(dst_dir)
            for f in files:
                src_file = os.path.join(src_dir, f)
                dst_file = os.path.join(dst_dir, f)
                if os.path.exists(dst_file):
                    os.remove(dst_file)

                if debug:
                    dprint(f"Moving '{src_file}' to '{dst_dir}'...")
                try:
                    shutil.move(src_file, dst_dir)
                except Exception:
                    success, msg = False, (
                        f"Failed to install plugin '{self}': "
                        + f"Could not move file '{src_file}' to '{dst_dir}'"
                    )
                    print(msg)
                    break
        if success is None:
            success, msg = True, success_msg
    else:
        success, msg = False, (
            f"Your installed version of plugin '{self}' ({old_version}) is higher than "
            + f"attempted version {new_version}."
        )

    shutil.rmtree(temp_dir)
    os.chdir(old_cwd)

    ### Reload the plugin's module.
    sync_plugins_symlinks(debug=debug)
    if '_module' in self.__dict__:
        del self.__dict__['_module']
    init_venv(venv=self.name, force=True, debug=debug)
    reload_meerschaum(debug=debug)

    ### if we've already failed, return here
    if not success or abort:
        _ongoing_installations.remove(self.full_name)
        return success, msg

    ### attempt to install dependencies
    dependencies_installed = skip_deps or self.install_dependencies(force=force, debug=debug)
    if not dependencies_installed:
        _ongoing_installations.remove(self.full_name)
        return False, f"Failed to install dependencies for plugin '{self}'."

    ### handling success tuple, bool, or other (typically None)
    setup_tuple = self.setup(debug=debug)
    if isinstance(setup_tuple, tuple):
        if not setup_tuple[0]:
            success, msg = setup_tuple
    elif isinstance(setup_tuple, bool):
        if not setup_tuple:
            success, msg = False, (
                f"Failed to run post-install setup for plugin '{self}'." + '\n'
                + f"Check `setup()` in '{self.__file__}' for more information "
                + "(no error message provided)."
            )
        else:
            success, msg = True, success_msg
    elif setup_tuple is None:
        success = True
        msg = (
            f"Post-install for plugin '{self}' returned None. "
            + "Assuming plugin successfully installed."
        )
        warn(msg)
    else:
        success = False
        msg = (
            f"Post-install for plugin '{self}' returned unexpected value "
            + f"of type '{type(setup_tuple)}': {setup_tuple}"
        )

    _ongoing_installations.remove(self.full_name)
    _ = self.module
    return success, msg
Extract a plugin's tar archive to the plugins directory.

This function checks if the plugin is already installed and if the version is equal or greater than the existing installation.

Parameters
- skip_deps (bool, default False): If True, do not install dependencies.
- force (bool, default False): If True, continue with installation, even if required packages fail to install.
- debug (bool, default False): Verbosity toggle.

Returns
- A SuccessTuple of success (bool) and a message (str).
def remove_archive(
    self,
    debug: bool = False
) -> SuccessTuple:
    """Remove a plugin's archive file."""
    if not self.archive_path.exists():
        return True, f"Archive file for plugin '{self}' does not exist."
    try:
        self.archive_path.unlink()
    except Exception as e:
        return False, f"Failed to remove archive for plugin '{self}':\n{e}"
    return True, "Success"
Remove a plugin's archive file.
def remove_venv(
    self,
    debug: bool = False
) -> SuccessTuple:
    """Remove a plugin's virtual environment."""
    if not self.venv_path.exists():
        return True, f"Virtual environment for plugin '{self}' does not exist."
    try:
        shutil.rmtree(self.venv_path)
    except Exception as e:
        return False, f"Failed to remove virtual environment for plugin '{self}':\n{e}"
    return True, "Success"
Remove a plugin's virtual environment.
def uninstall(self, debug: bool = False) -> SuccessTuple:
    """
    Remove a plugin, its virtual environment, and archive file.
    """
    from meerschaum.utils.packages import reload_meerschaum
    from meerschaum.plugins import sync_plugins_symlinks
    from meerschaum.utils.warnings import warn, info
    warnings_thrown_count: int = 0
    max_warnings: int = 3

    if not self.is_installed():
        info(
            f"Plugin '{self.name}' doesn't seem to be installed.\n    "
            + "Checking for artifacts...",
            stack=False,
        )
    else:
        real_path = pathlib.Path(os.path.realpath(self.__file__))
        try:
            if real_path.name == '__init__.py':
                shutil.rmtree(real_path.parent)
            else:
                real_path.unlink()
        except Exception as e:
            warn(f"Could not remove source files for plugin '{self.name}':\n{e}", stack=False)
            warnings_thrown_count += 1
        else:
            info(f"Removed source files for plugin '{self.name}'.")

    if self.venv_path.exists():
        success, msg = self.remove_venv(debug=debug)
        if not success:
            warn(msg, stack=False)
            warnings_thrown_count += 1
        else:
            info(f"Removed virtual environment from plugin '{self.name}'.")

    success = warnings_thrown_count < max_warnings
    sync_plugins_symlinks(debug=debug)
    self.deactivate_venv(force=True, debug=debug)
    reload_meerschaum(debug=debug)
    return success, (
        f"Successfully uninstalled plugin '{self}'." if success
        else f"Failed to uninstall plugin '{self}'."
    )
Remove a plugin, its virtual environment, and archive file.
def setup(self, *args: str, debug: bool = False, **kw: Any) -> Union[SuccessTuple, bool]:
    """
    If it exists, run the plugin's `setup()` function.

    Parameters
    ----------
    *args: str
        The positional arguments passed to the `setup()` function.

    debug: bool, default False
        Verbosity toggle.

    **kw: Any
        The keyword arguments passed to the `setup()` function.

    Returns
    -------
    A `SuccessTuple` or `bool` indicating success.
    """
    from meerschaum.utils.debug import dprint
    import inspect
    _setup = None
    for name, fp in inspect.getmembers(self.module):
        if name == 'setup' and inspect.isfunction(fp):
            _setup = fp
            break

    ### assume success if no setup() is found (not necessary)
    if _setup is None:
        return True

    sig = inspect.signature(_setup)
    has_debug, has_kw = ('debug' in sig.parameters), False
    for k, v in sig.parameters.items():
        if '**' in str(v):
            has_kw = True
            break

    _kw = {}
    if has_kw:
        _kw.update(kw)
    if has_debug:
        _kw['debug'] = debug

    if debug:
        dprint(f"Running setup for plugin '{self}'...")
    try:
        self.activate_venv(debug=debug)
        return_tuple = _setup(*args, **_kw)
        self.deactivate_venv(debug=debug)
    except Exception as e:
        return False, str(e)

    if isinstance(return_tuple, tuple):
        return return_tuple
    if isinstance(return_tuple, bool):
        return return_tuple, f"Setup for Plugin '{self.name}' did not return a message."
    if return_tuple is None:
        return False, f"Setup for Plugin '{self.name}' returned None."
    return False, f"Unknown return value from setup for Plugin '{self.name}': {return_tuple}"
If it exists, run the plugin's setup() function.

Parameters
- *args (str): The positional arguments passed to the setup() function.
- debug (bool, default False): Verbosity toggle.
- **kw (Any): The keyword arguments passed to the setup() function.

Returns
- A SuccessTuple or bool indicating success.
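A plugin's setup() function is optional; when it is defined and returns a (bool, str) tuple, install() can report that message directly. A minimal hypothetical example of what a plugin might define (the body here is a placeholder, not prescribed by the API):

```python
def setup(*args, debug: bool = False, **kw):
    """One-time post-install logic; return a SuccessTuple."""
    try:
        # e.g. create a cache directory, seed a table, etc.
        pass
    except Exception as e:
        return False, str(e)
    return True, "Success"
```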
def get_dependencies(
    self,
    debug: bool = False,
) -> List[str]:
    """
    If the Plugin has specified dependencies in a list called `required`, return the list.

    **NOTE:** Dependencies which start with `'plugin:'` are Meerschaum plugins, not pip packages.
    Meerschaum plugins may also specify connector keys for a repo after `'@'`.

    Parameters
    ----------
    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A list of required packages and plugins (str).
    """
    if '_required' in self.__dict__:
        return self._required

    ### If the plugin has not yet been imported,
    ### infer the dependencies from the source text.
    ### This is not super robust, and it doesn't feel right
    ### having multiple versions of the logic.
    ### This is necessary when determining the activation order
    ### without having imported the module.
    ### For consistency's sake, the module-less method does not cache the requirements.
    if self.__dict__.get('_module', None) is None:
        file_path = self.__file__
        if file_path is None:
            return []
        with open(file_path, 'r', encoding='utf-8') as f:
            text = f.read()

        if 'required' not in text:
            return []

        ### This has some limitations:
        ### It relies on `required` being manually declared.
        ### We lose the ability to dynamically alter the `required` list,
        ### which is why we've kept the module-reliant method below.
        import ast, re
        ### NOTE: This technically would break
        ### if `required` was the very first line of the file.
        req_start_match = re.search(r'\nrequired(:\s*)?.*=', text)
        if not req_start_match:
            return []
        req_start = req_start_match.start()
        equals_sign = req_start + text[req_start:].find('=')

        ### Dependencies may have brackets within the strings, so push back the index.
        first_opening_brace = equals_sign + 1 + text[equals_sign:].find('[')
        if first_opening_brace == -1:
            return []

        next_closing_brace = equals_sign + 1 + text[equals_sign:].find(']')
        if next_closing_brace == -1:
            return []

        start_ix = first_opening_brace + 1
        end_ix = next_closing_brace

        num_braces = 0
        while True:
            if '[' not in text[start_ix:end_ix]:
                break
            num_braces += 1
            start_ix = end_ix
            end_ix += text[end_ix + 1:].find(']') + 1

        req_end = end_ix + 1
        req_text = (
            text[(first_opening_brace-1):req_end]
            .lstrip()
            .replace('=', '', 1)
            .lstrip()
            .rstrip()
        )
        try:
            required = ast.literal_eval(req_text)
        except Exception as e:
            warn(
                f"Unable to determine requirements for plugin '{self.name}' "
                + "without importing the module.\n"
                + "    This may be due to dynamically setting the global `required` list.\n"
                + f"    {e}"
            )
            return []
        return required

    import inspect
    self.activate_venv(dependencies=False, debug=debug)
    required = []
    for name, val in inspect.getmembers(self.module):
        if name == 'required':
            required = val
            break
    self._required = required
    self.deactivate_venv(dependencies=False, debug=debug)
    return required
If the Plugin has specified dependencies in a list called required, return the list.

NOTE: Dependencies which start with 'plugin:' are Meerschaum plugins, not pip packages.
Meerschaum plugins may also specify connector keys for a repo after '@'.

Parameters
- debug (bool, default False): Verbosity toggle.

Returns
- A list of required packages and plugins (str).
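To illustrate the convention described above, here is a self-contained sketch (not the library's implementation; the helper name split_dependencies is hypothetical) that splits a plugin's required list into pip packages and plugin specifiers with optional repo keys:

```python
# A `required` list as a plugin might declare it at module scope.
required = ['pandas>=1.0', 'plugin:foo', 'plugin:noaa@api:mrsm']

def split_dependencies(deps):
    """Split a `required` list into pip packages and (plugin, repo) pairs."""
    packages = [d for d in deps if not d.startswith('plugin:')]
    plugins = []
    for d in deps:
        if not d.startswith('plugin:'):
            continue
        spec = d[len('plugin:'):]
        # Repo keys may follow '@' (e.g. 'plugin:noaa@api:mrsm').
        name, _, repo = spec.partition('@')
        plugins.append((name, repo or None))
    return packages, plugins

packages, plugins = split_dependencies(required)
```

Running this on the list above yields `pandas>=1.0` as the only pip package, with `foo` resolved from the default repository and `noaa` from `api:mrsm`.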
def get_required_plugins(self, debug: bool = False) -> List[mrsm.plugins.Plugin]:
    """
    Return a list of required Plugin objects.
    """
    from meerschaum.utils.warnings import warn
    from meerschaum.config import get_config
    from meerschaum._internal.static import STATIC_CONFIG
    from meerschaum.connectors.parse import is_valid_connector_keys
    plugins = []
    _deps = self.get_dependencies(debug=debug)
    sep = STATIC_CONFIG['plugins']['repo_separator']
    plugin_names = [
        _d[len('plugin:'):] for _d in _deps
        if _d.startswith('plugin:') and len(_d) > len('plugin:')
    ]
    default_repo_keys = get_config('meerschaum', 'repository')
    skipped_repo_keys = set()

    for _plugin_name in plugin_names:
        if sep in _plugin_name:
            try:
                _plugin_name, _repo_keys = _plugin_name.split(sep)
            except Exception:
                _repo_keys = default_repo_keys
                warn(
                    f"Invalid repo keys for required plugin '{_plugin_name}'.\n    "
                    + f"Will try to use '{_repo_keys}' instead.",
                    stack=False,
                )
        else:
            _repo_keys = default_repo_keys

        if _repo_keys in skipped_repo_keys:
            continue

        if not is_valid_connector_keys(_repo_keys):
            warn(
                f"Invalid connector '{_repo_keys}'.\n"
                f"    Skipping required plugins from repository '{_repo_keys}'",
                stack=False,
            )
            continue

        plugins.append(Plugin(_plugin_name, repo=_repo_keys))

    return plugins
Return a list of required Plugin objects.
def get_required_packages(self, debug: bool = False) -> List[str]:
    """
    Return the required package names (excluding plugins).
    """
    _deps = self.get_dependencies(debug=debug)
    return [_d for _d in _deps if not _d.startswith('plugin:')]
Return the required package names (excluding plugins).
def activate_venv(
    self,
    dependencies: bool = True,
    init_if_not_exists: bool = True,
    debug: bool = False,
    **kw
) -> bool:
    """
    Activate the virtual environments for the plugin and its dependencies.

    Parameters
    ----------
    dependencies: bool, default True
        If `True`, activate the virtual environments for required plugins.

    Returns
    -------
    A bool indicating success.
    """
    from meerschaum.utils.venv import venv_target_path
    from meerschaum.utils.packages import activate_venv
    from meerschaum.utils.misc import make_symlink, is_symlink
    from meerschaum.config._paths import PACKAGE_ROOT_PATH

    if dependencies:
        for plugin in self.get_required_plugins(debug=debug):
            plugin.activate_venv(debug=debug, init_if_not_exists=init_if_not_exists, **kw)

    vtp = venv_target_path(self.name, debug=debug, allow_nonexistent=True)
    venv_meerschaum_path = vtp / 'meerschaum'

    try:
        success, msg = True, "Success"
        if is_symlink(venv_meerschaum_path):
            if pathlib.Path(os.path.realpath(venv_meerschaum_path)) != PACKAGE_ROOT_PATH:
                venv_meerschaum_path.unlink()
                success, msg = make_symlink(venv_meerschaum_path, PACKAGE_ROOT_PATH)
    except Exception as e:
        success, msg = False, str(e)
    if not success:
        warn(f"Unable to create symlink {venv_meerschaum_path} to {PACKAGE_ROOT_PATH}:\n{msg}")

    return activate_venv(self.name, init_if_not_exists=init_if_not_exists, debug=debug, **kw)
Activate the virtual environments for the plugin and its dependencies.

Parameters
- dependencies (bool, default True): If True, activate the virtual environments for required plugins.

Returns
- A bool indicating success.
def deactivate_venv(self, dependencies: bool = True, debug: bool = False, **kw) -> bool:
    """
    Deactivate the virtual environments for the plugin and its dependencies.

    Parameters
    ----------
    dependencies: bool, default True
        If `True`, deactivate the virtual environments for required plugins.

    Returns
    -------
    A bool indicating success.
    """
    from meerschaum.utils.packages import deactivate_venv
    success = deactivate_venv(self.name, debug=debug, **kw)
    if dependencies:
        for plugin in self.get_required_plugins(debug=debug):
            plugin.deactivate_venv(debug=debug, **kw)
    return success
Deactivate the virtual environments for the plugin and its dependencies.

Parameters
- dependencies (bool, default True): If True, deactivate the virtual environments for required plugins.

Returns
- A bool indicating success.
def install_dependencies(
    self,
    force: bool = False,
    debug: bool = False,
) -> bool:
    """
    If specified, install dependencies.

    **NOTE:** Dependencies that start with `'plugin:'` will be installed as
    Meerschaum plugins from the same repository as this Plugin.
    To install from a different repository, add the repo keys after `'@'`
    (e.g. `'plugin:foo@api:bar'`).

    Parameters
    ----------
    force: bool, default False
        If `True`, continue with the installation, even if some
        required packages fail to install.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A bool indicating success.
    """
    from meerschaum.utils.packages import pip_install, venv_contains_package
    from meerschaum.utils.warnings import warn, info
    _deps = self.get_dependencies(debug=debug)
    if not _deps and self.requirements_file_path is None:
        return True

    plugins = self.get_required_plugins(debug=debug)
    for _plugin in plugins:
        if _plugin.name == self.name:
            warn(f"Plugin '{self.name}' cannot depend on itself! Skipping...", stack=False)
            continue
        _success, _msg = _plugin.repo_connector.install_plugin(
            _plugin.name, debug=debug, force=force
        )
        if not _success:
            warn(
                f"Failed to install required plugin '{_plugin}' from '{_plugin.repo_connector}'"
                + f" for plugin '{self.name}':\n" + _msg,
                stack=False,
            )
            if not force:
                warn(
                    "Try installing with the `--force` flag to continue anyway.",
                    stack=False,
                )
                return False
            info(
                "Continuing with installation despite the failure "
                + "(careful, things might be broken!)...",
                icon=False
            )

    ### First step: parse `requirements.txt` if it exists.
    if self.requirements_file_path is not None:
        if not pip_install(
            requirements_file_path=self.requirements_file_path,
            venv=self.name, debug=debug
        ):
            warn(
                f"Failed to resolve 'requirements.txt' for plugin '{self.name}'.",
                stack=False,
            )
            if not force:
                warn(
                    "Try installing with `--force` to continue anyway.",
                    stack=False,
                )
                return False
            info(
                "Continuing with installation despite the failure "
                + "(careful, things might be broken!)...",
                icon=False
            )

    ### Don't reinstall packages that are already included in required plugins.
    packages = []
    _packages = self.get_required_packages(debug=debug)
    accounted_for_packages = set()
    for package_name in _packages:
        for plugin in plugins:
            if venv_contains_package(package_name, plugin.name):
                accounted_for_packages.add(package_name)
                break
    packages = [pkg for pkg in _packages if pkg not in accounted_for_packages]

    ### Attempt pip packages installation.
    if packages:
        for package in packages:
            if not pip_install(package, venv=self.name, debug=debug):
                warn(
                    f"Failed to install required package '{package}'"
                    + f" for plugin '{self.name}'.",
                    stack=False,
                )
                if not force:
                    warn(
                        "Try installing with `--force` to continue anyway.",
                        stack=False,
                    )
                    return False
                info(
                    "Continuing with installation despite the failure "
                    + "(careful, things might be broken!)...",
                    icon=False
                )
    return True
If specified, install dependencies.

NOTE: Dependencies that start with 'plugin:' will be installed as Meerschaum plugins from the same repository as this Plugin. To install from a different repository, add the repo keys after '@' (e.g. 'plugin:foo@api:bar').

Parameters
- force (bool, default False): If True, continue with the installation, even if some required packages fail to install.
- debug (bool, default False): Verbosity toggle.

Returns
- A bool indicating success.
@property
def full_name(self) -> str:
    """
    Include the repo keys with the plugin's name.
    """
    from meerschaum._internal.static import STATIC_CONFIG
    sep = STATIC_CONFIG['plugins']['repo_separator']
    return self.name + sep + str(self.repo_connector)
Include the repo keys with the plugin's name.
class Venv:
    """
    Manage a virtual environment's activation status.

    Examples
    --------
    >>> from meerschaum.plugins import Plugin
    >>> with Venv('mrsm') as venv:
    ...     import pandas
    >>> with Venv(Plugin('noaa')) as venv:
    ...     import requests
    >>> venv = Venv('mrsm')
    >>> venv.activate()
    True
    >>> venv.deactivate()
    True
    >>>
    """

    def __init__(
        self,
        venv: Union[str, 'mrsm.core.Plugin', None] = 'mrsm',
        init_if_not_exists: bool = True,
        debug: bool = False,
    ) -> None:
        from meerschaum.utils.venv import activate_venv, deactivate_venv, active_venvs
        ### For some weird threading issue,
        ### we can't use `isinstance` here.
        if '_Plugin' in str(type(venv)):
            self._venv = venv.name
            self._activate = venv.activate_venv
            self._deactivate = venv.deactivate_venv
            self._kwargs = {}
        else:
            self._venv = venv
            self._activate = activate_venv
            self._deactivate = deactivate_venv
            self._kwargs = {'venv': venv}
        self._debug = debug
        self._init_if_not_exists = init_if_not_exists
        ### In case someone calls `deactivate()` before `activate()`.
        self._kwargs['previously_active_venvs'] = copy.deepcopy(active_venvs)

    def activate(self, debug: bool = False) -> bool:
        """
        Activate this virtual environment.
        If a `meerschaum.plugins.Plugin` was provided, its dependent virtual environments
        will also be activated.
        """
        from meerschaum.utils.venv import active_venvs, init_venv
        self._kwargs['previously_active_venvs'] = copy.deepcopy(active_venvs)
        try:
            return self._activate(
                debug=(debug or self._debug),
                init_if_not_exists=self._init_if_not_exists,
                **self._kwargs
            )
        except OSError as e:
            if self._init_if_not_exists:
                if not init_venv(self._venv, force=True):
                    raise e
            return self._activate(
                debug=(debug or self._debug),
                init_if_not_exists=self._init_if_not_exists,
                **self._kwargs
            )

    def deactivate(self, debug: bool = False) -> bool:
        """
        Deactivate this virtual environment.
        If a `meerschaum.plugins.Plugin` was provided, its dependent virtual environments
        will also be deactivated.
        """
        return self._deactivate(debug=(debug or self._debug), **self._kwargs)

    @property
    def target_path(self) -> pathlib.Path:
        """
        Return the target site-packages path for this virtual environment.
        A `meerschaum.utils.venv.Venv` may have one virtual environment per minor Python version
        (e.g. Python 3.10 and Python 3.7).
        """
        from meerschaum.utils.venv import venv_target_path
        return venv_target_path(venv=self._venv, allow_nonexistent=True, debug=self._debug)

    @property
    def root_path(self) -> pathlib.Path:
        """
        Return the top-level path for this virtual environment.
        """
        from meerschaum.config._paths import VIRTENV_RESOURCES_PATH
        if self._venv is None:
            return self.target_path.parent
        return VIRTENV_RESOURCES_PATH / self._venv

    def __enter__(self) -> None:
        self.activate(debug=self._debug)

    def __exit__(self, exc_type, exc_value, exc_traceback) -> None:
        self.deactivate(debug=self._debug)

    def __str__(self) -> str:
        quote = "'" if self._venv is not None else ""
        return "Venv(" + quote + str(self._venv) + quote + ")"

    def __repr__(self) -> str:
        return self.__str__()
Manage a virtual environment's activation status.
Examples
>>> from meerschaum.plugins import Plugin
>>> with Venv('mrsm') as venv:
... import pandas
>>> with Venv(Plugin('noaa')) as venv:
... import requests
>>> venv = Venv('mrsm')
>>> venv.activate()
True
>>> venv.deactivate()
True
>>>
def __init__(
    self,
    venv: Union[str, 'mrsm.core.Plugin', None] = 'mrsm',
    init_if_not_exists: bool = True,
    debug: bool = False,
) -> None:
    from meerschaum.utils.venv import activate_venv, deactivate_venv, active_venvs
    ### For some weird threading issue,
    ### we can't use `isinstance` here.
    if '_Plugin' in str(type(venv)):
        self._venv = venv.name
        self._activate = venv.activate_venv
        self._deactivate = venv.deactivate_venv
        self._kwargs = {}
    else:
        self._venv = venv
        self._activate = activate_venv
        self._deactivate = deactivate_venv
        self._kwargs = {'venv': venv}
    self._debug = debug
    self._init_if_not_exists = init_if_not_exists
    ### In case someone calls `deactivate()` before `activate()`.
    self._kwargs['previously_active_venvs'] = copy.deepcopy(active_venvs)
def activate(self, debug: bool = False) -> bool:
    """
    Activate this virtual environment.
    If a `meerschaum.plugins.Plugin` was provided, its dependent virtual environments
    will also be activated.
    """
    from meerschaum.utils.venv import active_venvs, init_venv
    self._kwargs['previously_active_venvs'] = copy.deepcopy(active_venvs)
    try:
        return self._activate(
            debug=(debug or self._debug),
            init_if_not_exists=self._init_if_not_exists,
            **self._kwargs
        )
    except OSError as e:
        if self._init_if_not_exists:
            if not init_venv(self._venv, force=True):
                raise e
        return self._activate(
            debug=(debug or self._debug),
            init_if_not_exists=self._init_if_not_exists,
            **self._kwargs
        )
Activate this virtual environment. If a meerschaum.plugins.Plugin was provided, its dependent virtual environments will also be activated.
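As the source shows, `activate()` retries once after reinitializing the environment when an `OSError` is raised. The general retry-once pattern, with hypothetical `activate` and `init_venv` stand-ins in place of Meerschaum's internals, looks like:

```python
def activate_with_retry(activate, init_venv, venv, init_if_not_exists=True):
    """Try to activate; on OSError, reinitialize the venv and retry once."""
    try:
        return activate(venv)
    except OSError:
        if init_if_not_exists:
            if not init_venv(venv):
                raise  # re-raise if reinitialization itself failed
        return activate(venv)


### Usage: the first call fails, the venv is rebuilt, the retry succeeds.
calls = []

def flaky_activate(venv):
    calls.append(venv)
    if len(calls) == 1:
        raise OSError("missing site-packages")
    return True


assert activate_with_retry(flaky_activate, lambda v: True, 'mrsm') is True
assert len(calls) == 2
```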
Deactivate this virtual environment. If a meerschaum.plugins.Plugin was provided, its dependent virtual environments will also be deactivated.
Return the target site-packages path for this virtual environment. A meerschaum.utils.venv.Venv may have one virtual environment per minor Python version (e.g. Python 3.10 and Python 3.7).
Return the top-level path for this virtual environment.
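Taken together, the two properties describe a layout where each named environment lives under a shared resources directory and its site-packages path depends on the running minor Python version. A `pathlib` sketch of that layout (the root directory and subdirectory names here are illustrative assumptions, not Meerschaum's exact internals):

```python
import sys
import pathlib

### Hypothetical stand-in for VIRTENV_RESOURCES_PATH.
VIRTENV_RESOURCES_PATH = pathlib.Path('/tmp/mrsm/venvs')


def root_path(venv: str) -> pathlib.Path:
    """Top-level directory for a named virtual environment."""
    return VIRTENV_RESOURCES_PATH / venv


def target_path(venv: str) -> pathlib.Path:
    """Per-minor-version site-packages path, e.g. .../lib/python3.11/site-packages."""
    major, minor = sys.version_info[:2]
    return root_path(venv) / 'lib' / f'python{major}.{minor}' / 'site-packages'


print(target_path('mrsm'))
```

Because the site-packages directory is keyed by minor version, upgrading Python yields a fresh target path under the same root, which is why one `Venv` may hold several environments.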
class Job:
    """
    Manage a `meerschaum.utils.daemon.Daemon`, locally or remotely via the API.
    """

    def __init__(
        self,
        name: str,
        sysargs: Union[List[str], str, None] = None,
        env: Optional[Dict[str, str]] = None,
        executor_keys: Optional[str] = None,
        delete_after_completion: bool = False,
        refresh_seconds: Union[int, float, None] = None,
        _properties: Optional[Dict[str, Any]] = None,
        _rotating_log=None,
        _stdin_file=None,
        _status_hook: Optional[Callable[[], str]] = None,
        _result_hook: Optional[Callable[[], SuccessTuple]] = None,
        _externally_managed: bool = False,
    ):
        """
        Create a new job to manage a `meerschaum.utils.daemon.Daemon`.

        Parameters
        ----------
        name: str
            The name of the job to be created.
            This will also be used as the Daemon ID.

        sysargs: Union[List[str], str, None], default None
            The sysargs of the command to be executed, e.g. 'start api'.

        env: Optional[Dict[str, str]], default None
            If provided, set these environment variables in the job's process.

        executor_keys: Optional[str], default None
            If provided, execute the job remotely on an API instance, e.g. 'api:main'.

        delete_after_completion: bool, default False
            If `True`, delete this job when it has finished executing.

        refresh_seconds: Union[int, float, None], default None
            The number of seconds to sleep between refreshes.
            Defaults to the configured value `system.cli.refresh_seconds`.

        _properties: Optional[Dict[str, Any]], default None
            If provided, use this to patch the daemon's properties.
        """
        from meerschaum.utils.daemon import Daemon
        for char in BANNED_CHARS:
            if char in name:
                raise ValueError(f"Invalid name: ({char}) is not allowed.")

        if isinstance(sysargs, str):
            sysargs = shlex.split(sysargs)

        and_key = STATIC_CONFIG['system']['arguments']['and_key']
        escaped_and_key = STATIC_CONFIG['system']['arguments']['escaped_and_key']
        if sysargs:
            sysargs = [
                (arg if arg != escaped_and_key else and_key)
                for arg in sysargs
            ]

        ### NOTE: 'local' and 'systemd' executors are being coalesced.
        if executor_keys is None:
            from meerschaum.jobs import get_executor_keys_from_context
            executor_keys = get_executor_keys_from_context()

        self.executor_keys = executor_keys
        self.name = name
        self.refresh_seconds = (
            refresh_seconds
            if refresh_seconds is not None
            else mrsm.get_config('system', 'cli', 'refresh_seconds')
        )
        try:
            self._daemon = (
                Daemon(daemon_id=name)
                if executor_keys == 'local'
                else None
            )
        except Exception:
            self._daemon = None

        ### Handle any injected dependencies.
        if _rotating_log is not None:
            self._rotating_log = _rotating_log
            if self._daemon is not None:
                self._daemon._rotating_log = _rotating_log

        if _stdin_file is not None:
            self._stdin_file = _stdin_file
            if self._daemon is not None:
                self._daemon._stdin_file = _stdin_file
                self._daemon._blocking_stdin_file_path = _stdin_file.blocking_file_path

        if _status_hook is not None:
            self._status_hook = _status_hook

        if _result_hook is not None:
            self._result_hook = _result_hook

        self._externally_managed = _externally_managed
        self._properties_patch = _properties or {}
        if _externally_managed:
            self._properties_patch.update({'externally_managed': _externally_managed})

        if env:
            self._properties_patch.update({'env': env})

        if delete_after_completion:
            self._properties_patch.update({'delete_after_completion': delete_after_completion})

        daemon_sysargs = (
            self._daemon.properties.get('target', {}).get('args', [None])[0]
            if self._daemon is not None
            else None
        )

        if daemon_sysargs and sysargs and daemon_sysargs != sysargs:
            warn("Given sysargs differ from existing sysargs.")

        self._sysargs = [
            arg
            for arg in (daemon_sysargs or sysargs or [])
            if arg not in ('-d', '--daemon')
        ]
        for restart_flag in RESTART_FLAGS:
            if restart_flag in self._sysargs:
                self._properties_patch.update({'restart': True})
                break

    @staticmethod
    def from_pid(pid: int, executor_keys: Optional[str] = None) -> Job:
        """
        Build a `Job` from the PID of a running Meerschaum process.

        Parameters
        ----------
        pid: int
            The PID of the process.

        executor_keys: Optional[str], default None
            The executor keys to assign to the job.
        """
        from meerschaum.config.paths import DAEMON_RESOURCES_PATH

        psutil = mrsm.attempt_import('psutil')
        try:
            process = psutil.Process(pid)
        except psutil.NoSuchProcess as e:
            warn(f"Process with PID {pid} does not exist.", stack=False)
            raise e

        command_args = process.cmdline()
        is_daemon = command_args[1] == '-c'

        if is_daemon:
            daemon_id = command_args[-1].split('daemon_id=')[-1].split(')')[0].replace("'", '')
            root_dir = process.environ().get(STATIC_CONFIG['environment']['root'], None)
            if root_dir is None:
                from meerschaum.config.paths import ROOT_DIR_PATH
                root_dir = ROOT_DIR_PATH
            else:
                root_dir = pathlib.Path(root_dir)
            jobs_dir = root_dir / DAEMON_RESOURCES_PATH.name
            daemon_dir = jobs_dir / daemon_id
            pid_file = daemon_dir / 'process.pid'

            if pid_file.exists():
                with open(pid_file, 'r', encoding='utf-8') as f:
                    daemon_pid = int(f.read())

                if pid != daemon_pid:
                    raise EnvironmentError(f"Differing PIDs: {pid=}, {daemon_pid=}")
            else:
                raise EnvironmentError(f"Is job '{daemon_id}' running?")

            return Job(daemon_id, executor_keys=executor_keys)

        from meerschaum._internal.arguments._parse_arguments import parse_arguments
        from meerschaum.utils.daemon import get_new_daemon_name

        mrsm_ix = 0
        for i, arg in enumerate(command_args):
            if 'mrsm' in arg or 'meerschaum' in arg.lower():
                mrsm_ix = i
                break

        sysargs = command_args[mrsm_ix+1:]
        kwargs = parse_arguments(sysargs)
        name = kwargs.get('name', get_new_daemon_name())
        return Job(name, sysargs, executor_keys=executor_keys)

    def start(self, debug: bool = False) -> SuccessTuple:
        """
        Start the job's daemon.
        """
        if self.executor is not None:
            if not self.exists(debug=debug):
                return self.executor.create_job(
                    self.name,
                    self.sysargs,
                    properties=self.daemon.properties,
                    debug=debug,
                )
            return self.executor.start_job(self.name, debug=debug)

        if self.is_running():
            return True, f"{self} is already running."

        success, msg = self.daemon.run(
            keep_daemon_output=(not self.delete_after_completion),
            allow_dirty_run=True,
        )
        if not success:
            return success, msg

        return success, f"Started {self}."

    def stop(
        self,
        timeout_seconds: Union[int, float, None] = None,
        debug: bool = False,
    ) -> SuccessTuple:
        """
        Stop the job's daemon.
        """
        if self.executor is not None:
            return self.executor.stop_job(self.name, debug=debug)

        if self.daemon.status == 'stopped':
            if not self.restart:
                return True, f"{self} is not running."
            elif self.stop_time is not None:
                return True, f"{self} will not restart until manually started."

        quit_success, quit_msg = self.daemon.quit(timeout=timeout_seconds)
        if quit_success:
            return quit_success, f"Stopped {self}."

        warn(
            f"Failed to gracefully quit {self}.",
            stack=False,
        )
        kill_success, kill_msg = self.daemon.kill(timeout=timeout_seconds)
        if not kill_success:
            return kill_success, kill_msg

        return kill_success, f"Killed {self}."

    def pause(
        self,
        timeout_seconds: Union[int, float, None] = None,
        debug: bool = False,
    ) -> SuccessTuple:
        """
        Pause the job's daemon.
        """
        if self.executor is not None:
            return self.executor.pause_job(self.name, debug=debug)

        pause_success, pause_msg = self.daemon.pause(timeout=timeout_seconds)
        if not pause_success:
            return pause_success, pause_msg

        return pause_success, f"Paused {self}."

    def delete(self, debug: bool = False) -> SuccessTuple:
        """
        Delete the job and its daemon.
        """
        if self.executor is not None:
            return self.executor.delete_job(self.name, debug=debug)

        if self.is_running():
            stop_success, stop_msg = self.stop()
            if not stop_success:
                return stop_success, stop_msg

        cleanup_success, cleanup_msg = self.daemon.cleanup()
        if not cleanup_success:
            return cleanup_success, cleanup_msg

        _ = self.daemon._properties.pop('result', None)
        return cleanup_success, f"Deleted {self}."

    def is_running(self) -> bool:
        """
        Determine whether the job's daemon is running.
        """
        return self.status == 'running'

    def exists(self, debug: bool = False) -> bool:
        """
        Determine whether the job exists.
        """
        if self.executor is not None:
            return self.executor.get_job_exists(self.name, debug=debug)

        return self.daemon.path.exists()

    def get_logs(self) -> Union[str, None]:
        """
        Return the output text of the job's daemon.
        """
        if self.executor is not None:
            return self.executor.get_logs(self.name)

        return self.daemon.log_text

    def monitor_logs(
        self,
        callback_function: Callable[[str], None] = _default_stdout_callback,
        input_callback_function: Optional[Callable[[], str]] = None,
        stop_callback_function: Optional[Callable[[SuccessTuple], None]] = None,
        stop_event: Optional[asyncio.Event] = None,
        stop_on_exit: bool = False,
        strip_timestamps: bool = False,
        accept_input: bool = True,
        debug: bool = False,
        _logs_path: Optional[pathlib.Path] = None,
        _log=None,
        _stdin_file=None,
        _wait_if_stopped: bool = True,
    ):
        """
        Monitor the job's log files and execute a callback on new lines.

        Parameters
        ----------
        callback_function: Callable[[str], None], default partial(print, end='')
            The callback to execute as new data comes in.
            Defaults to printing the output directly to `stdout`.

        input_callback_function: Optional[Callable[[], str]], default None
            If provided, execute this callback when the daemon is blocking on stdin.
            Defaults to `sys.stdin.readline()`.

        stop_callback_function: Optional[Callable[[SuccessTuple], None]], default None
            If provided, execute this callback when the daemon stops.
            The job's SuccessTuple will be passed to the callback.

        stop_event: Optional[asyncio.Event], default None
            If provided, stop monitoring when this event is set.
            You may instead raise `meerschaum.jobs.StopMonitoringLogs`
            from within `callback_function` to stop monitoring.

        stop_on_exit: bool, default False
            If `True`, stop monitoring when the job stops.

        strip_timestamps: bool, default False
            If `True`, remove leading timestamps from lines.

        accept_input: bool, default True
            If `True`, accept input when the daemon blocks on stdin.
        """
        if self.executor is not None:
            self.executor.monitor_logs(
                self.name,
                callback_function,
                input_callback_function=input_callback_function,
                stop_callback_function=stop_callback_function,
                stop_on_exit=stop_on_exit,
                accept_input=accept_input,
                strip_timestamps=strip_timestamps,
                debug=debug,
            )
            return

        monitor_logs_coroutine = self.monitor_logs_async(
            callback_function=callback_function,
            input_callback_function=input_callback_function,
            stop_callback_function=stop_callback_function,
            stop_event=stop_event,
            stop_on_exit=stop_on_exit,
            strip_timestamps=strip_timestamps,
            accept_input=accept_input,
            debug=debug,
            _logs_path=_logs_path,
            _log=_log,
            _stdin_file=_stdin_file,
            _wait_if_stopped=_wait_if_stopped,
        )
        return asyncio.run(monitor_logs_coroutine)

    async def monitor_logs_async(
        self,
        callback_function: Callable[[str], None] = _default_stdout_callback,
        input_callback_function: Optional[Callable[[], str]] = None,
        stop_callback_function: Optional[Callable[[SuccessTuple], None]] = None,
        stop_event: Optional[asyncio.Event] = None,
        stop_on_exit: bool = False,
        strip_timestamps: bool = False,
        accept_input: bool = True,
        debug: bool = False,
        _logs_path: Optional[pathlib.Path] = None,
        _log=None,
        _stdin_file=None,
        _wait_if_stopped: bool = True,
    ):
        """
        Monitor the job's log files and await a callback on new lines.

        Parameters
        ----------
        callback_function: Callable[[str], None], default _default_stdout_callback
            The callback to execute as new data comes in.
            Defaults to printing the output directly to `stdout`.

        input_callback_function: Optional[Callable[[], str]], default None
            If provided, execute this callback when the daemon is blocking on stdin.
            Defaults to `sys.stdin.readline()`.

        stop_callback_function: Optional[Callable[[SuccessTuple], None]], default None
            If provided, execute this callback when the daemon stops.
            The job's SuccessTuple will be passed to the callback.

        stop_event: Optional[asyncio.Event], default None
            If provided, stop monitoring when this event is set.
            You may instead raise `meerschaum.jobs.StopMonitoringLogs`
            from within `callback_function` to stop monitoring.

        stop_on_exit: bool, default False
            If `True`, stop monitoring when the job stops.

        strip_timestamps: bool, default False
            If `True`, remove leading timestamps from lines.

        accept_input: bool, default True
            If `True`, accept input when the daemon blocks on stdin.
        """
        from meerschaum.utils.prompt import prompt

        def default_input_callback_function():
            prompt_kwargs = self.get_prompt_kwargs(debug=debug)
            if prompt_kwargs:
                answer = prompt(**prompt_kwargs)
                return answer + '\n'
            return sys.stdin.readline()

        if input_callback_function is None:
            input_callback_function = default_input_callback_function

        if self.executor is not None:
            await self.executor.monitor_logs_async(
                self.name,
                callback_function,
                input_callback_function=input_callback_function,
                stop_callback_function=stop_callback_function,
                stop_on_exit=stop_on_exit,
                strip_timestamps=strip_timestamps,
                accept_input=accept_input,
                debug=debug,
            )
            return

        from meerschaum.utils.formatting._jobs import strip_timestamp_from_line

        events = {
            'user': stop_event,
            'stopped': asyncio.Event(),
            'stop_token': asyncio.Event(),
            'stop_exception': asyncio.Event(),
            'stopped_timeout': asyncio.Event(),
        }
        combined_event = asyncio.Event()
        emitted_text = False
        stdin_file = _stdin_file if _stdin_file is not None else self.daemon.stdin_file

        async def check_job_status():
            if not stop_on_exit:
                return

            nonlocal emitted_text

            sleep_time = 0.1
            while sleep_time < 0.2:
                if self.status == 'stopped':
                    if not emitted_text and _wait_if_stopped:
                        await asyncio.sleep(sleep_time)
                        sleep_time = round(sleep_time * 1.1, 3)
                        continue

                    if stop_callback_function is not None:
                        try:
                            if asyncio.iscoroutinefunction(stop_callback_function):
                                await stop_callback_function(self.result)
                            else:
                                stop_callback_function(self.result)
                        except asyncio.exceptions.CancelledError:
                            break
                        except Exception:
                            warn(traceback.format_exc())

                    if stop_on_exit:
                        events['stopped'].set()

                    break
                await asyncio.sleep(0.1)

            events['stopped_timeout'].set()

        async def check_blocking_on_input():
            while True:
                if not emitted_text or not self.is_blocking_on_stdin():
                    try:
                        await asyncio.sleep(self.refresh_seconds)
                    except asyncio.exceptions.CancelledError:
                        break
                    continue

                if not self.is_running():
                    break

                await emit_latest_lines()

                try:
                    print('', end='', flush=True)
                    if asyncio.iscoroutinefunction(input_callback_function):
                        data = await input_callback_function()
                    else:
                        loop = asyncio.get_running_loop()
                        data = await loop.run_in_executor(None, input_callback_function)
                except KeyboardInterrupt:
                    break
                # if not data.endswith('\n'):
                #     data += '\n'

                stdin_file.write(data)
                await asyncio.sleep(self.refresh_seconds)

        async def combine_events():
            event_tasks = [
                asyncio.create_task(event.wait())
                for event in events.values()
                if event is not None
            ]
            if not event_tasks:
                return

            try:
                done, pending = await asyncio.wait(
                    event_tasks,
                    return_when=asyncio.FIRST_COMPLETED,
                )
                for task in pending:
                    task.cancel()
            except asyncio.exceptions.CancelledError:
                pass
            finally:
                combined_event.set()

        check_job_status_task = asyncio.create_task(check_job_status())
        check_blocking_on_input_task = asyncio.create_task(check_blocking_on_input())
        combine_events_task = asyncio.create_task(combine_events())

        log = _log if _log is not None else self.daemon.rotating_log
        lines_to_show = (
            self.daemon.properties.get(
                'logs', {}
            ).get(
                'lines_to_show', get_config('jobs', 'logs', 'lines_to_show')
            )
        )

        async def emit_latest_lines():
            nonlocal emitted_text
            nonlocal stop_event
            lines = log.readlines()
            for line in lines[(-1 * lines_to_show):]:
                if stop_event is not None and stop_event.is_set():
                    return

                line_stripped_extra = strip_timestamp_from_line(line.strip())
                line_stripped = strip_timestamp_from_line(line)

                if line_stripped_extra == STOP_TOKEN:
                    events['stop_token'].set()
                    return

                if line_stripped_extra == CLEAR_TOKEN:
                    clear_screen(debug=debug)
                    continue

                if line_stripped_extra == FLUSH_TOKEN.strip():
                    line_stripped = ''
                    line = ''

                if strip_timestamps:
                    line = line_stripped

                try:
                    if asyncio.iscoroutinefunction(callback_function):
                        await callback_function(line)
                    else:
                        callback_function(line)
                    emitted_text = True
                except StopMonitoringLogs:
                    events['stop_exception'].set()
                    return
                except Exception:
                    warn(f"Error in logs callback:\n{traceback.format_exc()}")

        await emit_latest_lines()

        tasks = (
            [check_job_status_task]
            + ([check_blocking_on_input_task] if accept_input else [])
            + [combine_events_task]
        )
        try:
            _ = asyncio.gather(*tasks, return_exceptions=True)
        except asyncio.exceptions.CancelledError:
            raise
        except Exception:
            warn(f"Failed to run async checks:\n{traceback.format_exc()}")

        watchfiles = mrsm.attempt_import('watchfiles', lazy=False)
        dir_path_to_monitor = (
            _logs_path
            or (log.file_path.parent if log else None)
            or LOGS_RESOURCES_PATH
        )
        async for changes in watchfiles.awatch(
            dir_path_to_monitor,
            stop_event=combined_event,
        ):
            for change in changes:
                file_path_str = change[1]
                file_path = pathlib.Path(file_path_str)
                latest_subfile_path = log.get_latest_subfile_path()
                if latest_subfile_path != file_path:
                    continue

                await emit_latest_lines()

        await emit_latest_lines()

    def is_blocking_on_stdin(self, debug: bool = False) -> bool:
        """
        Return whether a job's daemon is blocking on stdin.
        """
        if self.executor is not None:
            return self.executor.get_job_is_blocking_on_stdin(self.name, debug=debug)

        return self.is_running() and self.daemon.blocking_stdin_file_path.exists()

    def get_prompt_kwargs(self, debug: bool = False) -> Dict[str, Any]:
        """
        Return the kwargs to the blocking `prompt()`, if available.
        """
        if self.executor is not None:
            return self.executor.get_job_prompt_kwargs(self.name, debug=debug)

        if not self.daemon.prompt_kwargs_file_path.exists():
            return {}

        try:
            with open(self.daemon.prompt_kwargs_file_path, 'r', encoding='utf-8') as f:
                prompt_kwargs = json.load(f)

            return prompt_kwargs
        except Exception:
            import traceback
            traceback.print_exc()
            return {}

    def write_stdin(self, data):
        """
        Write to a job's daemon's `stdin`.
        """
        self.daemon.stdin_file.write(data)

    @property
    def executor(self) -> Union[Executor, None]:
        """
        If the job is remote, return the connector to the remote API instance.
        """
        return (
            mrsm.get_connector(self.executor_keys)
            if self.executor_keys != 'local'
            else None
        )

    @property
    def status(self) -> str:
        """
        Return the running status of the job's daemon.
        """
        if '_status_hook' in self.__dict__:
            return self._status_hook()

        if self.executor is not None:
            return self.executor.get_job_status(self.name)

        return self.daemon.status

    @property
    def pid(self) -> Union[int, None]:
        """
        Return the PID of the job's daemon.
        """
        if self.executor is not None:
            return self.executor.get_job_metadata(self.name).get('daemon', {}).get('pid', None)

        return self.daemon.pid

    @property
    def restart(self) -> bool:
        """
        Return whether to restart a stopped job.
        """
        if self.executor is not None:
            return self.executor.get_job_metadata(self.name).get('restart', False)

        return self.daemon.properties.get('restart', False)

    @property
    def result(self) -> SuccessTuple:
        """
        Return the `SuccessTuple` when the job has terminated.
        """
        if self.is_running():
            return True, f"{self} is running."

        if '_result_hook' in self.__dict__:
            return self._result_hook()

        if self.executor is not None:
            return (
                self.executor.get_job_metadata(self.name)
                .get('result', (False, "No result available."))
            )

        _result = self.daemon.properties.get('result', None)
        if _result is None:
            from meerschaum.utils.daemon.Daemon import _results
            return _results.get(self.daemon.daemon_id, (False, "No result available."))

        return tuple(_result)

    @property
    def sysargs(self) -> List[str]:
        """
        Return the sysargs to use for the Daemon.
        """
        if self._sysargs:
            return self._sysargs

        if self.executor is not None:
            return self.executor.get_job_metadata(self.name).get('sysargs', [])

        target_args = self.daemon.target_args
        if target_args is None:
            return []
        self._sysargs = target_args[0] if len(target_args) > 0 else []
        return self._sysargs

    def get_daemon_properties(self) -> Dict[str, Any]:
        """
        Return the `properties` dictionary for the job's daemon.
        """
        remote_properties = (
            {}
            if self.executor is None
            else self.executor.get_job_properties(self.name)
        )
        return {
            **remote_properties,
            **self._properties_patch
        }

    @property
    def daemon(self) -> 'Daemon':
        """
        Return the daemon which this job manages.
        """
        from meerschaum.utils.daemon import Daemon
        if self._daemon is not None and self.executor is None and self._sysargs:
            return self._daemon

        self._daemon = Daemon(
            target=entry,
            target_args=[self._sysargs],
            target_kw={},
            daemon_id=self.name,
            label=shlex.join(self._sysargs),
            properties=self.get_daemon_properties(),
        )
        if '_rotating_log' in self.__dict__:
            self._daemon._rotating_log = self._rotating_log

        if '_stdin_file' in self.__dict__:
            self._daemon._stdin_file = self._stdin_file
            self._daemon._blocking_stdin_file_path = self._stdin_file.blocking_file_path

        return self._daemon

    @property
    def began(self) -> Union[datetime, None]:
        """
        The datetime when the job began running.
        """
        if self.executor is not None:
            began_str = self.executor.get_job_began(self.name)
            if began_str is None:
                return None
            return (
                datetime.fromisoformat(began_str)
                .astimezone(timezone.utc)
                .replace(tzinfo=None)
            )

        began_str = self.daemon.properties.get('process', {}).get('began', None)
        if began_str is None:
            return None

        return datetime.fromisoformat(began_str)

    @property
    def ended(self) -> Union[datetime, None]:
        """
        The datetime when the job stopped running.
        """
        if self.executor is not None:
            ended_str = self.executor.get_job_ended(self.name)
            if ended_str is None:
                return None
            return (
                datetime.fromisoformat(ended_str)
                .astimezone(timezone.utc)
                .replace(tzinfo=None)
            )

        ended_str = self.daemon.properties.get('process', {}).get('ended', None)
        if ended_str is None:
            return None

        return datetime.fromisoformat(ended_str)

    @property
    def paused(self) -> Union[datetime, None]:
        """
        The datetime when the job was suspended while running.
        """
        if self.executor is not None:
            paused_str = self.executor.get_job_paused(self.name)
            if paused_str is None:
                return None
            return (
                datetime.fromisoformat(paused_str)
                .astimezone(timezone.utc)
                .replace(tzinfo=None)
            )

        paused_str = self.daemon.properties.get('process', {}).get('paused', None)
        if paused_str is None:
            return None

        return datetime.fromisoformat(paused_str)

    @property
    def stop_time(self) -> Union[datetime, None]:
        """
        Return the timestamp when the job was manually stopped.
        """
        if self.executor is not None:
            return self.executor.get_job_stop_time(self.name)

        if not self.daemon.stop_path.exists():
            return None

        stop_data = self.daemon._read_stop_file()
        if not stop_data:
            return None

        stop_time_str = stop_data.get('stop_time', None)
        if not stop_time_str:
            warn(f"Could not read stop time for {self}.")
            return None

        return datetime.fromisoformat(stop_time_str)

    @property
    def hidden(self) -> bool:
        """
        Return a bool indicating whether this job should be displayed.
        """
        return (
            self.name.startswith('_')
            or self.name.startswith('.')
            or self._is_externally_managed
        )

    def check_restart(self) -> SuccessTuple:
        """
        If `restart` is `True` and the daemon is not running,
        restart the job.
        Do not restart if the job was manually stopped.
        """
        if self.is_running():
            return True, f"{self} is running."

        if not self.restart:
            return True, f"{self} does not need to be restarted."

        if self.stop_time is not None:
            return True, f"{self} was manually stopped."

        return self.start()

    @property
    def label(self) -> str:
        """
        Return the job's Daemon label (joined sysargs).
        """
        from meerschaum._internal.arguments import compress_pipeline_sysargs
        sysargs = compress_pipeline_sysargs(self.sysargs)
        return shlex.join(sysargs).replace(' + ', '\n+ ').replace(' : ', '\n: ').lstrip().rstrip()

    @property
    def _externally_managed_file(self) -> pathlib.Path:
        """
        Return the path to the externally managed file.
        """
        return self.daemon.path / '.externally-managed'

    def _set_externally_managed(self):
        """
        Set this job as externally managed.
        """
        self._externally_managed = True
        try:
            self._externally_managed_file.parent.mkdir(exist_ok=True, parents=True)
            self._externally_managed_file.touch()
        except Exception as e:
            warn(e)

    @property
    def _is_externally_managed(self) -> bool:
        """
        Return whether this job is externally managed.
        """
        return self.executor_keys in (None, 'local') and (
            self._externally_managed or self._externally_managed_file.exists()
        )

    @property
    def env(self) -> Dict[str, str]:
        """
        Return the environment variables to set for the job's process.
        """
        if '_env' in self.__dict__:
            return self.__dict__['_env']

        _env = self.daemon.properties.get('env', {})
        default_env = {
            'PYTHONUNBUFFERED': '1',
            'LINES': str(get_config('jobs', 'terminal', 'lines')),
            'COLUMNS': str(get_config('jobs', 'terminal', 'columns')),
            STATIC_CONFIG['environment']['noninteractive']: 'true',
        }
        self._env = {**default_env, **_env}
        return self._env

    @property
    def delete_after_completion(self) -> bool:
        """
        Return whether this job is configured to delete itself after completion.
        """
        if '_delete_after_completion' in self.__dict__:
            return self.__dict__.get('_delete_after_completion', False)

        self._delete_after_completion = self.daemon.properties.get('delete_after_completion', False)
        return self._delete_after_completion

    def __str__(self) -> str:
        sysargs = self.sysargs
        sysargs_str = shlex.join(sysargs) if sysargs else ''
        job_str = f'Job("{self.name}"'
        if sysargs_str:
            job_str += f', "{sysargs_str}"'
        job_str += ')'
        return job_str

    def __repr__(self) -> str:
        return str(self)

    def __hash__(self) -> int:
        return hash(self.name)
Manage a meerschaum.utils.daemon.Daemon, locally or remotely via the API.
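As `Job.__init__` shows, `sysargs` may be given as either a string or a list; strings are tokenized with `shlex.split()`, and the daemonization flags `-d`/`--daemon` are stripped before the args are stored. That normalization step can be sketched with the standard library alone (the `normalize_sysargs` helper is illustrative, not part of the Meerschaum API):

```python
import shlex

def normalize_sysargs(sysargs):
    """Split a command string into sysargs and drop daemonization flags."""
    if isinstance(sysargs, str):
        sysargs = shlex.split(sysargs)
    return [arg for arg in (sysargs or []) if arg not in ('-d', '--daemon')]


print(normalize_sysargs('sync pipes --loop -d'))
# → ['sync', 'pipes', '--loop']
```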
Create a new job to manage a `meerschaum.utils.daemon.Daemon`.
Parameters
- name (str): The name of the job to be created. This will also be used as the Daemon ID.
- sysargs (Union[List[str], str, None], default None): The sysargs of the command to be executed, e.g. 'start api'.
- env (Optional[Dict[str, str]], default None): If provided, set these environment variables in the job's process.
- executor_keys (Optional[str], default None): If provided, execute the job remotely on an API instance, e.g. 'api:main'.
- delete_after_completion (bool, default False): If `True`, delete this job when it has finished executing.
- refresh_seconds (Union[int, float, None], default None): The number of seconds to sleep between refreshes. Defaults to the configured value `system.cli.refresh_seconds`.
- _properties (Optional[Dict[str, Any]], default None): If provided, use this to patch the daemon's properties.
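The constructor accepts `sysargs` either as a string or as a list: strings are split with `shlex`, and the daemonization flags are stripped before the arguments are stored. A minimal standalone sketch of that normalization, mirroring the source above:

```python
import shlex

# How `Job.__init__` normalizes `sysargs`: a string is split with shlex,
# and the '-d' / '--daemon' flags are stripped before storage.
sysargs = 'sync pipes --loop -d'
args = shlex.split(sysargs) if isinstance(sysargs, str) else sysargs
args = [arg for arg in args if arg not in ('-d', '--daemon')]
print(args)  # ['sync', 'pipes', '--loop']
```

Both `Job('syncing-engine', 'sync pipes --loop')` and `Job('syncing-engine', ['sync', 'pipes', '--loop'])` therefore produce the same stored sysargs.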
203 @staticmethod 204 def from_pid(pid: int, executor_keys: Optional[str] = None) -> Job: 205 """ 206 Build a `Job` from the PID of a running Meerschaum process. 207 208 Parameters 209 ---------- 210 pid: int 211 The PID of the process. 212 213 executor_keys: Optional[str], default None 214 The executor keys to assign to the job. 215 """ 216 from meerschaum.config.paths import DAEMON_RESOURCES_PATH 217 218 psutil = mrsm.attempt_import('psutil') 219 try: 220 process = psutil.Process(pid) 221 except psutil.NoSuchProcess as e: 222 warn(f"Process with PID {pid} does not exist.", stack=False) 223 raise e 224 225 command_args = process.cmdline() 226 is_daemon = command_args[1] == '-c' 227 228 if is_daemon: 229 daemon_id = command_args[-1].split('daemon_id=')[-1].split(')')[0].replace("'", '') 230 root_dir = process.environ().get(STATIC_CONFIG['environment']['root'], None) 231 if root_dir is None: 232 from meerschaum.config.paths import ROOT_DIR_PATH 233 root_dir = ROOT_DIR_PATH 234 else: 235 root_dir = pathlib.Path(root_dir) 236 jobs_dir = root_dir / DAEMON_RESOURCES_PATH.name 237 daemon_dir = jobs_dir / daemon_id 238 pid_file = daemon_dir / 'process.pid' 239 240 if pid_file.exists(): 241 with open(pid_file, 'r', encoding='utf-8') as f: 242 daemon_pid = int(f.read()) 243 244 if pid != daemon_pid: 245 raise EnvironmentError(f"Differing PIDs: {pid=}, {daemon_pid=}") 246 else: 247 raise EnvironmentError(f"Is job '{daemon_id}' running?") 248 249 return Job(daemon_id, executor_keys=executor_keys) 250 251 from meerschaum._internal.arguments._parse_arguments import parse_arguments 252 from meerschaum.utils.daemon import get_new_daemon_name 253 254 mrsm_ix = 0 255 for i, arg in enumerate(command_args): 256 if 'mrsm' in arg or 'meerschaum' in arg.lower(): 257 mrsm_ix = i 258 break 259 260 sysargs = command_args[mrsm_ix+1:] 261 kwargs = parse_arguments(sysargs) 262 name = kwargs.get('name', get_new_daemon_name()) 263 return Job(name, sysargs, executor_keys=executor_keys)
Build a `Job` from the PID of a running Meerschaum process.
Parameters
- pid (int): The PID of the process.
- executor_keys (Optional[str], default None): The executor keys to assign to the job.
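As the source above shows, `from_pid()` recovers the daemon ID by parsing the process's command line. A standalone sketch of that parsing (the command string here is illustrative):

```python
# Illustrative final cmdline argument of a daemonized Meerschaum process;
# `from_pid()` extracts the `daemon_id` keyword from it.
cmdline_arg = "Daemon(daemon_id='syncing-engine')"
daemon_id = cmdline_arg.split('daemon_id=')[-1].split(')')[0].replace("'", '')
print(daemon_id)  # syncing-engine
```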
265 def start(self, debug: bool = False) -> SuccessTuple: 266 """ 267 Start the job's daemon. 268 """ 269 if self.executor is not None: 270 if not self.exists(debug=debug): 271 return self.executor.create_job( 272 self.name, 273 self.sysargs, 274 properties=self.daemon.properties, 275 debug=debug, 276 ) 277 return self.executor.start_job(self.name, debug=debug) 278 279 if self.is_running(): 280 return True, f"{self} is already running." 281 282 success, msg = self.daemon.run( 283 keep_daemon_output=(not self.delete_after_completion), 284 allow_dirty_run=True, 285 ) 286 if not success: 287 return success, msg 288 289 return success, f"Started {self}."
Start the job's daemon.
291 def stop( 292 self, 293 timeout_seconds: Union[int, float, None] = None, 294 debug: bool = False, 295 ) -> SuccessTuple: 296 """ 297 Stop the job's daemon. 298 """ 299 if self.executor is not None: 300 return self.executor.stop_job(self.name, debug=debug) 301 302 if self.daemon.status == 'stopped': 303 if not self.restart: 304 return True, f"{self} is not running." 305 elif self.stop_time is not None: 306 return True, f"{self} will not restart until manually started." 307 308 quit_success, quit_msg = self.daemon.quit(timeout=timeout_seconds) 309 if quit_success: 310 return quit_success, f"Stopped {self}." 311 312 warn( 313 f"Failed to gracefully quit {self}.", 314 stack=False, 315 ) 316 kill_success, kill_msg = self.daemon.kill(timeout=timeout_seconds) 317 if not kill_success: 318 return kill_success, kill_msg 319 320 return kill_success, f"Killed {self}."
Stop the job's daemon.
322 def pause( 323 self, 324 timeout_seconds: Union[int, float, None] = None, 325 debug: bool = False, 326 ) -> SuccessTuple: 327 """ 328 Pause the job's daemon. 329 """ 330 if self.executor is not None: 331 return self.executor.pause_job(self.name, debug=debug) 332 333 pause_success, pause_msg = self.daemon.pause(timeout=timeout_seconds) 334 if not pause_success: 335 return pause_success, pause_msg 336 337 return pause_success, f"Paused {self}."
Pause the job's daemon.
339 def delete(self, debug: bool = False) -> SuccessTuple: 340 """ 341 Delete the job and its daemon. 342 """ 343 if self.executor is not None: 344 return self.executor.delete_job(self.name, debug=debug) 345 346 if self.is_running(): 347 stop_success, stop_msg = self.stop() 348 if not stop_success: 349 return stop_success, stop_msg 350 351 cleanup_success, cleanup_msg = self.daemon.cleanup() 352 if not cleanup_success: 353 return cleanup_success, cleanup_msg 354 355 _ = self.daemon._properties.pop('result', None) 356 return cleanup_success, f"Deleted {self}."
Delete the job and its daemon.
358 def is_running(self) -> bool: 359 """ 360 Determine whether the job's daemon is running. 361 """ 362 return self.status == 'running'
Determine whether the job's daemon is running.
364 def exists(self, debug: bool = False) -> bool: 365 """ 366 Determine whether the job exists. 367 """ 368 if self.executor is not None: 369 return self.executor.get_job_exists(self.name, debug=debug) 370 371 return self.daemon.path.exists()
Determine whether the job exists.
373 def get_logs(self) -> Union[str, None]: 374 """ 375 Return the output text of the job's daemon. 376 """ 377 if self.executor is not None: 378 return self.executor.get_logs(self.name) 379 380 return self.daemon.log_text
Return the output text of the job's daemon.
382 def monitor_logs( 383 self, 384 callback_function: Callable[[str], None] = _default_stdout_callback, 385 input_callback_function: Optional[Callable[[], str]] = None, 386 stop_callback_function: Optional[Callable[[SuccessTuple], None]] = None, 387 stop_event: Optional[asyncio.Event] = None, 388 stop_on_exit: bool = False, 389 strip_timestamps: bool = False, 390 accept_input: bool = True, 391 debug: bool = False, 392 _logs_path: Optional[pathlib.Path] = None, 393 _log=None, 394 _stdin_file=None, 395 _wait_if_stopped: bool = True, 396 ): 397 """ 398 Monitor the job's log files and execute a callback on new lines. 399 400 Parameters 401 ---------- 402 callback_function: Callable[[str], None], default partial(print, end='') 403 The callback to execute as new data comes in. 404 Defaults to printing the output directly to `stdout`. 405 406 input_callback_function: Optional[Callable[[], str]], default None 407 If provided, execute this callback when the daemon is blocking on stdin. 408 Defaults to `sys.stdin.readline()`. 409 410 stop_callback_function: Optional[Callable[[SuccessTuple]], str], default None 411 If provided, execute this callback when the daemon stops. 412 The job's SuccessTuple will be passed to the callback. 413 414 stop_event: Optional[asyncio.Event], default None 415 If provided, stop monitoring when this event is set. 416 You may instead raise `meerschaum.jobs.StopMonitoringLogs` 417 from within `callback_function` to stop monitoring. 418 419 stop_on_exit: bool, default False 420 If `True`, stop monitoring when the job stops. 421 422 strip_timestamps: bool, default False 423 If `True`, remove leading timestamps from lines. 424 425 accept_input: bool, default True 426 If `True`, accept input when the daemon blocks on stdin. 
427 """ 428 if self.executor is not None: 429 self.executor.monitor_logs( 430 self.name, 431 callback_function, 432 input_callback_function=input_callback_function, 433 stop_callback_function=stop_callback_function, 434 stop_on_exit=stop_on_exit, 435 accept_input=accept_input, 436 strip_timestamps=strip_timestamps, 437 debug=debug, 438 ) 439 return 440 441 monitor_logs_coroutine = self.monitor_logs_async( 442 callback_function=callback_function, 443 input_callback_function=input_callback_function, 444 stop_callback_function=stop_callback_function, 445 stop_event=stop_event, 446 stop_on_exit=stop_on_exit, 447 strip_timestamps=strip_timestamps, 448 accept_input=accept_input, 449 debug=debug, 450 _logs_path=_logs_path, 451 _log=_log, 452 _stdin_file=_stdin_file, 453 _wait_if_stopped=_wait_if_stopped, 454 ) 455 return asyncio.run(monitor_logs_coroutine)
Monitor the job's log files and execute a callback on new lines.
Parameters
- callback_function (Callable[[str], None], default partial(print, end='')): The callback to execute as new data comes in. Defaults to printing the output directly to `stdout`.
- input_callback_function (Optional[Callable[[], str]], default None): If provided, execute this callback when the daemon is blocking on stdin. Defaults to `sys.stdin.readline()`.
- stop_callback_function (Optional[Callable[[SuccessTuple], None]], default None): If provided, execute this callback when the daemon stops. The job's SuccessTuple will be passed to the callback.
- stop_event (Optional[asyncio.Event], default None): If provided, stop monitoring when this event is set. You may instead raise `meerschaum.jobs.StopMonitoringLogs` from within `callback_function` to stop monitoring.
- stop_on_exit (bool, default False): If `True`, stop monitoring when the job stops.
- strip_timestamps (bool, default False): If `True`, remove leading timestamps from lines.
- accept_input (bool, default True): If `True`, accept input when the daemon blocks on stdin.
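To stop monitoring from inside the callback itself, raise `meerschaum.jobs.StopMonitoringLogs`. A sketch of a callback that stops after a fixed number of lines: a local exception class stands in for `StopMonitoringLogs` here so the pattern runs without a live job.

```python
class StopMonitoringLogs(Exception):
    """Stand-in for `meerschaum.jobs.StopMonitoringLogs`."""

def make_limited_callback(max_lines: int):
    """Return a callback which stops monitoring after `max_lines` lines."""
    seen_lines = []
    def callback(line: str) -> None:
        print(line, end='')
        seen_lines.append(line)
        if len(seen_lines) >= max_lines:
            raise StopMonitoringLogs()
    return callback

# Simulate two incoming log lines; the second triggers the stop.
callback = make_limited_callback(2)
callback('first line\n')
try:
    callback('second line\n')
    stopped = False
except StopMonitoringLogs:
    stopped = True
print(stopped)  # True
```

With a real job, you would import the exception from `meerschaum.jobs` and pass the callback as `job.monitor_logs(callback_function=callback)`.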
457 async def monitor_logs_async( 458 self, 459 callback_function: Callable[[str], None] = _default_stdout_callback, 460 input_callback_function: Optional[Callable[[], str]] = None, 461 stop_callback_function: Optional[Callable[[SuccessTuple], None]] = None, 462 stop_event: Optional[asyncio.Event] = None, 463 stop_on_exit: bool = False, 464 strip_timestamps: bool = False, 465 accept_input: bool = True, 466 debug: bool = False, 467 _logs_path: Optional[pathlib.Path] = None, 468 _log=None, 469 _stdin_file=None, 470 _wait_if_stopped: bool = True, 471 ): 472 """ 473 Monitor the job's log files and await a callback on new lines. 474 475 Parameters 476 ---------- 477 callback_function: Callable[[str], None], default _default_stdout_callback 478 The callback to execute as new data comes in. 479 Defaults to printing the output directly to `stdout`. 480 481 input_callback_function: Optional[Callable[[], str]], default None 482 If provided, execute this callback when the daemon is blocking on stdin. 483 Defaults to `sys.stdin.readline()`. 484 485 stop_callback_function: Optional[Callable[[SuccessTuple]], str], default None 486 If provided, execute this callback when the daemon stops. 487 The job's SuccessTuple will be passed to the callback. 488 489 stop_event: Optional[asyncio.Event], default None 490 If provided, stop monitoring when this event is set. 491 You may instead raise `meerschaum.jobs.StopMonitoringLogs` 492 from within `callback_function` to stop monitoring. 493 494 stop_on_exit: bool, default False 495 If `True`, stop monitoring when the job stops. 496 497 strip_timestamps: bool, default False 498 If `True`, remove leading timestamps from lines. 499 500 accept_input: bool, default True 501 If `True`, accept input when the daemon blocks on stdin. 
502 """ 503 from meerschaum.utils.prompt import prompt 504 505 def default_input_callback_function(): 506 prompt_kwargs = self.get_prompt_kwargs(debug=debug) 507 if prompt_kwargs: 508 answer = prompt(**prompt_kwargs) 509 return answer + '\n' 510 return sys.stdin.readline() 511 512 if input_callback_function is None: 513 input_callback_function = default_input_callback_function 514 515 if self.executor is not None: 516 await self.executor.monitor_logs_async( 517 self.name, 518 callback_function, 519 input_callback_function=input_callback_function, 520 stop_callback_function=stop_callback_function, 521 stop_on_exit=stop_on_exit, 522 strip_timestamps=strip_timestamps, 523 accept_input=accept_input, 524 debug=debug, 525 ) 526 return 527 528 from meerschaum.utils.formatting._jobs import strip_timestamp_from_line 529 530 events = { 531 'user': stop_event, 532 'stopped': asyncio.Event(), 533 'stop_token': asyncio.Event(), 534 'stop_exception': asyncio.Event(), 535 'stopped_timeout': asyncio.Event(), 536 } 537 combined_event = asyncio.Event() 538 emitted_text = False 539 stdin_file = _stdin_file if _stdin_file is not None else self.daemon.stdin_file 540 541 async def check_job_status(): 542 if not stop_on_exit: 543 return 544 545 nonlocal emitted_text 546 547 sleep_time = 0.1 548 while sleep_time < 0.2: 549 if self.status == 'stopped': 550 if not emitted_text and _wait_if_stopped: 551 await asyncio.sleep(sleep_time) 552 sleep_time = round(sleep_time * 1.1, 3) 553 continue 554 555 if stop_callback_function is not None: 556 try: 557 if asyncio.iscoroutinefunction(stop_callback_function): 558 await stop_callback_function(self.result) 559 else: 560 stop_callback_function(self.result) 561 except asyncio.exceptions.CancelledError: 562 break 563 except Exception: 564 warn(traceback.format_exc()) 565 566 if stop_on_exit: 567 events['stopped'].set() 568 569 break 570 await asyncio.sleep(0.1) 571 572 events['stopped_timeout'].set() 573 574 async def check_blocking_on_input(): 575 
while True: 576 if not emitted_text or not self.is_blocking_on_stdin(): 577 try: 578 await asyncio.sleep(self.refresh_seconds) 579 except asyncio.exceptions.CancelledError: 580 break 581 continue 582 583 if not self.is_running(): 584 break 585 586 await emit_latest_lines() 587 588 try: 589 print('', end='', flush=True) 590 if asyncio.iscoroutinefunction(input_callback_function): 591 data = await input_callback_function() 592 else: 593 loop = asyncio.get_running_loop() 594 data = await loop.run_in_executor(None, input_callback_function) 595 except KeyboardInterrupt: 596 break 597 # if not data.endswith('\n'): 598 # data += '\n' 599 600 stdin_file.write(data) 601 await asyncio.sleep(self.refresh_seconds) 602 603 async def combine_events(): 604 event_tasks = [ 605 asyncio.create_task(event.wait()) 606 for event in events.values() 607 if event is not None 608 ] 609 if not event_tasks: 610 return 611 612 try: 613 done, pending = await asyncio.wait( 614 event_tasks, 615 return_when=asyncio.FIRST_COMPLETED, 616 ) 617 for task in pending: 618 task.cancel() 619 except asyncio.exceptions.CancelledError: 620 pass 621 finally: 622 combined_event.set() 623 624 check_job_status_task = asyncio.create_task(check_job_status()) 625 check_blocking_on_input_task = asyncio.create_task(check_blocking_on_input()) 626 combine_events_task = asyncio.create_task(combine_events()) 627 628 log = _log if _log is not None else self.daemon.rotating_log 629 lines_to_show = ( 630 self.daemon.properties.get( 631 'logs', {} 632 ).get( 633 'lines_to_show', get_config('jobs', 'logs', 'lines_to_show') 634 ) 635 ) 636 637 async def emit_latest_lines(): 638 nonlocal emitted_text 639 nonlocal stop_event 640 lines = log.readlines() 641 for line in lines[(-1 * lines_to_show):]: 642 if stop_event is not None and stop_event.is_set(): 643 return 644 645 line_stripped_extra = strip_timestamp_from_line(line.strip()) 646 line_stripped = strip_timestamp_from_line(line) 647 648 if line_stripped_extra == STOP_TOKEN: 
649 events['stop_token'].set() 650 return 651 652 if line_stripped_extra == CLEAR_TOKEN: 653 clear_screen(debug=debug) 654 continue 655 656 if line_stripped_extra == FLUSH_TOKEN.strip(): 657 line_stripped = '' 658 line = '' 659 660 if strip_timestamps: 661 line = line_stripped 662 663 try: 664 if asyncio.iscoroutinefunction(callback_function): 665 await callback_function(line) 666 else: 667 callback_function(line) 668 emitted_text = True 669 except StopMonitoringLogs: 670 events['stop_exception'].set() 671 return 672 except Exception: 673 warn(f"Error in logs callback:\n{traceback.format_exc()}") 674 675 await emit_latest_lines() 676 677 tasks = ( 678 [check_job_status_task] 679 + ([check_blocking_on_input_task] if accept_input else []) 680 + [combine_events_task] 681 ) 682 try: 683 _ = asyncio.gather(*tasks, return_exceptions=True) 684 except asyncio.exceptions.CancelledError: 685 raise 686 except Exception: 687 warn(f"Failed to run async checks:\n{traceback.format_exc()}") 688 689 watchfiles = mrsm.attempt_import('watchfiles', lazy=False) 690 dir_path_to_monitor = ( 691 _logs_path 692 or (log.file_path.parent if log else None) 693 or LOGS_RESOURCES_PATH 694 ) 695 async for changes in watchfiles.awatch( 696 dir_path_to_monitor, 697 stop_event=combined_event, 698 ): 699 for change in changes: 700 file_path_str = change[1] 701 file_path = pathlib.Path(file_path_str) 702 latest_subfile_path = log.get_latest_subfile_path() 703 if latest_subfile_path != file_path: 704 continue 705 706 await emit_latest_lines() 707 708 await emit_latest_lines()
Monitor the job's log files and await a callback on new lines.
Parameters
- callback_function (Callable[[str], None], default _default_stdout_callback): The callback to execute as new data comes in. Defaults to printing the output directly to `stdout`.
- input_callback_function (Optional[Callable[[], str]], default None): If provided, execute this callback when the daemon is blocking on stdin. Defaults to `sys.stdin.readline()`.
- stop_callback_function (Optional[Callable[[SuccessTuple], None]], default None): If provided, execute this callback when the daemon stops. The job's SuccessTuple will be passed to the callback.
- stop_event (Optional[asyncio.Event], default None): If provided, stop monitoring when this event is set. You may instead raise `meerschaum.jobs.StopMonitoringLogs` from within `callback_function` to stop monitoring.
- stop_on_exit (bool, default False): If `True`, stop monitoring when the job stops.
- strip_timestamps (bool, default False): If `True`, remove leading timestamps from lines.
- accept_input (bool, default True): If `True`, accept input when the daemon blocks on stdin.
710 def is_blocking_on_stdin(self, debug: bool = False) -> bool: 711 """ 712 Return whether a job's daemon is blocking on stdin. 713 """ 714 if self.executor is not None: 715 return self.executor.get_job_is_blocking_on_stdin(self.name, debug=debug) 716 717 return self.is_running() and self.daemon.blocking_stdin_file_path.exists()
Return whether a job's daemon is blocking on stdin.
719 def get_prompt_kwargs(self, debug: bool = False) -> Dict[str, Any]: 720 """ 721 Return the kwargs to the blocking `prompt()`, if available. 722 """ 723 if self.executor is not None: 724 return self.executor.get_job_prompt_kwargs(self.name, debug=debug) 725 726 if not self.daemon.prompt_kwargs_file_path.exists(): 727 return {} 728 729 try: 730 with open(self.daemon.prompt_kwargs_file_path, 'r', encoding='utf-8') as f: 731 prompt_kwargs = json.load(f) 732 733 return prompt_kwargs 734 735 except Exception: 736 import traceback 737 traceback.print_exc() 738 return {}
Return the kwargs to the blocking `prompt()`, if available.
740 def write_stdin(self, data): 741 """ 742 Write to a job's daemon's `stdin`. 743 """ 744 self.daemon.stdin_file.write(data)
Write to a job's daemon's `stdin`.
746 @property 747 def executor(self) -> Union[Executor, None]: 748 """ 749 If the job is remote, return the connector to the remote API instance. 750 """ 751 return ( 752 mrsm.get_connector(self.executor_keys) 753 if self.executor_keys != 'local' 754 else None 755 )
If the job is remote, return the connector to the remote API instance.
757 @property 758 def status(self) -> str: 759 """ 760 Return the running status of the job's daemon. 761 """ 762 if '_status_hook' in self.__dict__: 763 return self._status_hook() 764 765 if self.executor is not None: 766 return self.executor.get_job_status(self.name) 767 768 return self.daemon.status
Return the running status of the job's daemon.
770 @property 771 def pid(self) -> Union[int, None]: 772 """ 773 Return the PID of the job's dameon. 774 """ 775 if self.executor is not None: 776 return self.executor.get_job_metadata(self.name).get('daemon', {}).get('pid', None) 777 778 return self.daemon.pid
Return the PID of the job's daemon.
780 @property 781 def restart(self) -> bool: 782 """ 783 Return whether to restart a stopped job. 784 """ 785 if self.executor is not None: 786 return self.executor.get_job_metadata(self.name).get('restart', False) 787 788 return self.daemon.properties.get('restart', False)
Return whether to restart a stopped job.
790 @property 791 def result(self) -> SuccessTuple: 792 """ 793 Return the `SuccessTuple` when the job has terminated. 794 """ 795 if self.is_running(): 796 return True, f"{self} is running." 797 798 if '_result_hook' in self.__dict__: 799 return self._result_hook() 800 801 if self.executor is not None: 802 return ( 803 self.executor.get_job_metadata(self.name) 804 .get('result', (False, "No result available.")) 805 ) 806 807 _result = self.daemon.properties.get('result', None) 808 if _result is None: 809 from meerschaum.utils.daemon.Daemon import _results 810 return _results.get(self.daemon.daemon_id, (False, "No result available.")) 811 812 return tuple(_result)
Return the `SuccessTuple` when the job has terminated.
814 @property 815 def sysargs(self) -> List[str]: 816 """ 817 Return the sysargs to use for the Daemon. 818 """ 819 if self._sysargs: 820 return self._sysargs 821 822 if self.executor is not None: 823 return self.executor.get_job_metadata(self.name).get('sysargs', []) 824 825 target_args = self.daemon.target_args 826 if target_args is None: 827 return [] 828 self._sysargs = target_args[0] if len(target_args) > 0 else [] 829 return self._sysargs
Return the sysargs to use for the Daemon.
831 def get_daemon_properties(self) -> Dict[str, Any]: 832 """ 833 Return the `properties` dictionary for the job's daemon. 834 """ 835 remote_properties = ( 836 {} 837 if self.executor is None 838 else self.executor.get_job_properties(self.name) 839 ) 840 return { 841 **remote_properties, 842 **self._properties_patch 843 }
Return the `properties` dictionary for the job's daemon.
845 @property 846 def daemon(self) -> 'Daemon': 847 """ 848 Return the daemon which this job manages. 849 """ 850 from meerschaum.utils.daemon import Daemon 851 if self._daemon is not None and self.executor is None and self._sysargs: 852 return self._daemon 853 854 self._daemon = Daemon( 855 target=entry, 856 target_args=[self._sysargs], 857 target_kw={}, 858 daemon_id=self.name, 859 label=shlex.join(self._sysargs), 860 properties=self.get_daemon_properties(), 861 ) 862 if '_rotating_log' in self.__dict__: 863 self._daemon._rotating_log = self._rotating_log 864 865 if '_stdin_file' in self.__dict__: 866 self._daemon._stdin_file = self._stdin_file 867 self._daemon._blocking_stdin_file_path = self._stdin_file.blocking_file_path 868 869 return self._daemon
Return the daemon which this job manages.
871 @property 872 def began(self) -> Union[datetime, None]: 873 """ 874 The datetime when the job began running. 875 """ 876 if self.executor is not None: 877 began_str = self.executor.get_job_began(self.name) 878 if began_str is None: 879 return None 880 return ( 881 datetime.fromisoformat(began_str) 882 .astimezone(timezone.utc) 883 .replace(tzinfo=None) 884 ) 885 886 began_str = self.daemon.properties.get('process', {}).get('began', None) 887 if began_str is None: 888 return None 889 890 return datetime.fromisoformat(began_str)
The datetime when the job began running.
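For remote executors, the timestamp arrives as an ISO 8601 string and is coerced to a naive UTC datetime, as the source above shows. The coercion in isolation (the timestamp is illustrative):

```python
from datetime import datetime, timezone

# Coerce an ISO 8601 string (with any UTC offset) to a naive UTC datetime,
# as `Job.began` / `Job.ended` / `Job.paused` do for remote executors.
began_str = '2024-01-01T00:00:00-05:00'
began = (
    datetime.fromisoformat(began_str)
    .astimezone(timezone.utc)
    .replace(tzinfo=None)
)
print(began)  # 2024-01-01 05:00:00
```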
892 @property 893 def ended(self) -> Union[datetime, None]: 894 """ 895 The datetime when the job stopped running. 896 """ 897 if self.executor is not None: 898 ended_str = self.executor.get_job_ended(self.name) 899 if ended_str is None: 900 return None 901 return ( 902 datetime.fromisoformat(ended_str) 903 .astimezone(timezone.utc) 904 .replace(tzinfo=None) 905 ) 906 907 ended_str = self.daemon.properties.get('process', {}).get('ended', None) 908 if ended_str is None: 909 return None 910 911 return datetime.fromisoformat(ended_str)
The datetime when the job stopped running.
913 @property 914 def paused(self) -> Union[datetime, None]: 915 """ 916 The datetime when the job was suspended while running. 917 """ 918 if self.executor is not None: 919 paused_str = self.executor.get_job_paused(self.name) 920 if paused_str is None: 921 return None 922 return ( 923 datetime.fromisoformat(paused_str) 924 .astimezone(timezone.utc) 925 .replace(tzinfo=None) 926 ) 927 928 paused_str = self.daemon.properties.get('process', {}).get('paused', None) 929 if paused_str is None: 930 return None 931 932 return datetime.fromisoformat(paused_str)
The datetime when the job was suspended while running.
934 @property 935 def stop_time(self) -> Union[datetime, None]: 936 """ 937 Return the timestamp when the job was manually stopped. 938 """ 939 if self.executor is not None: 940 return self.executor.get_job_stop_time(self.name) 941 942 if not self.daemon.stop_path.exists(): 943 return None 944 945 stop_data = self.daemon._read_stop_file() 946 if not stop_data: 947 return None 948 949 stop_time_str = stop_data.get('stop_time', None) 950 if not stop_time_str: 951 warn(f"Could not read stop time for {self}.") 952 return None 953 954 return datetime.fromisoformat(stop_time_str)
Return the timestamp when the job was manually stopped.
967 def check_restart(self) -> SuccessTuple: 968 """ 969 If `restart` is `True` and the daemon is not running, 970 restart the job. 971 Do not restart if the job was manually stopped. 972 """ 973 if self.is_running(): 974 return True, f"{self} is running." 975 976 if not self.restart: 977 return True, f"{self} does not need to be restarted." 978 979 if self.stop_time is not None: 980 return True, f"{self} was manually stopped." 981 982 return self.start()
If `restart` is `True` and the daemon is not running, restart the job. Do not restart if the job was manually stopped.
984 @property 985 def label(self) -> str: 986 """ 987 Return the job's Daemon label (joined sysargs). 988 """ 989 from meerschaum._internal.arguments import compress_pipeline_sysargs 990 sysargs = compress_pipeline_sysargs(self.sysargs) 991 return shlex.join(sysargs).replace(' + ', '\n+ ').replace(' : ', '\n: ').lstrip().rstrip()
Return the job's Daemon label (joined sysargs).
1020 @property 1021 def env(self) -> Dict[str, str]: 1022 """ 1023 Return the environment variables to set for the job's process. 1024 """ 1025 if '_env' in self.__dict__: 1026 return self.__dict__['_env'] 1027 1028 _env = self.daemon.properties.get('env', {}) 1029 default_env = { 1030 'PYTHONUNBUFFERED': '1', 1031 'LINES': str(get_config('jobs', 'terminal', 'lines')), 1032 'COLUMNS': str(get_config('jobs', 'terminal', 'columns')), 1033 STATIC_CONFIG['environment']['noninteractive']: 'true', 1034 } 1035 self._env = {**default_env, **_env} 1036 return self._env
Return the environment variables to set for the job's process.
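Explicit `env` entries passed to the constructor are merged over these defaults, with the job's own entries winning. A sketch of the merge (the values here are illustrative; `LINES` and `COLUMNS` are actually read from the configured terminal settings):

```python
# Defaults set for every job's process; explicit `env` entries override them.
default_env = {
    'PYTHONUNBUFFERED': '1',
    'LINES': '24',    # illustrative; read from config in practice
    'COLUMNS': '80',  # illustrative; read from config in practice
}
job_env = {'COLUMNS': '120', 'API_TOKEN': 'xyz'}  # hypothetical user-provided env
merged = {**default_env, **job_env}
print(merged['COLUMNS'])  # 120
```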
1038 @property 1039 def delete_after_completion(self) -> bool: 1040 """ 1041 Return whether this job is configured to delete itself after completion. 1042 """ 1043 if '_delete_after_completion' in self.__dict__: 1044 return self.__dict__.get('_delete_after_completion', False) 1045 1046 self._delete_after_completion = self.daemon.properties.get('delete_after_completion', False) 1047 return self._delete_after_completion
Return whether this job is configured to delete itself after completion.
def pprint(
    *args,
    detect_password: bool = True,
    nopretty: bool = False,
    **kw
) -> None:
    """Pretty print an object according to the configured ANSI and UNICODE settings.
    If detect_password is True (default), search and replace passwords with '*' characters.
    Does not mutate objects.
    """
    import copy
    import json
    from meerschaum.utils.packages import attempt_import, import_rich
    from meerschaum.utils.formatting import ANSI, get_console, print_tuple
    from meerschaum.utils.warnings import error
    from meerschaum.utils.misc import replace_password, dict_from_od, filter_keywords
    from collections import OrderedDict

    if (
        len(args) == 1
        and
        isinstance(args[0], tuple)
        and
        len(args[0]) == 2
        and
        isinstance(args[0][0], bool)
        and
        isinstance(args[0][1], str)
    ):
        return print_tuple(args[0], **filter_keywords(print_tuple, **kw))

    modify = True
    rich_pprint = None
    if ANSI and not nopretty:
        rich = import_rich()
        if rich is not None:
            rich_pretty = attempt_import('rich.pretty')
            if rich_pretty is not None:
                def _rich_pprint(*args, **kw):
                    _console = get_console()
                    _kw = filter_keywords(_console.print, **kw)
                    _console.print(*args, **_kw)
                rich_pprint = _rich_pprint
    elif not nopretty:
        pprintpp = attempt_import('pprintpp', warn=False)
        try:
            _pprint = pprintpp.pprint
        except Exception:
            import pprint as _pprint_module
            _pprint = _pprint_module.pprint

    func = (
        _pprint if rich_pprint is None else rich_pprint
    ) if not nopretty else print

    try:
        args_copy = copy.deepcopy(args)
    except Exception:
        args_copy = args
        modify = False

    _args = []
    for a in args:
        c = a
        ### convert OrderedDict into dict
        if isinstance(a, OrderedDict) or issubclass(type(a), OrderedDict):
            c = dict_from_od(copy.deepcopy(c))
        _args.append(c)
    args = _args

    _args = list(args)
    if detect_password and modify:
        _args = []
        for a in args:
            c = a
            if isinstance(c, dict):
                c = replace_password(copy.deepcopy(c))
            if nopretty:
                try:
                    c = json.dumps(c)
                    is_json = True
                except Exception:
                    is_json = False
                if not is_json:
                    try:
                        c = str(c)
                    except Exception:
                        pass
            _args.append(c)

    ### filter out unsupported keywords
    func_kw = filter_keywords(func, **kw) if not nopretty else {}
    error_msg = None
    try:
        func(*_args, **func_kw)
    except Exception as e:
        error_msg = e
    if error_msg is not None:
        error(error_msg)
Pretty print an object according to the configured ANSI and UNICODE settings. If detect_password is True (default), search and replace passwords with '*' characters. Does not mutate objects.
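The password redaction that `detect_password` enables can be pictured as a recursive walk over a dictionary that masks password-like values in a deep copy, leaving the original untouched. The sketch below is a hypothetical simplification of this behavior, not the actual implementation of `meerschaum.utils.misc.replace_password`:

```python
import copy

def redact_passwords(doc: dict) -> dict:
    """Return a deep copy of `doc` with any 'password'-like values masked."""
    redacted = copy.deepcopy(doc)
    for key, value in redacted.items():
        if isinstance(value, dict):
            # Recurse into nested dictionaries.
            redacted[key] = redact_passwords(value)
        elif 'password' in str(key).lower():
            # Mask the value with '*' characters, preserving its length.
            redacted[key] = '*' * len(str(value))
    return redacted

config = {'username': 'foo', 'password': 'hunter2'}
print(redact_passwords(config))
# `config` itself is unchanged, mirroring pprint's "does not mutate objects" guarantee.
```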
def attempt_import(
    *names: str,
    lazy: bool = True,
    warn: bool = True,
    install: bool = True,
    venv: Optional[str] = 'mrsm',
    precheck: bool = True,
    split: bool = True,
    check_update: bool = False,
    check_pypi: bool = False,
    check_is_installed: bool = True,
    allow_outside_venv: bool = True,
    color: bool = True,
    debug: bool = False
) -> Any:
    """
    Raise a warning if packages are not installed; otherwise import and return modules.
    If `lazy` is `True`, return lazy-imported modules.

    Returns tuple of modules if multiple names are provided, else returns one module.

    Parameters
    ----------
    names: List[str]
        The packages to be imported.

    lazy: bool, default True
        If `True`, lazily load packages.

    warn: bool, default True
        If `True`, raise a warning if a package cannot be imported.

    install: bool, default True
        If `True`, attempt to install a missing package into the designated virtual environment.
        If `check_update` is True, install updates if available.

    venv: Optional[str], default 'mrsm'
        The virtual environment in which to search for packages and to install packages into.

    precheck: bool, default True
        If `True`, attempt to find module before importing (necessary for checking if modules exist
        and retaining lazy imports), otherwise assume lazy is `False`.

    split: bool, default True
        If `True`, split packages' names on `'.'`.

    check_update: bool, default False
        If `True` and `install` is `True`, install updates if the required minimum version
        does not match.

    check_pypi: bool, default False
        If `True` and `check_update` is `True`, check PyPI when determining whether
        an update is required.

    check_is_installed: bool, default True
        If `True`, check if the package is contained in the virtual environment.

    allow_outside_venv: bool, default True
        If `True`, search outside of the specified virtual environment
        if the package cannot be found.
        Setting to `False` will reinstall the package into a virtual environment, even if it
        is installed outside.

    color: bool, default True
        If `False`, do not print ANSI colors.

    Returns
    -------
    The specified modules. If they're not available and `install` is `True`, it will first
    download them into a virtual environment and return the modules.

    Examples
    --------
    >>> pandas, sqlalchemy = attempt_import('pandas', 'sqlalchemy')
    >>> pandas = attempt_import('pandas')

    """

    import importlib.util

    ### to prevent recursion, check if parent Meerschaum package is being imported
    if names == ('meerschaum',):
        return _import_module('meerschaum')

    if venv == 'mrsm' and _import_hook_venv is not None:
        if debug:
            print(f"Import hook for virtual environment '{_import_hook_venv}' is active.")
        venv = _import_hook_venv

    _warnings = _import_module('meerschaum.utils.warnings')
    warn_function = _warnings.warn

    def do_import(_name: str, **kw) -> Union['ModuleType', None]:
        with Venv(venv=venv, debug=debug):
            ### determine the import method (lazy vs normal)
            from meerschaum.utils.misc import filter_keywords
            import_method = (
                _import_module if not lazy
                else lazy_import
            )
            try:
                mod = import_method(_name, **(filter_keywords(import_method, **kw)))
            except Exception as e:
                if warn:
                    import traceback
                    traceback.print_exception(type(e), e, e.__traceback__)
                    warn_function(
                        f"Failed to import module '{_name}'.\nException:\n{e}",
                        ImportWarning,
                        stacklevel=(5 if lazy else 4),
                        color=False,
                    )
                mod = None
        return mod

    modules = []
    for name in names:
        ### Check if package is a declared dependency.
        root_name = name.split('.')[0] if split else name
        install_name = _import_to_install_name(root_name)

        if install_name is None:
            install_name = root_name
            if warn and root_name != 'plugins':
                warn_function(
                    f"Package '{root_name}' is not declared in meerschaum.utils.packages.",
                    ImportWarning,
                    stacklevel=3,
                    color=False
                )

        ### Determine if the package exists.
        if precheck is False:
            found_module = (
                do_import(
                    name, debug=debug, warn=False, venv=venv, color=color,
                    check_update=False, check_pypi=False, split=split,
                ) is not None
            )
        else:
            if check_is_installed:
                with _locks['_is_installed_first_check']:
                    if not _is_installed_first_check.get(name, False):
                        package_is_installed = is_installed(
                            name,
                            venv=venv,
                            split=split,
                            allow_outside_venv=allow_outside_venv,
                            debug=debug,
                        )
                        _is_installed_first_check[name] = package_is_installed
                    else:
                        package_is_installed = _is_installed_first_check[name]
            else:
                package_is_installed = _is_installed_first_check.get(
                    name,
                    venv_contains_package(name, venv=venv, split=split, debug=debug)
                )
            found_module = package_is_installed

        if not found_module:
            if install:
                if not pip_install(
                    install_name,
                    venv=venv,
                    split=False,
                    check_update=check_update,
                    color=color,
                    debug=debug
                ) and warn:
                    warn_function(
                        f"Failed to install '{install_name}'.",
                        ImportWarning,
                        stacklevel=3,
                        color=False,
                    )
            elif warn:
                ### Raise a warning if we can't find the package and install = False.
                warn_function(
                    (f"\n\nMissing package '{name}' from virtual environment '{venv}'; "
                     + "some features will not work correctly."
                     + "\n\nSet install=True when calling attempt_import.\n"),
                    ImportWarning,
                    stacklevel=3,
                    color=False,
                )

        ### Do the import. Will be lazy if lazy=True.
        m = do_import(
            name, debug=debug, warn=warn, venv=venv, color=color,
            check_update=check_update, check_pypi=check_pypi, install=install, split=split,
        )
        modules.append(m)

    modules = tuple(modules)
    if len(modules) == 1:
        return modules[0]
    return modules
Raise a warning if packages are not installed; otherwise import and return modules. If `lazy` is `True`, return lazy-imported modules.

Returns tuple of modules if multiple names are provided, else returns one module.

Parameters
- names (List[str]): The packages to be imported.
- lazy (bool, default True): If `True`, lazily load packages.
- warn (bool, default True): If `True`, raise a warning if a package cannot be imported.
- install (bool, default True): If `True`, attempt to install a missing package into the designated virtual environment. If `check_update` is `True`, install updates if available.
- venv (Optional[str], default 'mrsm'): The virtual environment in which to search for packages and to install packages into.
- precheck (bool, default True): If `True`, attempt to find the module before importing (necessary for checking if modules exist and retaining lazy imports); otherwise assume `lazy` is `False`.
- split (bool, default True): If `True`, split packages' names on `'.'`.
- check_update (bool, default False): If `True` and `install` is `True`, install updates if the required minimum version does not match.
- check_pypi (bool, default False): If `True` and `check_update` is `True`, check PyPI when determining whether an update is required.
- check_is_installed (bool, default True): If `True`, check if the package is contained in the virtual environment.
- allow_outside_venv (bool, default True): If `True`, search outside of the specified virtual environment if the package cannot be found. Setting to `False` will reinstall the package into a virtual environment, even if it is installed outside.
- color (bool, default True): If `False`, do not print ANSI colors.

Returns
- The specified modules. If they're not available and `install` is `True`, it will first download them into a virtual environment and return the modules.

Examples
>>> pandas, sqlalchemy = attempt_import('pandas', 'sqlalchemy')
>>> pandas = attempt_import('pandas')
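Outside of Meerschaum, the deferred loading that `lazy=True` provides can be approximated with the standard library's `importlib.util.LazyLoader`. This sketch illustrates the general technique (it is not Meerschaum's internal `lazy_import`): the module object is returned immediately, and the real import work happens on first attribute access.

```python
import importlib.util
import sys

def lazy_import(name: str):
    """Return a module whose actual loading is deferred until first attribute access."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ImportError(f"Cannot find module '{name}'.")
    # Wrap the real loader so execution is postponed.
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # registers the deferred load; nothing runs yet
    return module

json_mod = lazy_import('json')
print(json_mod.dumps({'a': 1}))  # the real import happens on this first access
```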
class Connector(metaclass=abc.ABCMeta):
    """
    The base connector class to hold connection attributes.
    """

    IS_INSTANCE: bool = False

    def __init__(
        self,
        type: Optional[str] = None,
        label: Optional[str] = None,
        **kw: Any
    ):
        """
        Set the given keyword arguments as attributes.

        Parameters
        ----------
        type: str
            The `type` of the connector (e.g. `sql`, `api`, `plugin`).

        label: str
            The `label` for the connector.

        Examples
        --------
        Run `mrsm edit config` to edit connectors in the YAML file:

        ```yaml
        meerschaum:
            connections:
                {type}:
                    {label}:
                        ### attributes go here
        ```

        """
        self._original_dict = copy.deepcopy(self.__dict__)
        self._set_attributes(type=type, label=label, **kw)

        ### NOTE: Override `REQUIRED_ATTRIBUTES` if `uri` is set.
        self.verify_attributes(
            ['uri']
            if 'uri' in self.__dict__
            else getattr(self, 'REQUIRED_ATTRIBUTES', None)
        )

    def _reset_attributes(self):
        self.__dict__ = self._original_dict

    def _set_attributes(
        self,
        *args,
        inherit_default: bool = True,
        **kw: Any
    ):
        from meerschaum._internal.static import STATIC_CONFIG
        from meerschaum.utils.warnings import error

        self._attributes = {}

        default_label = STATIC_CONFIG['connectors']['default_label']

        ### NOTE: Support the legacy method of explicitly passing the type.
        label = kw.get('label', None)
        if label is None:
            if len(args) == 2:
                label = args[1]
            elif len(args) == 0:
                label = None
            else:
                label = args[0]

        if label == 'default':
            error(
                f"Label cannot be 'default'. Did you mean '{default_label}'?",
                InvalidAttributesError,
            )
        self.__dict__['label'] = label

        from meerschaum.config import get_config
        conn_configs = copy.deepcopy(get_config('meerschaum', 'connectors'))
        connector_config = copy.deepcopy(get_config('system', 'connectors'))

        ### inherit attributes from 'default' if exists
        if inherit_default:
            inherit_from = 'default'
            if self.type in conn_configs and inherit_from in conn_configs[self.type]:
                _inherit_dict = copy.deepcopy(conn_configs[self.type][inherit_from])
                self._attributes.update(_inherit_dict)

        ### load user config into self._attributes
        if self.type in conn_configs and self.label in conn_configs[self.type]:
            self._attributes.update(conn_configs[self.type][self.label] or {})

        ### load system config into self._sys_config
        ### (deep copy so future Connectors don't inherit changes)
        if self.type in connector_config:
            self._sys_config = copy.deepcopy(connector_config[self.type])

        ### add additional arguments or override configuration
        self._attributes.update(kw)

        ### finally, update __dict__ with _attributes.
        self.__dict__.update(self._attributes)

    def verify_attributes(
        self,
        required_attributes: Optional[List[str]] = None,
        debug: bool = False,
    ) -> None:
        """
        Ensure that the required attributes have been met.

        The Connector base class checks the minimum requirements.
        Child classes may enforce additional requirements.

        Parameters
        ----------
        required_attributes: Optional[List[str]], default None
            Attributes to be verified. If `None`, default to `['type', 'label']`.

        debug: bool, default False
            Verbosity toggle.

        Returns
        -------
        Don't return anything.

        Raises
        ------
        An error if any of the required attributes are missing.
        """
        from meerschaum.utils.warnings import error
        from meerschaum.utils.misc import items_str
        if required_attributes is None:
            required_attributes = ['type', 'label']

        missing_attributes = set()
        for a in required_attributes:
            if a not in self.__dict__:
                missing_attributes.add(a)
        if len(missing_attributes) > 0:
            error(
                (
                    f"Missing {items_str(list(missing_attributes))} "
                    + f"for connector '{self.type}:{self.label}'."
                ),
                InvalidAttributesError,
                silent=True,
                stack=False
            )

    def __str__(self):
        """
        When cast to a string, return type:label.
        """
        return f"{self.type}:{self.label}"

    def __repr__(self):
        """
        Represent the connector as type:label.
        """
        return str(self)

    @property
    def meta(self) -> Dict[str, Any]:
        """
        Return the keys needed to reconstruct this Connector.
        """
        _meta = {
            key: value
            for key, value in self.__dict__.items()
            if not str(key).startswith('_')
        }
        _meta.update({
            'type': self.type,
            'label': self.label,
        })
        return _meta

    @property
    def type(self) -> str:
        """
        Return the type for this connector.
        """
        _type = self.__dict__.get('type', None)
        if _type is None:
            import re
            is_executor = self.__class__.__name__.lower().endswith('executor')
            suffix_regex = (
                r'connector$'
                if not is_executor
                else r'executor$'
            )
            _type = re.sub(suffix_regex, '', self.__class__.__name__.lower())
            if not _type or _type.lower() == 'instance':
                raise ValueError("No type could be determined for this connector.")
            self.__dict__['type'] = _type
        return _type

    @property
    def label(self) -> str:
        """
        Return the label for this connector.
        """
        _label = self.__dict__.get('label', None)
        if _label is None:
            from meerschaum._internal.static import STATIC_CONFIG
            _label = STATIC_CONFIG['connectors']['default_label']
            self.__dict__['label'] = _label
        return _label
The base connector class to hold connection attributes.
def __init__(
    self,
    type: Optional[str] = None,
    label: Optional[str] = None,
    **kw: Any
):
    """
    Set the given keyword arguments as attributes.

    Parameters
    ----------
    type: str
        The `type` of the connector (e.g. `sql`, `api`, `plugin`).

    label: str
        The `label` for the connector.

    Examples
    --------
    Run `mrsm edit config` to edit connectors in the YAML file:

    ```yaml
    meerschaum:
        connections:
            {type}:
                {label}:
                    ### attributes go here
    ```

    """
    self._original_dict = copy.deepcopy(self.__dict__)
    self._set_attributes(type=type, label=label, **kw)

    ### NOTE: Override `REQUIRED_ATTRIBUTES` if `uri` is set.
    self.verify_attributes(
        ['uri']
        if 'uri' in self.__dict__
        else getattr(self, 'REQUIRED_ATTRIBUTES', None)
    )
def verify_attributes(
    self,
    required_attributes: Optional[List[str]] = None,
    debug: bool = False,
) -> None:
    """
    Ensure that the required attributes have been met.

    The Connector base class checks the minimum requirements.
    Child classes may enforce additional requirements.

    Parameters
    ----------
    required_attributes: Optional[List[str]], default None
        Attributes to be verified. If `None`, default to `['type', 'label']`.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    Don't return anything.

    Raises
    ------
    An error if any of the required attributes are missing.
    """
    from meerschaum.utils.warnings import error
    from meerschaum.utils.misc import items_str
    if required_attributes is None:
        required_attributes = ['type', 'label']

    missing_attributes = set()
    for a in required_attributes:
        if a not in self.__dict__:
            missing_attributes.add(a)
    if len(missing_attributes) > 0:
        error(
            (
                f"Missing {items_str(list(missing_attributes))} "
                + f"for connector '{self.type}:{self.label}'."
            ),
            InvalidAttributesError,
            silent=True,
            stack=False
        )
Ensure that the required attributes have been met.
The Connector base class checks the minimum requirements. Child classes may enforce additional requirements.
Parameters
- required_attributes (Optional[List[str]], default None): Attributes to be verified. If `None`, default to `['type', 'label']`.
- debug (bool, default False): Verbosity toggle.
Returns
- Don't return anything.
Raises
- An error if any of the required attributes are missing.
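The verification logic amounts to a set difference between the required attribute names and the instance's `__dict__`. Here is a minimal standalone analog of that check (`MiniConnector` is a hypothetical name, not part of Meerschaum):

```python
class MiniConnector:
    def __init__(self, **kw):
        # Like Connector, keyword arguments become instance attributes.
        self.__dict__.update(kw)

    def verify_attributes(self, required_attributes=None):
        """Raise if any required attribute is missing from the instance."""
        required_attributes = required_attributes or ['type', 'label']
        missing = [a for a in required_attributes if a not in self.__dict__]
        if missing:
            raise AttributeError(f"Missing {missing} for this connector.")

conn = MiniConnector(type='sql', label='temp')
conn.verify_attributes()  # passes silently
# MiniConnector(type='sql').verify_attributes() would raise AttributeError.
```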
@property
def meta(self) -> Dict[str, Any]:
    """
    Return the keys needed to reconstruct this Connector.
    """
    _meta = {
        key: value
        for key, value in self.__dict__.items()
        if not str(key).startswith('_')
    }
    _meta.update({
        'type': self.type,
        'label': self.label,
    })
    return _meta
Return the keys needed to reconstruct this Connector.
@property
def type(self) -> str:
    """
    Return the type for this connector.
    """
    _type = self.__dict__.get('type', None)
    if _type is None:
        import re
        is_executor = self.__class__.__name__.lower().endswith('executor')
        suffix_regex = (
            r'connector$'
            if not is_executor
            else r'executor$'
        )
        _type = re.sub(suffix_regex, '', self.__class__.__name__.lower())
        if not _type or _type.lower() == 'instance':
            raise ValueError("No type could be determined for this connector.")
        self.__dict__['type'] = _type
    return _type
Return the type for this connector.
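The property's fallback, stripping a `Connector` or `Executor` suffix from the lowercased class name, can be reproduced with a short regex. This is a standalone sketch of that inference (the helper name is hypothetical):

```python
import re

def infer_connector_type(cls_name: str) -> str:
    """Infer a connector type (e.g. 'foo' from 'FooConnector') by stripping the suffix."""
    lowered = cls_name.lower()
    # Executors strip 'executor'; everything else strips 'connector'.
    suffix = r'executor$' if lowered.endswith('executor') else r'connector$'
    inferred = re.sub(suffix, '', lowered)
    if not inferred or inferred == 'instance':
        raise ValueError("No type could be determined for this connector.")
    return inferred

print(infer_connector_type('FooConnector'))  # 'foo'
print(infer_connector_type('DaskExecutor'))  # 'dask'
```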
@property
def label(self) -> str:
    """
    Return the label for this connector.
    """
    _label = self.__dict__.get('label', None)
    if _label is None:
        from meerschaum._internal.static import STATIC_CONFIG
        _label = STATIC_CONFIG['connectors']['default_label']
        self.__dict__['label'] = _label
    return _label
Return the label for this connector.
class InstanceConnector(Connector):
    """
    Instance connectors define the interface for managing pipes and provide methods
    for management of users, plugins, tokens, and other metadata built atop pipes.
    """

    IS_INSTANCE: bool = True
    IS_THREAD_SAFE: bool = False

    from ._users import (
        get_users_pipe,
        register_user,
        get_user_id,
        get_username,
        get_users,
        edit_user,
        delete_user,
        get_user_password_hash,
        get_user_type,
        get_user_attributes,
    )

    from ._plugins import (
        get_plugins_pipe,
        register_plugin,
        get_plugin_user_id,
        delete_plugin,
        get_plugin_id,
        get_plugin_version,
        get_plugins,
        get_plugin_username,
        get_plugin_attributes,
    )

    from ._tokens import (
        get_tokens_pipe,
        register_token,
        edit_token,
        invalidate_token,
        delete_token,
        get_token,
        get_tokens,
        get_token_model,
        get_token_secret_hash,
        token_exists,
        get_token_scopes,
    )

    from ._pipes import (
        register_pipe,
        get_pipe_attributes,
        get_pipe_id,
        edit_pipe,
        delete_pipe,
        fetch_pipes_keys,
        pipe_exists,
        drop_pipe,
        drop_pipe_indices,
        sync_pipe,
        create_pipe_indices,
        clear_pipe,
        get_pipe_data,
        get_sync_time,
        get_pipe_columns_types,
        get_pipe_columns_indices,
    )
Instance connectors define the interface for managing pipes and provide methods for management of users, plugins, tokens, and other metadata built atop pipes.
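`InstanceConnector` assembles its interface by importing functions from sibling modules directly into the class body, which binds them as methods. A single-file analog of the pattern (all names here are hypothetical):

```python
# Normally this function would live in a submodule such as `_users.py`
# and be pulled in with `from ._users import get_users`.
def get_users(self):
    """Return the registered usernames."""
    return list(self._usernames)

class MiniInstanceConnector:
    IS_INSTANCE = True

    # Binding a module-level function in the class body makes it a method.
    get_users = get_users

    def __init__(self, usernames):
        self._usernames = usernames

conn = MiniInstanceConnector(['alice', 'bob'])
print(conn.get_users())  # ['alice', 'bob']
```

This keeps each functional area (users, plugins, tokens, pipes) in its own module while still presenting one flat class interface.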
def get_users_pipe(self) -> 'mrsm.Pipe':
    """
    Return the pipe used for users registration.
    """
    if '_users_pipe' in self.__dict__:
        return self._users_pipe

    cache_connector = self.__dict__.get('_cache_connector', None)
    self._users_pipe = mrsm.Pipe(
        'mrsm', 'users',
        instance=self,
        target='mrsm_users',
        temporary=True,
        cache=True,
        cache_connector_keys=cache_connector,
        static=True,
        null_indices=False,
        columns={
            'primary': 'user_id',
        },
        dtypes={
            'user_id': 'uuid',
            'username': 'string',
            'password_hash': 'string',
            'email': 'string',
            'user_type': 'string',
            'attributes': 'json',
        },
        indices={
            'unique': 'username',
        },
    )
    return self._users_pipe
Return the pipe used for users registration.
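`get_users_pipe()` caches the constructed pipe on the instance so repeated calls return the same object. The pattern is ordinary `__dict__` memoization, sketched here with a plain dict standing in for `mrsm.Pipe` (the class name is hypothetical):

```python
class CachingConnector:
    def get_users_pipe(self):
        # Return the cached pipe if one was already built.
        if '_users_pipe' in self.__dict__:
            return self._users_pipe
        # Build and cache the pipe on first access.
        self._users_pipe = {'target': 'mrsm_users', 'columns': {'primary': 'user_id'}}
        return self._users_pipe

conn = CachingConnector()
assert conn.get_users_pipe() is conn.get_users_pipe()  # same cached object
```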
def register_user(
    self,
    user: User,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Register a new user to the users pipe.
    """
    users_pipe = self.get_users_pipe()
    user.user_id = uuid.uuid4()
    sync_success, sync_msg = users_pipe.sync(
        [{
            'user_id': user.user_id,
            'username': user.username,
            'email': user.email,
            'password_hash': user.password_hash,
            'user_type': user.type,
            'attributes': user.attributes,
        }],
        check_existing=False,
        debug=debug,
    )
    if not sync_success:
        return False, f"Failed to register user '{user.username}':\n{sync_msg}"

    return True, "Success"
Register a new user to the users pipe.
def get_user_id(self, user: User, debug: bool = False) -> Union[uuid.UUID, None]:
    """
    Return a user's ID from the username.
    """
    users_pipe = self.get_users_pipe()
    result_df = users_pipe.get_data(['user_id'], params={'username': user.username}, limit=1)
    if result_df is None or len(result_df) == 0:
        return None
    return result_df['user_id'][0]
Return a user's ID from the username.
def get_username(self, user_id: Any, debug: bool = False) -> Any:
    """
    Return the username from the given ID.
    """
    users_pipe = self.get_users_pipe()
    return users_pipe.get_value('username', {'user_id': user_id}, debug=debug)
Return the username from the given ID.
def get_users(
    self,
    debug: bool = False,
    **kw: Any
) -> List[str]:
    """
    Get the registered usernames.
    """
    users_pipe = self.get_users_pipe()
    df = users_pipe.get_data()
    if df is None:
        return []

    return list(df['username'])
Get the registered usernames.
def edit_user(self, user: User, debug: bool = False) -> mrsm.SuccessTuple:
    """
    Edit the attributes for an existing user.
    """
    users_pipe = self.get_users_pipe()
    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)

    doc = {'user_id': user_id}
    if user.email != '':
        doc['email'] = user.email
    if user.password_hash != '':
        doc['password_hash'] = user.password_hash
    if user.type != '':
        doc['user_type'] = user.type
    if user.attributes:
        doc['attributes'] = user.attributes

    sync_success, sync_msg = users_pipe.sync([doc], debug=debug)
    if not sync_success:
        return False, f"Failed to edit user '{user.username}':\n{sync_msg}"

    return True, "Success"
Edit the attributes for an existing user.
def delete_user(self, user: User, debug: bool = False) -> mrsm.SuccessTuple:
    """
    Delete a user from the users table.
    """
    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)
    users_pipe = self.get_users_pipe()
    clear_success, clear_msg = users_pipe.clear(params={'user_id': user_id}, debug=debug)
    if not clear_success:
        return False, f"Failed to delete user '{user}':\n{clear_msg}"
    return True, "Success"
Delete a user from the users table.
def get_user_password_hash(self, user: User, debug: bool = False) -> Union[str, None]:
    """
    Get a user's password hash from the users table.
    """
    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)
    users_pipe = self.get_users_pipe()
    result_df = users_pipe.get_data(['password_hash'], params={'user_id': user_id}, debug=debug)
    if result_df is None or len(result_df) == 0:
        return None

    return result_df['password_hash'][0]
Get a user's password hash from the users table.
def get_user_type(self, user: User, debug: bool = False) -> Union[str, None]:
    """
    Get a user's type from the users table.
    """
    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)
    users_pipe = self.get_users_pipe()
    result_df = users_pipe.get_data(['user_type'], params={'user_id': user_id}, debug=debug)
    if result_df is None or len(result_df) == 0:
        return None

    return result_df['user_type'][0]
Get a user's type from the users table.
def get_user_attributes(self, user: User, debug: bool = False) -> Union[Dict[str, Any], None]:
    """
    Get a user's attributes from the users table.
    """
    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)
    users_pipe = self.get_users_pipe()
    result_df = users_pipe.get_data(['attributes'], params={'user_id': user_id}, debug=debug)
    if result_df is None or len(result_df) == 0:
        return None

    return result_df['attributes'][0]
Get a user's attributes from the users table.
def get_plugins_pipe(self) -> 'mrsm.Pipe':
    """
    Return the internal pipe for syncing plugins metadata.
    """
    if '_plugins_pipe' in self.__dict__:
        return self._plugins_pipe

    cache_connector = self.__dict__.get('_cache_connector', None)
    users_pipe = self.get_users_pipe()
    user_id_dtype = users_pipe.dtypes.get('user_id', 'uuid')

    self._plugins_pipe = mrsm.Pipe(
        'mrsm', 'plugins',
        instance=self,
        target='mrsm_plugins',
        temporary=True,
        cache=True,
        cache_connector_keys=cache_connector,
        static=True,
        null_indices=False,
        columns={
            'primary': 'plugin_name',
            'user_id': 'user_id',
        },
        dtypes={
            'plugin_name': 'string',
            'user_id': user_id_dtype,
            'attributes': 'json',
            'version': 'string',
        },
    )
    return self._plugins_pipe
Return the internal pipe for syncing plugins metadata.
def register_plugin(self, plugin: Plugin, debug: bool = False) -> mrsm.SuccessTuple:
    """
    Register a new plugin to the plugins table.
    """
    plugins_pipe = self.get_plugins_pipe()
    users_pipe = self.get_users_pipe()
    user_id = self.get_plugin_user_id(plugin)
    if user_id is not None:
        username = self.get_username(user_id, debug=debug)
        return False, f"{plugin} is already registered to '{username}'."

    doc = {
        'plugin_name': plugin.name,
        'version': plugin.version,
        'attributes': plugin.attributes,
        'user_id': plugin.user_id,
    }

    sync_success, sync_msg = plugins_pipe.sync(
        [doc],
        check_existing=False,
        debug=debug,
    )
    if not sync_success:
        return False, f"Failed to register {plugin}:\n{sync_msg}"

    return True, "Success"
Register a new plugin to the plugins table.
def get_plugin_user_id(self, plugin: Plugin, debug: bool = False) -> Union[uuid.UUID, None]:
    """
    Return the user ID for plugin's owner.
    """
    plugins_pipe = self.get_plugins_pipe()
    return plugins_pipe.get_value('user_id', {'plugin_name': plugin.name}, debug=debug)
Return the user ID for plugin's owner.
def delete_plugin(self, plugin: Plugin, debug: bool = False) -> mrsm.SuccessTuple:
    """
    Delete a plugin's registration.
    """
    plugin_id = self.get_plugin_id(plugin, debug=debug)
    if plugin_id is None:
        return False, f"{plugin} is not registered."

    plugins_pipe = self.get_plugins_pipe()
    clear_success, clear_msg = plugins_pipe.clear(params={'plugin_name': plugin.name}, debug=debug)
    if not clear_success:
        return False, f"Failed to delete {plugin}:\n{clear_msg}"
    return True, "Success"
Delete a plugin's registration.
def get_plugin_id(self, plugin: Plugin, debug: bool = False) -> Union[str, None]:
    """
    Return a plugin's ID.
    """
    user_id = self.get_plugin_user_id(plugin, debug=debug)
    return plugin.name if user_id is not None else None
Return a plugin's ID.
def get_plugin_version(self, plugin: Plugin, debug: bool = False) -> Union[str, None]:
    """
    Return the version for a plugin.
    """
    plugins_pipe = self.get_plugins_pipe()
    return plugins_pipe.get_value('version', {'plugin_name': plugin.name}, debug=debug)
Return the version for a plugin.
def get_plugins(
    self,
    user_id: Optional[int] = None,
    search_term: Optional[str] = None,
    debug: bool = False,
    **kw: Any
) -> List[str]:
    """
    Return a list of plugin names.
    """
    plugins_pipe = self.get_plugins_pipe()
    params = {}
    if user_id:
        params['user_id'] = user_id

    df = plugins_pipe.get_data(['plugin_name'], params=params, debug=debug)
    if df is None:
        return []

    docs = df.to_dict(orient='records')
    return [
        plugin_name
        for doc in docs
        if (plugin_name := doc['plugin_name']).startswith(search_term or '')
    ]
Return a list of plugin names.
def get_plugin_username(self, plugin: Plugin, debug: bool = False) -> Union[str, None]:
    """
    Return the username for plugin's owner.
    """
    user_id = self.get_plugin_user_id(plugin, debug=debug)
    if user_id is None:
        return None
    return self.get_username(user_id, debug=debug)
Return the username for plugin's owner.
def get_plugin_attributes(self, plugin: Plugin, debug: bool = False) -> Dict[str, Any]:
    """
    Return the attributes for a plugin.
    """
    plugins_pipe = self.get_plugins_pipe()
    return plugins_pipe.get_value('attributes', {'plugin_name': plugin.name}, debug=debug) or {}
Return the attributes for a plugin.
def get_tokens_pipe(self) -> mrsm.Pipe:
    """
    Return the internal pipe for tokens management.
    """
    if '_tokens_pipe' in self.__dict__:
        return self._tokens_pipe

    users_pipe = self.get_users_pipe()
    user_id_dtype = (
        users_pipe._attributes.get('parameters', {}).get('dtypes', {}).get('user_id', 'uuid')
    )

    cache_connector = self.__dict__.get('_cache_connector', None)

    self._tokens_pipe = mrsm.Pipe(
        'mrsm', 'tokens',
        instance=self,
        target='mrsm_tokens',
        temporary=True,
        cache=True,
        cache_connector_keys=cache_connector,
        static=True,
        autotime=True,
        null_indices=False,
        columns={
            'datetime': 'creation',
            'primary': 'id',
        },
        indices={
            'unique': 'label',
            'user_id': 'user_id',
        },
        dtypes={
            'id': 'uuid',
            'creation': 'datetime',
            'expiration': 'datetime',
            'is_valid': 'bool',
            'label': 'string',
            'user_id': user_id_dtype,
            'scopes': 'json',
            'secret_hash': 'string',
        },
    )
    return self._tokens_pipe
Return the internal pipe for tokens management.
def register_token(
    self,
    token: Token,
    debug: bool = False,
) -> mrsm.SuccessTuple:
    """
    Register the new token to the tokens table.
    """
    token_id, token_secret = token.generate_credentials()
    tokens_pipe = self.get_tokens_pipe()
    user_id = self.get_user_id(token.user) if token.user is not None else None
    if user_id is None:
        return False, "Cannot register a token without a user."

    doc = {
        'id': token_id,
        'user_id': user_id,
        'creation': datetime.now(timezone.utc),
        'expiration': token.expiration,
        'label': token.label,
        'is_valid': token.is_valid,
        'scopes': list(token.scopes) if token.scopes else [],
        'secret_hash': hash_password(
            str(token_secret),
            rounds=STATIC_CONFIG['tokens']['hash_rounds']
        ),
    }
    sync_success, sync_msg = tokens_pipe.sync([doc], check_existing=False, debug=debug)
    if not sync_success:
        return False, f"Failed to register token:\n{sync_msg}"
    return True, "Success"
Register the new token to the tokens table.
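Note that only a hash of the generated secret is stored, never the plaintext. The same store-hash-then-verify flow can be sketched with the standard library's `hashlib.pbkdf2_hmac`; this is only an analogy, since Meerschaum's `hash_password` and its `hash_rounds` configuration are its own implementation:

```python
import hashlib
import hmac
import os

def hash_secret(secret: str, salt: bytes, rounds: int = 100_000) -> bytes:
    """Derive a storable hash from a token secret."""
    return hashlib.pbkdf2_hmac('sha256', secret.encode(), salt, rounds)

salt = os.urandom(16)
stored_hash = hash_secret('my-token-secret', salt)

# Later, verify a presented secret without ever having stored the plaintext.
presented = 'my-token-secret'
assert hmac.compare_digest(stored_hash, hash_secret(presented, salt))
```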
def edit_token(self, token: Token, debug: bool = False) -> mrsm.SuccessTuple:
    """
    Persist the token's in-memory state to the tokens pipe.
    """
    if not token.id:
        return False, "Token ID is not set."

    if not token.exists(debug=debug):
        return False, f"Token {token.id} does not exist."

    if not token.creation:
        token_model = self.get_token_model(token.id)
        token.creation = token_model.creation

    tokens_pipe = self.get_tokens_pipe()
    doc = {
        'id': token.id,
        'creation': token.creation,
        'expiration': token.expiration,
        'label': token.label,
        'is_valid': token.is_valid,
        'scopes': list(token.scopes) if token.scopes else [],
    }
    sync_success, sync_msg = tokens_pipe.sync([doc], debug=debug)
    if not sync_success:
        return False, f"Failed to edit token '{token.id}':\n{sync_msg}"

    return True, "Success"
Persist the token's in-memory state to the tokens pipe.
def invalidate_token(self, token: Token, debug: bool = False) -> mrsm.SuccessTuple:
    """
    Set `is_valid` to `False` for the given token.
    """
    if not token.id:
        return False, "Token ID is not set."

    if not token.exists(debug=debug):
        return False, f"Token {token.id} does not exist."

    if not token.creation:
        token_model = self.get_token_model(token.id)
        token.creation = token_model.creation

    token.is_valid = False
    tokens_pipe = self.get_tokens_pipe()
    doc = {
        'id': token.id,
        'creation': token.creation,
        'is_valid': False,
    }
    sync_success, sync_msg = tokens_pipe.sync([doc], debug=debug)
    if not sync_success:
        return False, f"Failed to invalidate token '{token.id}':\n{sync_msg}"

    return True, "Success"
Set is_valid to False for the given token.
def delete_token(self, token: Token, debug: bool = False) -> mrsm.SuccessTuple:
    """
    Delete the given token from the tokens table.
    """
    if not token.id:
        return False, "Token ID is not set."

    if not token.exists(debug=debug):
        return False, f"Token {token.id} does not exist."

    if not token.creation:
        token_model = self.get_token_model(token.id)
        token.creation = token_model.creation

    token.is_valid = False
    tokens_pipe = self.get_tokens_pipe()
    clear_success, clear_msg = tokens_pipe.clear(params={'id': token.id}, debug=debug)
    if not clear_success:
        return False, f"Failed to delete token '{token.id}':\n{clear_msg}"

    return True, "Success"
Delete the given token from the tokens table.
def get_token(self, token_id: Union[uuid.UUID, str], debug: bool = False) -> Union[Token, None]:
    """
    Return the `Token` from its ID.
    """
    from meerschaum.utils.misc import is_uuid
    if isinstance(token_id, str):
        if is_uuid(token_id):
            token_id = uuid.UUID(token_id)
        else:
            raise ValueError("Invalid token ID.")
    token_model = self.get_token_model(token_id)
    if token_model is None:
        return None
    return Token(**dict(token_model))
Return the Token from its ID.
def get_tokens(
    self,
    user: Optional[User] = None,
    labels: Optional[List[str]] = None,
    ids: Optional[List[uuid.UUID]] = None,
    debug: bool = False,
) -> List[Token]:
    """
    Return a list of `Token` objects.
    """
    tokens_pipe = self.get_tokens_pipe()
    user_id = (
        self.get_user_id(user, debug=debug)
        if user is not None
        else None
    )
    user_type = self.get_user_type(user, debug=debug) if user is not None else None
    params = (
        {
            'user_id': (
                user_id
                if user_type != 'admin'
                else [user_id, None]
            )
        }
        if user_id is not None
        else {}
    )
    if labels:
        params['label'] = labels
    if ids:
        params['id'] = ids

    if debug:
        dprint(f"Getting tokens with {user_id=}, {params=}")

    tokens_df = tokens_pipe.get_data(params=params, debug=debug)
    if tokens_df is None:
        return []

    if debug:
        dprint(f"Retrieved tokens dataframe:\n{tokens_df}")

    tokens_docs = tokens_df.to_dict(orient='records')
    return [
        Token(
            instance=self,
            **token_doc
        )
        for token_doc in reversed(tokens_docs)
    ]
Return a list of Token objects.
def get_token_model(self, token_id: Union[uuid.UUID, Token], debug: bool = False) -> 'Union[TokenModel, None]':
    """
    Return a token's model from the instance.
    """
    from meerschaum.models import TokenModel
    if isinstance(token_id, Token):
        token_id = token_id.id
    if not token_id:
        raise ValueError("Invalid token ID.")
    tokens_pipe = self.get_tokens_pipe()
    doc = tokens_pipe.get_doc(
        params={'id': token_id},
        debug=debug,
    )
    if doc is None:
        return None
    return TokenModel(**doc)
Return a token's model from the instance.
def get_token_secret_hash(self, token_id: Union[uuid.UUID, Token], debug: bool = False) -> Union[str, None]:
    """
    Return the secret hash for a given token.
    """
    if isinstance(token_id, Token):
        token_id = token_id.id
    if not token_id:
        raise ValueError("Invalid token ID.")
    tokens_pipe = self.get_tokens_pipe()
    return tokens_pipe.get_value('secret_hash', params={'id': token_id}, debug=debug)
Return the secret hash for a given token.
def token_exists(self, token_id: Union[uuid.UUID, Token], debug: bool = False) -> bool:
    """
    Return `True` if a token exists in the tokens pipe.
    """
    if isinstance(token_id, Token):
        token_id = token_id.id
    if not token_id:
        raise ValueError("Invalid token ID.")

    tokens_pipe = self.get_tokens_pipe()
    return tokens_pipe.get_value('creation', params={'id': token_id}, debug=debug) is not None
Return True if a token exists in the tokens pipe.
def get_token_scopes(self, token_id: Union[uuid.UUID, Token], debug: bool = False) -> List[str]:
    """
    Return the scopes for a token.
    """
    if isinstance(token_id, Token):
        token_id = token_id.id
    if not token_id:
        raise ValueError("Invalid token ID.")

    tokens_pipe = self.get_tokens_pipe()
    return tokens_pipe.get_value('scopes', params={'id': token_id}, debug=debug) or []
Return the scopes for a token.
@abc.abstractmethod
def register_pipe(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Insert the pipe's attributes into the internal `pipes` table.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe to be registered.

    Returns
    -------
    A `SuccessTuple` of the result.
    """
Insert the pipe's attributes into the internal pipes table.
Parameters
- pipe (mrsm.Pipe): The pipe to be registered.
Returns
- A SuccessTuple of the result.
@abc.abstractmethod
def get_pipe_attributes(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> Dict[str, Any]:
    """
    Return the pipe's document from the internal `pipes` table.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose attributes should be retrieved.

    Returns
    -------
    The document that matches the keys of the pipe.
    """
Return the pipe's document from the internal pipes table.
Parameters
- pipe (mrsm.Pipe): The pipe whose attributes should be retrieved.
Returns
- The document that matches the keys of the pipe.
@abc.abstractmethod
def get_pipe_id(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> Union[str, int, None]:
    """
    Return the `id` for the pipe if it exists.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose `id` to fetch.

    Returns
    -------
    The `id` for the pipe's document or `None`.
    """
Return the id for the pipe if it exists.
Parameters
- pipe (mrsm.Pipe): The pipe whose id to fetch.
Returns
- The id for the pipe's document or None.
def edit_pipe(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Edit the attributes of the pipe.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose in-memory parameters must be persisted.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    raise NotImplementedError
Edit the attributes of the pipe.
Parameters
- pipe (mrsm.Pipe): The pipe whose in-memory parameters must be persisted.
Returns
- A SuccessTuple indicating success.
def delete_pipe(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Delete a pipe's registration from the `pipes` collection.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe to be deleted.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    raise NotImplementedError
Delete a pipe's registration from the pipes collection.
Parameters
- pipe (mrsm.Pipe): The pipe to be deleted.
Returns
- A SuccessTuple indicating success.
@abc.abstractmethod
def fetch_pipes_keys(
    self,
    connector_keys: Optional[List[str]] = None,
    metric_keys: Optional[List[str]] = None,
    location_keys: Optional[List[str]] = None,
    tags: Optional[List[str]] = None,
    debug: bool = False,
    **kwargs: Any
) -> List[Tuple[str, str, str]]:
    """
    Return a list of tuples for the registered pipes' keys according to the provided filters.

    Parameters
    ----------
    connector_keys: list[str] | None, default None
        The keys passed via `-c`.

    metric_keys: list[str] | None, default None
        The keys passed via `-m`.

    location_keys: list[str] | None, default None
        The keys passed via `-l`.

    tags: List[str] | None, default None
        Tags passed via `--tags` which are stored under `parameters:tags`.

    Returns
    -------
    A list of connector, metric, and location keys in tuples.
    You may return the string "None" for location keys in place of nulls.

    Examples
    --------
    >>> import meerschaum as mrsm
    >>> conn = mrsm.get_connector('example:demo')
    >>>
    >>> pipe_a = mrsm.Pipe('a', 'demo', tags=['foo'], instance=conn)
    >>> pipe_b = mrsm.Pipe('b', 'demo', tags=['bar'], instance=conn)
    >>> pipe_a.register()
    >>> pipe_b.register()
    >>>
    >>> conn.fetch_pipes_keys(['a', 'b'])
    [('a', 'demo', 'None'), ('b', 'demo', 'None')]
    >>> conn.fetch_pipes_keys(metric_keys=['demo'])
    [('a', 'demo', 'None'), ('b', 'demo', 'None')]
    >>> conn.fetch_pipes_keys(tags=['foo'])
    [('a', 'demo', 'None')]
    >>> conn.fetch_pipes_keys(location_keys=[None])
    [('a', 'demo', 'None'), ('b', 'demo', 'None')]
    """
Return a list of tuples for the registered pipes' keys according to the provided filters.
Parameters
- connector_keys (list[str] | None, default None): The keys passed via -c.
- metric_keys (list[str] | None, default None): The keys passed via -m.
- location_keys (list[str] | None, default None): The keys passed via -l.
- tags (List[str] | None, default None): Tags passed via --tags which are stored under parameters:tags.
Returns
- A list of connector, metric, and location keys in tuples. You may return the string "None" for location keys in place of nulls.
Examples
>>> import meerschaum as mrsm
>>> conn = mrsm.get_connector('example:demo')
>>>
>>> pipe_a = mrsm.Pipe('a', 'demo', tags=['foo'], instance=conn)
>>> pipe_b = mrsm.Pipe('b', 'demo', tags=['bar'], instance=conn)
>>> pipe_a.register()
>>> pipe_b.register()
>>>
>>> conn.fetch_pipes_keys(['a', 'b'])
[('a', 'demo', 'None'), ('b', 'demo', 'None')]
>>> conn.fetch_pipes_keys(metric_keys=['demo'])
[('a', 'demo', 'None'), ('b', 'demo', 'None')]
>>> conn.fetch_pipes_keys(tags=['foo'])
[('a', 'demo', 'None')]
>>> conn.fetch_pipes_keys(location_keys=[None])
[('a', 'demo', 'None'), ('b', 'demo', 'None')]
@abc.abstractmethod
def pipe_exists(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> bool:
    """
    Check whether a pipe's target table exists.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe to check whether its table exists.

    Returns
    -------
    A `bool` indicating the table exists.
    """
Check whether a pipe's target table exists.
Parameters
- pipe (mrsm.Pipe): The pipe to check whether its table exists.
Returns
- A bool indicating the table exists.
@abc.abstractmethod
def drop_pipe(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Drop a pipe's collection if it exists.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe to be dropped.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    raise NotImplementedError
Drop a pipe's collection if it exists.
Parameters
- pipe (mrsm.Pipe): The pipe to be dropped.
Returns
- A SuccessTuple indicating success.
def drop_pipe_indices(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Drop a pipe's indices.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose indices need to be dropped.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    return False, f"Cannot drop indices for instance connectors of type '{self.type}'."
Drop a pipe's indices.
Parameters
- pipe (mrsm.Pipe): The pipe whose indices need to be dropped.
Returns
- A SuccessTuple indicating success.
@abc.abstractmethod
def sync_pipe(
    self,
    pipe: mrsm.Pipe,
    df: 'pd.DataFrame' = None,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    chunksize: Optional[int] = -1,
    check_existing: bool = True,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Sync a pipe using a database connection.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The Meerschaum Pipe instance into which to sync the data.

    df: Optional[pd.DataFrame]
        An optional DataFrame or equivalent to sync into the pipe.
        Defaults to `None`.

    begin: Union[datetime, int, None], default None
        Optionally specify the earliest datetime to search for data.

    end: Union[datetime, int, None], default None
        Optionally specify the latest datetime to search for data.

    chunksize: Optional[int], default -1
        Specify the number of rows to sync per chunk.
        If `-1`, resort to system configuration (default is `900`).
        A `chunksize` of `None` will sync all rows in one transaction.

    check_existing: bool, default True
        If `True`, pull and diff with existing data from the pipe.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success (`bool`) and message (`str`).
    """
Sync a pipe using a database connection.
Parameters
- pipe (mrsm.Pipe): The Meerschaum Pipe instance into which to sync the data.
- df (Optional[pd.DataFrame]): An optional DataFrame or equivalent to sync into the pipe. Defaults to None.
- begin (Union[datetime, int, None], default None): Optionally specify the earliest datetime to search for data.
- end (Union[datetime, int, None], default None): Optionally specify the latest datetime to search for data.
- chunksize (Optional[int], default -1): Specify the number of rows to sync per chunk. If -1, resort to system configuration (default is 900). A chunksize of None will sync all rows in one transaction.
- check_existing (bool, default True): If True, pull and diff with existing data from the pipe.
- debug (bool, default False): Verbosity toggle.
Returns
- A SuccessTuple of success (bool) and message (str).
def create_pipe_indices(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Create a pipe's indices.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose indices need to be created.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    return False, f"Cannot create indices for instance connectors of type '{self.type}'."
Create a pipe's indices.
Parameters
- pipe (mrsm.Pipe): The pipe whose indices need to be created.
Returns
- A SuccessTuple indicating success.
def clear_pipe(
    self,
    pipe: mrsm.Pipe,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Delete rows within `begin`, `end`, and `params`.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose rows to clear.

    begin: datetime | int | None, default None
        If provided, remove rows >= `begin`.

    end: datetime | int | None, default None
        If provided, remove rows < `end`.

    params: dict[str, Any] | None, default None
        If provided, only remove rows which match the `params` filter.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    raise NotImplementedError
Delete rows within begin, end, and params.
Parameters
- pipe (mrsm.Pipe): The pipe whose rows to clear.
- begin (datetime | int | None, default None): If provided, remove rows >= begin.
- end (datetime | int | None, default None): If provided, remove rows < end.
- params (dict[str, Any] | None, default None): If provided, only remove rows which match the params filter.
Returns
- A SuccessTuple indicating success.
@abc.abstractmethod
def get_pipe_data(
    self,
    pipe: mrsm.Pipe,
    select_columns: Optional[List[str]] = None,
    omit_columns: Optional[List[str]] = None,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    debug: bool = False,
    **kwargs: Any
) -> Union['pd.DataFrame', None]:
    """
    Query a pipe's target table and return the DataFrame.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe with the target table from which to read.

    select_columns: list[str] | None, default None
        If provided, only select these given columns.
        Otherwise select all available columns (i.e. `SELECT *`).

    omit_columns: list[str] | None, default None
        If provided, remove these columns from the selection.

    begin: datetime | int | None, default None
        The earliest `datetime` value to search from (inclusive).

    end: datetime | int | None, default None
        The latest `datetime` value to search from (exclusive).

    params: dict[str, Any] | None, default None
        Additional filters to apply to the query.

    Returns
    -------
    The target table's data as a DataFrame.
    """
Query a pipe's target table and return the DataFrame.
Parameters
- pipe (mrsm.Pipe): The pipe with the target table from which to read.
- select_columns (list[str] | None, default None): If provided, only select these given columns. Otherwise select all available columns (i.e. SELECT *).
- omit_columns (list[str] | None, default None): If provided, remove these columns from the selection.
- begin (datetime | int | None, default None): The earliest datetime value to search from (inclusive).
- end (datetime | int | None, default None): The latest datetime value to search from (exclusive).
- params (dict[str, Any] | None, default None): Additional filters to apply to the query.
Returns
- The target table's data as a DataFrame.
@abc.abstractmethod
def get_sync_time(
    self,
    pipe: mrsm.Pipe,
    params: Optional[Dict[str, Any]] = None,
    newest: bool = True,
    debug: bool = False,
    **kwargs: Any
) -> datetime | int | None:
    """
    Return the most recent value for the `datetime` axis.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose collection contains documents.

    params: dict[str, Any] | None, default None
        Filter certain parameters when determining the sync time.

    newest: bool, default True
        If `True`, return the maximum value for the column.

    Returns
    -------
    The largest `datetime` or `int` value of the `datetime` axis.
    """
Return the most recent value for the datetime axis.
Parameters
- pipe (mrsm.Pipe): The pipe whose collection contains documents.
- params (dict[str, Any] | None, default None): Filter certain parameters when determining the sync time.
- newest (bool, default True): If True, return the maximum value for the column.
Returns
- The largest datetime or int value of the datetime axis.
@abc.abstractmethod
def get_pipe_columns_types(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> Dict[str, str]:
    """
    Return the data types for the columns in the target table for data type enforcement.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose target table contains columns and data types.

    Returns
    -------
    A dictionary mapping columns to data types.
    """
Return the data types for the columns in the target table for data type enforcement.
Parameters
- pipe (mrsm.Pipe): The pipe whose target table contains columns and data types.
Returns
- A dictionary mapping columns to data types.
def get_pipe_columns_indices(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
) -> Dict[str, List[Dict[str, str]]]:
    """
    Return a dictionary mapping columns to metadata about related indices.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose target table has related indices.

    Returns
    -------
    A dictionary mapping column names to lists of dictionaries with the keys "name" and "type".

    Examples
    --------
    >>> pipe = mrsm.Pipe('demo', 'shirts', columns={'primary': 'id'}, indices={'size_color': ['color', 'size']})
    >>> pipe.sync([{'color': 'red', 'size': 'M'}])
    >>> pipe.get_columns_indices()
    {'id': [{'name': 'demo_shirts_pkey', 'type': 'PRIMARY KEY'}], 'color': [{'name': 'IX_demo_shirts_color_size', 'type': 'INDEX'}], 'size': [{'name': 'IX_demo_shirts_color_size', 'type': 'INDEX'}]}
    """
    return {}
Return a dictionary mapping columns to metadata about related indices.
Parameters
- pipe (mrsm.Pipe): The pipe whose target table has related indices.
Returns
- A dictionary mapping column names to lists of dictionaries with the keys "name" and "type".
Examples
>>> pipe = mrsm.Pipe('demo', 'shirts', columns={'primary': 'id'}, indices={'size_color': ['color', 'size']})
>>> pipe.sync([{'color': 'red', 'size': 'M'}])
>>> pipe.get_columns_indices()
{'id': [{'name': 'demo_shirts_pkey', 'type': 'PRIMARY KEY'}], 'color': [{'name': 'IX_demo_shirts_color_size', 'type': 'INDEX'}], 'size': [{'name': 'IX_demo_shirts_color_size', 'type': 'INDEX'}]}
def make_connector(cls, _is_executor: bool = False):
    """
    Register a class as a `Connector`.
    The `type` will be the lower case of the class name, without the suffix `connector`.

    Parameters
    ----------
    instance: bool, default False
        If `True`, make this connector type an instance connector.
        This requires implementing the various pipes functions and lots of testing.

    Examples
    --------
    >>> import meerschaum as mrsm
    >>> from meerschaum.connectors import make_connector, Connector
    >>>
    >>> @make_connector
    ... class FooConnector(Connector):
    ...     REQUIRED_ATTRIBUTES: list[str] = ['username', 'password']
    ...
    >>> conn = mrsm.get_connector('foo:bar', username='dog', password='cat')
    >>> print(conn.username, conn.password)
    dog cat
    >>>
    """
    import re
    from meerschaum.plugins import _get_parent_plugin
    suffix_regex = (
        r'connector$'
        if not _is_executor
        else r'executor$'
    )
    plugin_name = _get_parent_plugin(2)
    typ = re.sub(suffix_regex, '', cls.__name__.lower())
    with _locks['types']:
        types[typ] = cls
    with _locks['custom_types']:
        custom_types.add(typ)
    if plugin_name:
        with _locks['plugins_types']:
            if plugin_name not in plugins_types:
                plugins_types[plugin_name] = []
            plugins_types[plugin_name].append(typ)
    with _locks['connectors']:
        if typ not in connectors:
            connectors[typ] = {}
    if getattr(cls, 'IS_INSTANCE', False):
        with _locks['instance_types']:
            if typ not in instance_types:
                instance_types.append(typ)

    return cls
Register a class as a Connector. The type will be the lower case of the class name, without the suffix connector.
Parameters
- instance (bool, default False): If True, make this connector type an instance connector. This requires implementing the various pipes functions and lots of testing.
Examples
>>> import meerschaum as mrsm
>>> from meerschaum.connectors import make_connector, Connector
>>>
>>> @make_connector
... class FooConnector(Connector):
... REQUIRED_ATTRIBUTES: list[str] = ['username', 'password']
...
>>> conn = mrsm.get_connector('foo:bar', username='dog', password='cat')
>>> print(conn.username, conn.password)
dog cat
>>>
def entry(
    sysargs: Union[List[str], str, None] = None,
    _patch_args: Optional[Dict[str, Any]] = None,
    _use_cli_daemon: bool = True,
    _session_id: Optional[str] = None,
) -> SuccessTuple:
    """
    Parse arguments and launch a Meerschaum action.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    start = time.perf_counter()
    from meerschaum.config.environment import get_daemon_env_vars
    # Fall back to an empty list so the membership checks below don't fail on None.
    sysargs_list = shlex.split(sysargs) if isinstance(sysargs, str) else (sysargs or [])
    if (
        not _use_cli_daemon
        or (not sysargs or (sysargs[0] and sysargs[0].startswith('-')))
        or '--no-daemon' in sysargs_list
        or '--daemon' in sysargs_list
        or '-d' in sysargs_list
        or get_daemon_env_vars()
        or not mrsm.get_config('system', 'experimental', 'cli_daemon')
    ):
        success, msg = entry_without_daemon(sysargs, _patch_args=_patch_args)
        end = time.perf_counter()
        if '--debug' in sysargs_list:
            print(f"Duration without daemon: {round(end - start, 3)}")
        return success, msg

    from meerschaum._internal.cli.entry import entry_with_daemon
    success, msg = entry_with_daemon(sysargs, _patch_args=_patch_args)
    end = time.perf_counter()
    if '--debug' in sysargs_list:
        print(f"Duration with daemon: {round(end - start, 3)}")
    return success, msg