meerschaum

Meerschaum Python API
Welcome to the Meerschaum Python API technical documentation! Here you can find information about the classes and functions provided by the meerschaum package. Visit meerschaum.io for general usage documentation.
Root Module
For your convenience, the following classes and functions may be imported from the root meerschaum namespace:
Classes
Examples
Build a Connector
Get existing connectors or build a new one in-memory with the meerschaum.get_connector() factory function:
import meerschaum as mrsm
sql_conn = mrsm.get_connector(
    'sql:temp',
    flavor='sqlite',
    database='/tmp/tmp.db',
)
df = sql_conn.read("SELECT 1 AS foo")
print(df)
#    foo
# 0    1
sql_conn.to_sql(df, 'foo')
print(sql_conn.read('foo'))
#    foo
# 0    1
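The "get existing or build new" behavior can be pictured as a small keyed registry. The sketch below is not Meerschaum's implementation: `DummyConnector`, `_registry`, and `get_or_build` are hypothetical names, and the rule that new keyword arguments replace a cached object is only modeled on the `get_connector()` docstring.

```python
from typing import Any, Dict, Tuple


class DummyConnector:
    """Hypothetical stand-in for a connector; it only stores its attributes."""

    def __init__(self, type_: str, label: str, **attrs: Any) -> None:
        self.type = type_
        self.label = label
        self.attrs = attrs


# Hypothetical in-memory registry keyed by (type, label).
_registry: Dict[Tuple[str, str], DummyConnector] = {}


def get_or_build(keys: str, refresh: bool = False, **attrs: Any) -> DummyConnector:
    """Return the cached connector for 'type:label' keys, or build and store one."""
    type_, _, label = keys.partition(':')
    label = label or 'main'  # the docstring says the label defaults to 'main'
    cache_key = (type_, label)
    existing = _registry.get(cache_key)
    # Different attributes force a rebuild, modeled on the documented `refresh` rule.
    if existing is not None and attrs and attrs != existing.attrs:
        refresh = True
    if existing is None or refresh:
        _registry[cache_key] = DummyConnector(type_, label, **attrs)
    return _registry[cache_key]


first = get_or_build('sql:temp', flavor='sqlite', database='/tmp/tmp.db')
second = get_or_build('sql:temp')
third = get_or_build('sql:temp', flavor='sqlite', database='/tmp/other.db')
print(first is second)  # True: the cached object is reused
print(third is first)   # False: new attributes replaced the cached object
```

Keying the cache on the type and label is what lets a later call with only `'sql:temp'` find the connector that was built with full parameters.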
Create a Custom Connector Class
Decorate a connector class with meerschaum.make_connector() to designate it as a custom connector:
from datetime import datetime, timezone
from random import randint
import meerschaum as mrsm
from meerschaum.utils.dtypes import round_time

@mrsm.make_connector
class FooConnector(mrsm.Connector):

    REQUIRED_ATTRIBUTES = ['username', 'password']

    def fetch(
        self,
        begin: datetime | None = None,
        end: datetime | None = None,
    ):
        now = begin or round_time(datetime.now(timezone.utc))
        return [
            {'ts': now, 'id': 1, 'vl': randint(1, 100)},
            {'ts': now, 'id': 2, 'vl': randint(1, 100)},
            {'ts': now, 'id': 3, 'vl': randint(1, 100)},
        ]

foo_conn = mrsm.get_connector(
    'foo:bar',
    username='foo',
    password='bar',
)
docs = foo_conn.fetch()
Build a Pipe
Build a meerschaum.Pipe in-memory:
from datetime import datetime
import meerschaum as mrsm
pipe = mrsm.Pipe(
    foo_conn, 'demo',
    instance=sql_conn,
    columns={'datetime': 'ts', 'id': 'id'},
    tags=['production'],
)
pipe.sync(begin=datetime(2024, 1, 1))
df = pipe.get_data()
print(df)
#           ts  id  vl
# 0 2024-01-01   1  97
# 1 2024-01-01   2  18
# 2 2024-01-01   3  96
Add temporary=True to skip registering the pipe in the pipes table.
Get Registered Pipes
The meerschaum.get_pipes() function returns a dictionary hierarchy of pipes by connector, metric, and location:
import meerschaum as mrsm
pipes = mrsm.get_pipes(instance='sql:temp')
pipe = pipes['foo:bar']['demo'][None]
Add as_list=True to flatten the hierarchy:
import meerschaum as mrsm
pipes = mrsm.get_pipes(
    tags=['production'],
    instance=sql_conn,
    as_list=True,
)
print(pipes)
# [Pipe('foo:bar', 'demo', instance='sql:temp')]
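To make the relationship between the two return shapes concrete, here is a minimal sketch of flattening a connector, metric, location hierarchy into a list. It uses plain strings in place of Pipe objects, and `flatten_hierarchy` is a hypothetical helper written for illustration rather than the library's own function.

```python
from typing import Any, Dict, List, Optional

# Hypothetical hierarchy: strings stand in for Pipe objects.
Hierarchy = Dict[str, Dict[str, Dict[Optional[str], Any]]]


def flatten_hierarchy(pipes: Hierarchy) -> List[Any]:
    """Walk connector -> metric -> location and collect the leaves in order."""
    return [
        pipe
        for metrics in pipes.values()
        for locations in metrics.values()
        for pipe in locations.values()
    ]


pipes = {
    'foo:bar': {
        'demo': {None: "Pipe('foo:bar', 'demo')"},
        'weather': {
            'clemson': "Pipe('foo:bar', 'weather', 'clemson')",
            None: "Pipe('foo:bar', 'weather')",
        },
    },
}
flat = flatten_hierarchy(pipes)
print(len(flat))  # 3
print(flat[0])    # Pipe('foo:bar', 'demo')
```

A location key of `None` is an ordinary dictionary key here, which is why `pipes['foo:bar']['demo'][None]` works in the hierarchical form.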
Filter by the dtype of the datetime index column with datetime_dtypes. Accepted values are 'datetime', 'int', and 'None'; prefix with '_' to negate:
import meerschaum as mrsm
### Only pipes with a timestamp datetime index:
timestamp_pipes = mrsm.get_pipes(datetime_dtypes=['datetime'], as_list=True)
### Only pipes with an integer datetime index:
int_pipes = mrsm.get_pipes(datetime_dtypes=['int'], as_list=True)
### Exclude pipes without a datetime index:
datetime_pipes = mrsm.get_pipes(datetime_dtypes=['_None'], as_list=True)
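The include/exclude rule can be sketched in plain Python. This is a simplified model of the filter, not the library code: `split_negations`, `dtype_matches`, and `keep_pipe` are hypothetical names, and the set of spellings treated as "no datetime index" is an assumption.

```python
from typing import List, Optional, Tuple

NEGATION_PREFIX = '_'          # taken from the prose above
NULL_NAMES = {'none', 'null'}  # assumed spellings of "no datetime index"


def split_negations(values: List[str]) -> Tuple[List[str], List[str]]:
    """Separate plain values from values negated with the prefix."""
    included = [v.lower() for v in values if not v.startswith(NEGATION_PREFIX)]
    excluded = [
        v[len(NEGATION_PREFIX):].lower()
        for v in values
        if v.startswith(NEGATION_PREFIX)
    ]
    return included, excluded


def dtype_matches(name: str, dt_col: Optional[str], dt_typ: Optional[str]) -> bool:
    """Check one accepted value against a pipe's datetime column and its dtype."""
    if not dt_col:
        return name in NULL_NAMES
    is_int = 'int' in str(dt_typ).lower()
    return (name == 'int' and is_int) or (name == 'datetime' and not is_int)


def keep_pipe(datetime_dtypes: List[str], dt_col: Optional[str], dt_typ: Optional[str]) -> bool:
    """Keep a pipe if it matches any included value and no excluded value."""
    included, excluded = split_negations(datetime_dtypes)
    in_match = not included or any(dtype_matches(n, dt_col, dt_typ) for n in included)
    ex_match = any(dtype_matches(n, dt_col, dt_typ) for n in excluded)
    return in_match and not ex_match


print(keep_pipe(['datetime'], 'ts', 'datetime64[ns, UTC]'))  # True
print(keep_pipe(['int'], 'ts', 'datetime64[ns, UTC]'))       # False
print(keep_pipe(['int'], 'ts', 'int64'))                     # True
print(keep_pipe(['_None'], None, None))                      # False
print(keep_pipe(['_None'], 'ts', None))                      # True
```

In this model a pipe with a datetime column but no declared dtype counts as `'datetime'`, because only dtypes containing `int` are treated as integer axes.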
Import Plugins
You can import a plugin's module through meerschaum.Plugin.module:
import meerschaum as mrsm
plugin = mrsm.Plugin('noaa')
with mrsm.Venv(plugin):
    noaa = plugin.module
If your plugin has submodules, use meerschaum.plugins.from_plugin_import:
from meerschaum.plugins import from_plugin_import
get_defined_pipes = from_plugin_import('compose.utils.pipes', 'get_defined_pipes')
Import multiple plugins with meerschaum.plugins.import_plugins:
from meerschaum.plugins import import_plugins
noaa, compose = import_plugins('noaa', 'compose')
Create a Job
Create a meerschaum.Job with a name and sysargs:
import meerschaum as mrsm
job = mrsm.Job('syncing-engine', 'sync pipes --loop')
success, msg = job.start()
Set executor_keys to the connector keys of an API instance to create a remote job:
import meerschaum as mrsm
job = mrsm.Job(
    'foo',
    'sync pipes -s daily',
    executor_keys='api:main',
)
Import from a Virtual Environment
Use the meerschaum.Venv context manager to activate a virtual environment:
import meerschaum as mrsm
with mrsm.Venv('noaa'):
    import requests
print(requests.__file__)
# /home/bmeares/.config/meerschaum/venvs/noaa/lib/python3.12/site-packages/requests/__init__.py
To import packages which may not be installed, use meerschaum.attempt_import():
import meerschaum as mrsm
requests = mrsm.attempt_import('requests', venv='noaa')
print(requests.__file__)
# /home/bmeares/.config/meerschaum/venvs/noaa/lib/python3.12/site-packages/requests/__init__.py
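Conceptually, activating a virtual environment for imports means making its site-packages directory importable for a limited scope. The standard-library sketch below illustrates only that idea and makes no claim about how meerschaum.Venv is implemented; `temporary_sys_path` and the placeholder path are hypothetical.

```python
import sys
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def temporary_sys_path(path: str) -> Iterator[None]:
    """Put `path` at the front of sys.path for the duration of the block."""
    sys.path.insert(0, path)
    try:
        yield
    finally:
        # Remove the entry added above, even if the block raised.
        try:
            sys.path.remove(path)
        except ValueError:
            pass


fake_site_packages = '/tmp/hypothetical-venv/site-packages'  # placeholder path
with temporary_sys_path(fake_site_packages):
    inside = fake_site_packages in sys.path
after = fake_site_packages in sys.path
print(inside)  # True
print(after)   # False
```

Modules imported inside the block stay loaded afterwards, which is why `requests.__file__` can still be inspected once the `with` block has exited.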
Run Actions
Run sysargs with meerschaum.entry():
import meerschaum as mrsm
success, msg = mrsm.entry('show pipes + show version : x2')
Use meerschaum.actions.get_action() to access an action function directly:
from meerschaum.actions import get_action
show_pipes = get_action(['show', 'pipes'])
success, msg = show_pipes(connector_keys=['plugin:noaa'])
Get a dictionary of available subactions with meerschaum.actions.get_subactions():
from meerschaum.actions import get_subactions
subactions = get_subactions('show')
success, msg = subactions['pipes']()
Create a Plugin
Run bootstrap plugin to create a new plugin:
mrsm bootstrap plugin example
This will create example.py in your plugins directory (default ~/.config/meerschaum/plugins/, Windows: %APPDATA%\Meerschaum\plugins). You may paste the example code from the "Create a Custom Action" example below.
Open your plugin with edit plugin:
mrsm edit plugin example
Paste the example code below into the plugin file to try out the features.
See the writing plugins guide for more in-depth documentation.
Create a Custom Action
Decorate a function with meerschaum.actions.make_action to designate it as an action. Undecorated functions named after the action (e.g. sing_tune and sing_song for the action sing) are automatically detected as subactions:
from meerschaum.actions import make_action
@make_action
def sing():
    print('What would you like me to sing?')
    return True, "Success"

def sing_tune():
    return False, "I don't know that song!"

def sing_song():
    print('Hello, World!')
    return True, "Success"
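One way name-based subaction detection could work is a lookup on `<action>_<subaction>` with a fallback to the bare action. The sketch below is a guess at the general pattern, not Meerschaum's dispatcher; `dispatch` and the `functions` mapping are hypothetical, and the functions return their messages instead of printing so the results are easy to inspect.

```python
from typing import Callable, Dict, List, Tuple

SuccessTuple = Tuple[bool, str]


def sing() -> SuccessTuple:
    return True, "What would you like me to sing?"


def sing_tune() -> SuccessTuple:
    return False, "I don't know that song!"


def sing_song() -> SuccessTuple:
    return True, "Hello, World!"


def dispatch(
    namespace: Dict[str, Callable[[], SuccessTuple]],
    words: List[str],
) -> SuccessTuple:
    """Run '<action>_<subaction>' if it exists, otherwise fall back to '<action>'."""
    action, rest = words[0], words[1:]
    if rest:
        candidate = namespace.get(f"{action}_{rest[0]}")
        if callable(candidate):
            return candidate()
    return namespace[action]()


functions = {f.__name__: f for f in (sing, sing_tune, sing_song)}
print(dispatch(functions, ['sing', 'song']))  # (True, 'Hello, World!')
print(dispatch(functions, ['sing', 'tune']))  # (False, "I don't know that song!")
print(dispatch(functions, ['sing']))          # (True, 'What would you like me to sing?')
```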
Use meerschaum.plugins.add_plugin_argument() to create new parameters for your action:
from meerschaum.plugins import make_action, add_plugin_argument
add_plugin_argument(
    '--song', type=str, help='What song to sing.',
)

@make_action
def sing_melody(action=None, song=None):
    to_sing = action[0] if action else song
    if not to_sing:
        return False, "Please tell me what to sing!"
    return True, f'~I am singing {to_sing}~'
mrsm sing melody lalala
mrsm sing melody --song do-re-mi
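The two invocations above reach the function through different keyword arguments: positional words arrive as `action`, and the flag arrives as `song`. A rough standard-library model of that routing follows; the `argparse` parser here is a hypothetical stand-in and not how Meerschaum builds its own parser.

```python
import argparse
from typing import List, Optional, Tuple


def sing_melody(
    action: Optional[List[str]] = None,
    song: Optional[str] = None,
) -> Tuple[bool, str]:
    to_sing = action[0] if action else song
    if not to_sing:
        return False, "Please tell me what to sing!"
    return True, f'~I am singing {to_sing}~'


# Hypothetical parser: positional words become `action`, `--song` becomes `song`.
parser = argparse.ArgumentParser(prog='sing melody')
parser.add_argument('action', nargs='*')
parser.add_argument('--song', type=str, help='What song to sing.')

results = [
    sing_melody(**vars(parser.parse_args(argv)))
    for argv in (['lalala'], ['--song', 'do-re-mi'], [])
]
for result in results:
    print(result)
# (True, '~I am singing lalala~')
# (True, '~I am singing do-re-mi~')
# (False, 'Please tell me what to sing!')
```

Because `action` is checked first, a positional word takes precedence over `--song` when both are given.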
Add a Page to the Web Dashboard
Use the decorators meerschaum.plugins.dash_plugin() and meerschaum.plugins.web_page() to add new pages to the web dashboard:
from meerschaum.plugins import dash_plugin, web_page
@dash_plugin
def init_dash(dash_app):

    import dash.html as html
    import dash_bootstrap_components as dbc
    from dash import Input, Output, no_update

    ### Routes to '/dash/my-page'
    @web_page('/my-page', login_required=False)
    def my_page():
        return dbc.Container([
            html.H1("Hello, World!"),
            dbc.Button("Click me", id='my-button'),
            html.Div(id="my-output-div"),
        ])

    @dash_app.callback(
        Output('my-output-div', 'children'),
        Input('my-button', 'n_clicks'),
    )
    def my_button_click(n_clicks):
        if not n_clicks:
            return no_update
        return html.P(f'You clicked {n_clicks} times!')
Submodules
meerschaum.actions
Access functions for actions and subactions.
- meerschaum.actions.actions
- meerschaum.actions.get_action()
- meerschaum.actions.get_completer()
- meerschaum.actions.get_main_action_name()
- meerschaum.actions.get_subactions()
meerschaum.config
Read and write the Meerschaum configuration registry.
- meerschaum.config.get_config()
- meerschaum.config.get_plugin_config()
- meerschaum.config.write_config()
- meerschaum.config.write_plugin_config()
meerschaum.config.environment
Patch configuration and connectors from environment variables.
- meerschaum.config.environment.apply_environment_patches()
- meerschaum.config.environment.apply_environment_config()
- meerschaum.config.environment.apply_environment_uris()
- meerschaum.config.environment.apply_connector_uri()
- meerschaum.config.environment.get_connector_env_regex()
- meerschaum.config.environment.get_connector_env_vars()
- meerschaum.config.environment.get_env_vars()
- meerschaum.config.environment.get_daemon_env_vars()
- meerschaum.config.environment.replace_env()
meerschaum.connectors
Build connectors to interact with databases and fetch data.
- meerschaum.connectors.get_connector()
- meerschaum.connectors.make_connector()
- meerschaum.connectors.is_connected()
- meerschaum.connectors.poll.retry_connect()
- meerschaum.connectors.Connector
- meerschaum.connectors.sql.SQLConnector
- meerschaum.connectors.api.APIConnector
- meerschaum.connectors.valkey.ValkeyConnector
meerschaum.jobs
Start background jobs.
- meerschaum.jobs.Job
- meerschaum.jobs.Executor
- meerschaum.jobs.systemd.SystemdExecutor
- meerschaum.jobs.get_jobs()
- meerschaum.jobs.get_filtered_jobs()
- meerschaum.jobs.get_running_jobs()
- meerschaum.jobs.get_stopped_jobs()
- meerschaum.jobs.get_paused_jobs()
- meerschaum.jobs.get_restart_jobs()
- meerschaum.jobs.make_executor()
- meerschaum.jobs.check_restart_jobs()
- meerschaum.jobs.start_check_jobs_thread()
- meerschaum.jobs.stop_check_jobs_thread()
- meerschaum.jobs.get_executor_keys_from_context()
meerschaum.plugins
Access plugin modules and other API utilities.
- meerschaum.plugins.Plugin
- meerschaum.plugins.api_plugin()
- meerschaum.plugins.dash_plugin()
- meerschaum.plugins.import_plugins()
- meerschaum.plugins.reload_plugins()
- meerschaum.plugins.get_plugins()
- meerschaum.plugins.get_data_plugins()
- meerschaum.plugins.add_plugin_argument()
- meerschaum.plugins.pre_sync_hook()
- meerschaum.plugins.post_sync_hook()
meerschaum.utils
Utility functions are available in several submodules:
meerschaum.utils.daemon
Manage background jobs.
- meerschaum.utils.daemon.daemon_entry()
- meerschaum.utils.daemon.daemon_action()
- meerschaum.utils.daemon.get_daemons()
- meerschaum.utils.daemon.get_daemon_ids()
- meerschaum.utils.daemon.get_running_daemons()
- meerschaum.utils.daemon.get_paused_daemons()
- meerschaum.utils.daemon.get_stopped_daemons()
- meerschaum.utils.daemon.get_filtered_daemons()
- meerschaum.utils.daemon.run_daemon()
- meerschaum.utils.daemon.Daemon
- meerschaum.utils.daemon.FileDescriptorInterceptor
- meerschaum.utils.daemon.RotatingFile
meerschaum.utils.dataframe
Manipulate dataframes.
- meerschaum.utils.dataframe.add_missing_cols_to_df()
- meerschaum.utils.dataframe.chunksize_to_npartitions()
- meerschaum.utils.dataframe.df_from_literal()
- meerschaum.utils.dataframe.df_is_chunk_generator()
- meerschaum.utils.dataframe.enforce_dtypes()
- meerschaum.utils.dataframe.filter_unseen_df()
- meerschaum.utils.dataframe.get_bool_cols()
- meerschaum.utils.dataframe.get_bytes_cols()
- meerschaum.utils.dataframe.get_datetime_bound_from_df()
- meerschaum.utils.dataframe.get_date_cols()
- meerschaum.utils.dataframe.get_datetime_cols()
- meerschaum.utils.dataframe.get_datetime_cols_types()
- meerschaum.utils.dataframe.get_first_valid_dask_partition()
- meerschaum.utils.dataframe.get_geometry_cols()
- meerschaum.utils.dataframe.get_geometry_cols_types()
- meerschaum.utils.dataframe.get_json_cols()
- meerschaum.utils.dataframe.get_numeric_cols()
- meerschaum.utils.dataframe.get_special_cols()
- meerschaum.utils.dataframe.get_unhashable_cols()
- meerschaum.utils.dataframe.get_unique_index_values()
- meerschaum.utils.dataframe.get_uuid_cols()
- meerschaum.utils.dataframe.parse_df_datetimes()
- meerschaum.utils.dataframe.query_df()
- meerschaum.utils.dataframe.to_json()
- meerschaum.utils.dataframe.to_simple_lines()
- meerschaum.utils.dataframe.parse_simple_lines()
meerschaum.utils.dtypes
Work with data types.
- meerschaum.utils.dtypes.are_dtypes_equal()
- meerschaum.utils.dtypes.attempt_cast_to_bytes()
- meerschaum.utils.dtypes.attempt_cast_to_geometry()
- meerschaum.utils.dtypes.attempt_cast_to_numeric()
- meerschaum.utils.dtypes.attempt_cast_to_uuid()
- meerschaum.utils.dtypes.coerce_timezone()
- meerschaum.utils.dtypes.deserialize_base64()
- meerschaum.utils.dtypes.deserialize_bytes_string()
- meerschaum.utils.dtypes.deserialize_geometry()
- meerschaum.utils.dtypes.encode_bytes_for_bytea()
- meerschaum.utils.dtypes.geometry_is_gpkg()
- meerschaum.utils.dtypes.geometry_is_wkt()
- meerschaum.utils.dtypes.get_current_timestamp()
- meerschaum.utils.dtypes.get_geometry_type_srid()
- meerschaum.utils.dtypes.is_dtype_numeric()
- meerschaum.utils.dtypes.is_dtype_special()
- meerschaum.utils.dtypes.json_serialize_value()
- meerschaum.utils.dtypes.none_if_null()
- meerschaum.utils.dtypes.project_geometry()
- meerschaum.utils.dtypes.quantize_decimal()
- meerschaum.utils.dtypes.serialize_bytes()
- meerschaum.utils.dtypes.serialize_datetime()
- meerschaum.utils.dtypes.serialize_date()
- meerschaum.utils.dtypes.serialize_decimal()
- meerschaum.utils.dtypes.serialize_geometry()
- meerschaum.utils.dtypes.to_datetime()
- meerschaum.utils.dtypes.to_pandas_dtype()
- meerschaum.utils.dtypes.value_is_null()
- meerschaum.utils.dtypes.get_next_precision_unit()
- meerschaum.utils.dtypes.round_time()
meerschaum.utils.formatting
Format output text.
- meerschaum.utils.formatting.colored()
- meerschaum.utils.formatting.extract_stats_from_message()
- meerschaum.utils.formatting.fill_ansi()
- meerschaum.utils.formatting.get_console()
- meerschaum.utils.formatting.highlight_pipes()
- meerschaum.utils.formatting.make_header()
- meerschaum.utils.formatting.pipe_repr()
- meerschaum.utils.formatting.pprint()
- meerschaum.utils.formatting.pprint_pipes()
- meerschaum.utils.formatting.print_options()
- meerschaum.utils.formatting.print_pipes_results()
- meerschaum.utils.formatting.print_tuple()
- meerschaum.utils.formatting.translate_rich_to_termcolor()
meerschaum.utils.misc
Miscellaneous utility functions.
- meerschaum.utils.misc.items_str()
- meerschaum.utils.misc.is_int()
- meerschaum.utils.misc.is_uuid()
- meerschaum.utils.misc.interval_str()
- meerschaum.utils.misc.filter_keywords()
- meerschaum.utils.misc.generate_password()
- meerschaum.utils.misc.string_to_dict()
- meerschaum.utils.misc.iterate_chunks()
- meerschaum.utils.misc.timed_input()
- meerschaum.utils.misc.replace_pipes_in_dict()
- meerschaum.utils.misc.is_valid_email()
- meerschaum.utils.misc.string_width()
- meerschaum.utils.misc.replace_password()
- meerschaum.utils.misc.parse_config_substitution()
- meerschaum.utils.misc.edit_file()
- meerschaum.utils.misc.get_in_ex_params()
- meerschaum.utils.misc.separate_negation_values()
- meerschaum.utils.misc.flatten_list()
- meerschaum.utils.misc.make_symlink()
- meerschaum.utils.misc.is_symlink()
- meerschaum.utils.misc.wget()
- meerschaum.utils.misc.add_method_to_class()
- meerschaum.utils.misc.is_pipe_registered()
- meerschaum.utils.misc.get_cols_lines()
- meerschaum.utils.misc.sorted_dict()
- meerschaum.utils.misc.flatten_pipes_dict()
- meerschaum.utils.misc.dict_from_od()
- meerschaum.utils.misc.remove_ansi()
- meerschaum.utils.misc.get_connector_labels()
- meerschaum.utils.misc.json_serialize_datetime()
- meerschaum.utils.misc.async_wrap()
- meerschaum.utils.misc.is_docker_available()
- meerschaum.utils.misc.is_android()
- meerschaum.utils.misc.is_bcp_available()
- meerschaum.utils.misc.truncate_string_sections()
- meerschaum.utils.misc.safely_extract_tar()
meerschaum.utils.packages
Manage Python packages.
- meerschaum.utils.packages.attempt_import()
- meerschaum.utils.packages.get_module_path()
- meerschaum.utils.packages.manually_import_module()
- meerschaum.utils.packages.get_install_no_version()
- meerschaum.utils.packages.determine_version()
- meerschaum.utils.packages.need_update()
- meerschaum.utils.packages.get_pip()
- meerschaum.utils.packages.pip_install()
- meerschaum.utils.packages.pip_uninstall()
- meerschaum.utils.packages.completely_uninstall_package()
- meerschaum.utils.packages.run_python_package()
- meerschaum.utils.packages.lazy_import()
- meerschaum.utils.packages.pandas_name()
- meerschaum.utils.packages.import_pandas()
- meerschaum.utils.packages.import_rich()
- meerschaum.utils.packages.import_dcc()
- meerschaum.utils.packages.import_html()
- meerschaum.utils.packages.get_modules_from_package()
- meerschaum.utils.packages.import_children()
- meerschaum.utils.packages.reload_package()
- meerschaum.utils.packages.reload_meerschaum()
- meerschaum.utils.packages.is_installed()
- meerschaum.utils.packages.venv_contains_package()
- meerschaum.utils.packages.package_venv()
- meerschaum.utils.packages.ensure_readline()
- meerschaum.utils.packages.get_prerelease_dependencies()
meerschaum.utils.pipes
Utilities for working with pipe objects.
meerschaum.utils.sql
Build SQL queries.
- meerschaum.utils.sql.build_where()
- meerschaum.utils.sql.clean()
- meerschaum.utils.sql.get_sqlalchemy_table()
- meerschaum.utils.sql.dateadd_str()
- meerschaum.utils.sql.test_connection()
- meerschaum.utils.sql.get_distinct_col_count()
- meerschaum.utils.sql.sql_item_name()
- meerschaum.utils.sql.pg_capital()
- meerschaum.utils.sql.oracle_capital()
- meerschaum.utils.sql.truncate_item_name()
- meerschaum.utils.sql.table_exists()
- meerschaum.utils.sql.get_table_cols_types()
- meerschaum.utils.sql.get_table_cols_indices()
- meerschaum.utils.sql.get_update_queries()
- meerschaum.utils.sql.get_null_replacement()
- meerschaum.utils.sql.get_db_version()
- meerschaum.utils.sql.get_rename_table_queries()
- meerschaum.utils.sql.get_create_table_queries()
- meerschaum.utils.sql.wrap_query_with_cte()
- meerschaum.utils.sql.format_cte_subquery()
- meerschaum.utils.sql.session_execute()
- meerschaum.utils.sql.get_reset_autoincrement_queries()
- meerschaum.utils.sql.get_postgis_geo_columns_types()
- meerschaum.utils.sql.get_create_schema_if_not_exists_queries()
meerschaum.utils.venv
Manage virtual environments.
- meerschaum.utils.venv.Venv
- meerschaum.utils.venv.activate_venv()
- meerschaum.utils.venv.deactivate_venv()
- meerschaum.utils.venv.get_module_venv()
- meerschaum.utils.venv.get_venvs()
- meerschaum.utils.venv.init_venv()
- meerschaum.utils.venv.inside_venv()
- meerschaum.utils.venv.is_venv_active()
- meerschaum.utils.venv.venv_exec()
- meerschaum.utils.venv.venv_executable()
- meerschaum.utils.venv.venv_exists()
- meerschaum.utils.venv.venv_target_path()
- meerschaum.utils.venv.verify_venv()
meerschaum.utils.warnings
Print warnings, errors, info, and debug messages.
#! /usr/bin/env python
# -*- coding: utf-8 -*-
# vim:fenc=utf-8

"""
Copyright 2020–2026 Bennett Meares

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""

import atexit

from meerschaum.utils.typing import SuccessTuple
from meerschaum.utils.packages import attempt_import
from meerschaum.core.Pipe import Pipe
from meerschaum.plugins import Plugin
from meerschaum.utils.venv import Venv
from meerschaum.jobs import Job, make_executor
from meerschaum.connectors import get_connector, Connector, InstanceConnector, make_connector
from meerschaum.utils import get_pipes
from meerschaum.utils.formatting import pprint
from meerschaum._internal.docs import index as __doc__
from meerschaum.config import __version__, get_config
from meerschaum._internal.entry import entry
from meerschaum.__main__ import _close_pools

atexit.register(_close_pools)

__pdoc__ = {'gui': False, 'api': False, 'core': False, '_internal': False}
__all__ = (
    "get_pipes",
    "get_connector",
    "get_config",
    "Pipe",
    "Plugin",
    "SuccessTuple",
    "Venv",
    "Plugin",
    "Job",
    "pprint",
    "attempt_import",
    "actions",
    "config",
    "connectors",
    "jobs",
    "plugins",
    "utils",
    "SuccessTuple",
    "Connector",
    "InstanceConnector",
    "make_connector",
    "entry",
)
def get_pipes(
    connector_keys: Union[str, List[str], None] = None,
    metric_keys: Union[str, List[str], None] = None,
    location_keys: Union[str, List[str], None] = None,
    tags: Optional[List[str]] = None,
    datetime_dtypes: Optional[List[str]] = None,
    params: Optional[Dict[str, Any]] = None,
    mrsm_instance: Union[str, InstanceConnector, None] = None,
    instance: Union[str, InstanceConnector, None] = None,
    as_list: bool = False,
    as_tags_dict: bool = False,
    method: str = 'registered',
    workers: Optional[int] = None,
    debug: bool = False,
    _cache_parameters: bool = True,
    **kw: Any
) -> Union[PipesDict, List[mrsm.Pipe], Dict[str, mrsm.Pipe]]:
    """
    Return a dictionary or list of `meerschaum.Pipe` objects.

    Parameters
    ----------
    connector_keys: Union[str, List[str], None], default None
        String or list of connector keys.
        If omitted or is `'*'`, fetch all possible keys.
        If a string begins with `'_'`, select keys that do NOT match the string.

    metric_keys: Union[str, List[str], None], default None
        String or list of metric keys. See `connector_keys` for formatting.

    location_keys: Union[str, List[str], None], default None
        String or list of location keys. See `connector_keys` for formatting.

    tags: Optional[List[str]], default None
        If provided, only include pipes with these tags.

    datetime_dtypes: Optional[List[str]], default None
        If provided, only include pipes with the corresponding `datetime` axis dtypes.
        Accepted values are `datetime`, `int`, `None` (or `null`, etc.).
        May be negated by `_`.

    params: Optional[Dict[str, Any]], default None
        Dictionary of additional parameters to search by.
        Params are parsed into a SQL WHERE clause.
        E.g. `{'a': 1, 'b': 2}` equates to `'WHERE a = 1 AND b = 2'`

    mrsm_instance: Union[str, InstanceConnector, None], default None
        Connector keys for the Meerschaum instance of the pipes.
        Must be a `meerschaum.connectors.sql.SQLConnector.SQLConnector` or
        `meerschaum.connectors.api.APIConnector.APIConnector`.

    as_list: bool, default False
        If `True`, return pipes in a list instead of a hierarchical dictionary.
        `False` : `{connector_keys: {metric_key: {location_key: Pipe}}}`
        `True`  : `[Pipe]`

    as_tags_dict: bool, default False
        If `True`, return a dictionary mapping tags to pipes.
        Pipes with multiple tags will be repeated.

    method: str, default 'registered'
        Available options: `['registered', 'explicit', 'all']`
        If `'registered'` (default), create pipes based on registered keys in the connector's pipes table
        (API or SQL connector, depends on mrsm_instance).
        If `'explicit'`, create pipes from provided connector_keys, metric_keys, and location_keys
        instead of consulting the pipes table. Useful for creating non-existent pipes.
        If `'all'`, create pipes from predefined metrics and locations. Required `connector_keys`.
        **NOTE:** Method `'all'` is not implemented!

    workers: Optional[int], default None
        If provided (and `as_tags_dict` is `True`), set the number of workers for the pool
        to fetch tags.
        Only takes effect if the instance connector supports multi-threading

    **kw: Any:
        Keyword arguments to pass to the `meerschaum.Pipe` constructor.

    Returns
    -------
    A dictionary of dictionaries and `meerschaum.Pipe` objects
    in the connector, metric, location hierarchy.
    If `as_list` is `True`, return a list of `meerschaum.Pipe` objects.
    If `as_tags_dict` is `True`, return a dictionary mapping tags to pipes.

    Examples
    --------
    ```
    >>> ### Manual definition:
    >>> pipes = {
    ...     <connector_keys>: {
    ...         <metric_key>: {
    ...             <location_key>: Pipe(
    ...                 <connector_keys>,
    ...                 <metric_key>,
    ...                 <location_key>,
    ...             ),
    ...         },
    ...     },
    ... },
    >>> ### Accessing a single pipe:
    >>> pipes['sql:main']['weather'][None]
    >>> ### Return a list instead:
    >>> get_pipes(as_list=True)
    [Pipe('sql:main', 'weather')]
    >>> get_pipes(as_tags_dict=True)
    {'gvl': Pipe('sql:main', 'weather')}
    ```
    """
    import json
    from collections import defaultdict
    from meerschaum.config import get_config
    from meerschaum.config.static import STATIC_CONFIG
    from meerschaum.utils.warnings import error
    from meerschaum.utils.misc import filter_keywords, separate_negation_values
    from meerschaum.utils.pool import get_pool
    from meerschaum.utils.pipes import replace_pipes_syntax
    from meerschaum.utils.debug import dprint
    from meerschaum.utils.dtypes import value_is_null, get_current_timestamp
    from meerschaum import Pipe

    negation_prefix = STATIC_CONFIG['system']['fetch_pipes_keys']['negation_prefix']
    if datetime_dtypes:
        if isinstance(datetime_dtypes, str):
            datetime_dtypes = [datetime_dtypes]
        for _dt in datetime_dtypes:
            _clean = str(_dt).lstrip(negation_prefix).lower()
            if _clean not in ('datetime', 'int') and not value_is_null(_clean):
                error(f"Invalid datetime dtype '{_dt}'.")

    if connector_keys is None:
        connector_keys = []
    if metric_keys is None:
        metric_keys = []
    if location_keys is None:
        location_keys = []
    if params is None:
        params = {}
    if tags is None:
        tags = []

    if isinstance(connector_keys, str):
        connector_keys = [connector_keys]
    if isinstance(metric_keys, str):
        metric_keys = [metric_keys]
    if isinstance(location_keys, str):
        location_keys = [location_keys]

    ### Get SQL or API connector (keys come from `connector.fetch_pipes_keys()`).
    if mrsm_instance is None:
        mrsm_instance = instance
    if mrsm_instance is None:
        mrsm_instance = get_config('meerschaum', 'instance', patch=True)
    if isinstance(mrsm_instance, str):
        from meerschaum.connectors.parse import parse_instance_keys
        connector = parse_instance_keys(keys=mrsm_instance, debug=debug)
    else:
        from meerschaum.connectors import instance_types
        valid_connector = False
        if hasattr(mrsm_instance, 'type'):
            if mrsm_instance.type in instance_types:
                valid_connector = True
        if not valid_connector:
            error(f"Invalid instance connector: {mrsm_instance}")
        connector = mrsm_instance
    if debug:
        dprint(f"Using instance connector: {connector}")
    if not connector:
        error(f"Could not create connector from keys: '{mrsm_instance}'")

    ### Get a list of tuples for the keys needed to build pipes.
    result = fetch_pipes_keys(
        method,
        connector,
        connector_keys = connector_keys,
        metric_keys = metric_keys,
        location_keys = location_keys,
        tags = tags,
        params = params,
        workers = workers,
        debug = debug
    )
    if result is None:
        error("Unable to build pipes!")
    result_items: List[Tuple] = (
        list(result.items())
        if isinstance(result, dict)
        else [(None, keys_tuple) for keys_tuple in result]
    )

    ### Populate the `pipes` dictionary with Pipes based on the keys
    ### obtained from the chosen `method`.
    in_dtypes, ex_dtypes = separate_negation_values(datetime_dtypes or [])
    pipes: PipesDict = {}
    for pipe_id, keys_tuple in result_items:
        ck, mk, lk = keys_tuple[0], keys_tuple[1], keys_tuple[2]
        pipe_tags_or_parameters = keys_tuple[3] if len(keys_tuple) == 4 else None
        pipe_parameters = (
            pipe_tags_or_parameters
            if isinstance(pipe_tags_or_parameters, (dict, str))
            else None
        )
        if isinstance(pipe_parameters, str):
            pipe_parameters = json.loads(pipe_parameters)
        pipe_tags = (
            pipe_tags_or_parameters
            if isinstance(pipe_tags_or_parameters, list)
            else (
                pipe_tags_or_parameters.get('tags', [])
                if isinstance(pipe_tags_or_parameters, dict)
                else None
            )
        )

        pipe = Pipe(
            ck, mk, lk,
            mrsm_instance = connector,
            parameters = pipe_parameters,
            tags = pipe_tags,
            debug = debug,
            **filter_keywords(Pipe, **kw)
        )
        pipe.__dict__['_tags'] = pipe_tags
        if pipe_id is not None:
            pipe._cache_value('_id', pipe_id, memory_only=True, debug=debug)
        if pipe_parameters is not None:
            now = get_current_timestamp('ms', as_int=True) / 1000
            full_attributes = {
                'connector_keys': ck,
                'metric_key': mk,
                'location_key': lk,
                'parameters': pipe_parameters,
            }
            if pipe_id is not None:
                full_attributes['pipe_id'] = pipe_id
            pipe._cache_value('attributes', full_attributes, memory_only=True, debug=debug)
            pipe._cache_value('_attributes_sync_time', now, memory_only=True, debug=debug)
        if datetime_dtypes:
            if pipe_parameters is None:
                pipe_parameters = pipe.get_parameters(debug=debug)
            columns_val = (pipe_parameters or {}).get('columns', {}) or {}
            if isinstance(columns_val, str) and 'Pipe(' in columns_val:
                columns_val = replace_pipes_syntax(columns_val)

            dt_col = columns_val.get('datetime', None)
            dt_typ = (
                ((pipe_parameters or {}).get('dtypes', None) or {}).get(dt_col, None)
                if dt_col
                else None
            )

            def _dtype_matches(clean_d):
                if not dt_col:
                    return value_is_null(clean_d)
                return (
                    (clean_d == 'int' and 'int' in str(dt_typ).lower())
                    or
                    (clean_d == 'datetime' and 'int' not in str(dt_typ).lower())
                )

            in_match = not in_dtypes or any(_dtype_matches(d) for d in in_dtypes)
            ex_match = bool(ex_dtypes and any(_dtype_matches(d) for d in ex_dtypes))
            keep_pipe = in_match and not ex_match

            if not keep_pipe:
                continue

        if ck not in pipes:
            pipes[ck] = {}

        if mk not in pipes[ck]:
            pipes[ck][mk] = {}


        pipes[ck][mk][lk] = pipe

    if not as_list and not as_tags_dict:
        return pipes

    from meerschaum.utils.pipes import flatten_pipes_dict
    pipes_list = flatten_pipes_dict(pipes)
    if as_list:
        return pipes_list

    pool = get_pool(workers=(workers if connector.IS_THREAD_SAFE else 1))
    def gather_pipe_tags(pipe: mrsm.Pipe) -> Tuple[mrsm.Pipe, List[str]]:
        _tags = pipe.__dict__.get('_tags', None)
        gathered_tags = _tags if _tags is not None else pipe.tags
        return pipe, (gathered_tags or [])

    tags_pipes = defaultdict(lambda: [])
    pipes_tags = dict(pool.map(gather_pipe_tags, pipes_list))
    for pipe, tags in pipes_tags.items():
        for tag in (tags or []):
            tags_pipes[tag].append(pipe)

    return dict(tags_pipes)
Return a dictionary or list of meerschaum.Pipe objects.
Parameters
- connector_keys (Union[str, List[str], None], default None):
String or list of connector keys.
If omitted or '*', fetch all possible keys.
If a string begins with '_', select keys that do NOT match the string.
- metric_keys (Union[str, List[str], None], default None):
String or list of metric keys. See connector_keys for formatting.
- location_keys (Union[str, List[str], None], default None):
String or list of location keys. See connector_keys for formatting.
- tags (Optional[List[str]], default None):
If provided, only include pipes with these tags.
- datetime_dtypes (Optional[List[str]], default None):
If provided, only include pipes with the corresponding datetime axis dtypes.
Accepted values are datetime, int, None (or null, etc.). May be negated by '_'.
- params (Optional[Dict[str, Any]], default None):
Dictionary of additional parameters to search by.
Params are parsed into a SQL WHERE clause.
E.g. {'a': 1, 'b': 2} equates to 'WHERE a = 1 AND b = 2'.
- mrsm_instance (Union[str, InstanceConnector, None], default None):
Connector keys for the Meerschaum instance of the pipes.
Must be a meerschaum.connectors.sql.SQLConnector.SQLConnector or meerschaum.connectors.api.APIConnector.APIConnector.
- as_list (bool, default False):
If True, return pipes in a list instead of a hierarchical dictionary.
False: {connector_keys: {metric_key: {location_key: Pipe}}}
True: [Pipe]
- as_tags_dict (bool, default False):
If True, return a dictionary mapping tags to pipes. Pipes with multiple tags will be repeated.
- method (str, default 'registered'):
Available options: ['registered', 'explicit', 'all']
If 'registered' (default), create pipes based on registered keys in the connector's pipes table (API or SQL connector, depends on mrsm_instance).
If 'explicit', create pipes from provided connector_keys, metric_keys, and location_keys instead of consulting the pipes table. Useful for creating non-existent pipes.
If 'all', create pipes from predefined metrics and locations. Requires connector_keys.
NOTE: Method 'all' is not implemented!
- workers (Optional[int], default None):
If provided (and as_tags_dict is True), set the number of workers for the pool to fetch tags.
Only takes effect if the instance connector supports multi-threading.
- **kw (Any):
Keyword arguments to pass to the meerschaum.Pipe constructor.
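The key-selection rule described for connector_keys (omit or pass '*' for everything, prefix a key with '_' to exclude it) can be sketched in plain Python. This is only an illustration of the rule as worded above: select_keys and NEGATION_PREFIX are hypothetical names, not part of the Meerschaum API, and the real lookup happens against the instance's pipes table.

```python
from typing import Iterable, List, Union

# Assumption: '_' is the negation prefix, as stated in the parameter text.
NEGATION_PREFIX = "_"


def select_keys(
    available: Iterable[str],
    requested: Union[str, List[str], None] = None,
) -> List[str]:
    """Hypothetical helper: filter `available` keys by the rule described above."""
    available = list(available)
    if requested is None:
        return available
    if isinstance(requested, str):
        requested = [requested]
    if "*" in requested:
        return available

    # Keys without the prefix are inclusions; keys with it are exclusions.
    include = [key for key in requested if not key.startswith(NEGATION_PREFIX)]
    exclude = {
        key[len(NEGATION_PREFIX):]
        for key in requested
        if key.startswith(NEGATION_PREFIX)
    }
    return [
        key
        for key in available
        if key not in exclude and (not include or key in include)
    ]


all_keys = ["sql:main", "sql:temp", "plugin:noaa"]
without_temp = select_keys(all_keys, "_sql:temp")
only_main = select_keys(all_keys, ["sql:main"])
```

Here without_temp keeps every key except 'sql:temp', while only_main keeps just 'sql:main'.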
Returns
- A dictionary of dictionaries and meerschaum.Pipe objects in the connector, metric, location hierarchy.
- If as_list is True, return a list of meerschaum.Pipe objects.
- If as_tags_dict is True, return a dictionary mapping tags to pipes.
Examples
>>> ### Manual definition:
>>> pipes = {
...     <connector_keys>: {
...         <metric_key>: {
...             <location_key>: Pipe(
...                 <connector_keys>,
...                 <metric_key>,
...                 <location_key>,
...             ),
...         },
...     },
... }
>>> ### Accessing a single pipe:
>>> pipes['sql:main']['weather'][None]
>>> ### Return a list instead:
>>> get_pipes(as_list=True)
[Pipe('sql:main', 'weather')]
>>> get_pipes(as_tags_dict=True)
{'gvl': Pipe('sql:main', 'weather')}
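To make the relationship between the three return shapes concrete, the sketch below flattens a connector / metric / location hierarchy into a list and then groups it by tag. It runs without Meerschaum installed: FakePipe, flatten_pipes, and group_by_tag are hypothetical stand-ins, and grouping each tag to a list of pipes is an assumption made here for illustration rather than the library's documented shape.

```python
from collections import namedtuple
from typing import Any, Dict, List

# Stand-in for meerschaum.Pipe (hypothetical; only the fields this sketch needs).
FakePipe = namedtuple(
    "FakePipe", ["connector_keys", "metric_key", "location_key", "tags"]
)


def flatten_pipes(
    pipes: Dict[str, Dict[str, Dict[Any, FakePipe]]],
) -> List[FakePipe]:
    """Walk the connector -> metric -> location hierarchy into a flat list."""
    return [
        pipe
        for metrics in pipes.values()
        for locations in metrics.values()
        for pipe in locations.values()
    ]


def group_by_tag(pipes_list: List[FakePipe]) -> Dict[str, List[FakePipe]]:
    """Map each tag to the pipes carrying it; multi-tag pipes are repeated."""
    grouped: Dict[str, List[FakePipe]] = {}
    for pipe in pipes_list:
        for tag in pipe.tags:
            grouped.setdefault(tag, []).append(pipe)
    return grouped


weather = FakePipe("sql:main", "weather", None, ("gvl", "production"))
pipes = {"sql:main": {"weather": {None: weather}}}

flat = flatten_pipes(pipes)
by_tag = group_by_tag(flat)
```

The same pipe appears under both 'gvl' and 'production' in by_tag, which mirrors the note that pipes with multiple tags are repeated.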
def get_connector(
    type: str = None,
    label: str = None,
    refresh: bool = False,
    debug: bool = False,
    _load_plugins: bool = True,
    **kw: Any
) -> Connector:
    """
    Return existing connector or create new connection and store for reuse.

    You can create new connectors if enough parameters are provided for the given type and flavor.

    Parameters
    ----------
    type: Optional[str], default None
        Connector type (sql, api, etc.).
        Defaults to the type of the configured `instance_connector`.

    label: Optional[str], default None
        Connector label (e.g. main). Defaults to `'main'`.

    refresh: bool, default False
        Refresh the Connector instance / construct new object. Defaults to `False`.

    kw: Any
        Other arguments to pass to the Connector constructor.
        If the Connector has already been constructed and new arguments are provided,
        `refresh` is set to `True` and the old Connector is replaced.

    Returns
    -------
    A new Meerschaum connector (e.g. `meerschaum.connectors.api.APIConnector`,
    `meerschaum.connectors.sql.SQLConnector`).

    Examples
    --------
    The following parameters would create a new
    `meerschaum.connectors.sql.SQLConnector` that isn't in the configuration file.

    ```
    >>> conn = get_connector(
    ...     type = 'sql',
    ...     label = 'newlabel',
    ...     flavor = 'sqlite',
    ...     database = '/file/path/to/database.db'
    ... )
    >>>
    ```

    """
    from meerschaum.connectors.parse import parse_instance_keys
    from meerschaum.config import get_config
    from meerschaum._internal.static import STATIC_CONFIG
    from meerschaum.utils.warnings import warn
    global _loaded_plugin_connectors
    if isinstance(type, str) and not label and ':' in type:
        type, label = type.split(':', maxsplit=1)

    if _load_plugins:
        with _locks['_loaded_plugin_connectors']:
            if not _loaded_plugin_connectors:
                load_plugin_connectors()
                _load_builtin_custom_connectors()
                _loaded_plugin_connectors = True

    if type is None and label is None:
        default_instance_keys = get_config('meerschaum', 'instance', patch=True)
        ### recursive call to get_connector
        return parse_instance_keys(default_instance_keys)

    ### NOTE: the default instance connector may not be main.
    ### Only fall back to 'main' if the type is provided but the label is omitted.
    label = label if label is not None else STATIC_CONFIG['connectors']['default_label']

    ### type might actually be a label. Check if so and raise a warning.
    if type not in connectors:
        possibilities, poss_msg = [], ""
        for _type in get_config('meerschaum', 'connectors'):
            if type in get_config('meerschaum', 'connectors', _type):
                possibilities.append(f"{_type}:{type}")
        if len(possibilities) > 0:
            poss_msg = " Did you mean"
            for poss in possibilities[:-1]:
                poss_msg += f" '{poss}',"
            if poss_msg.endswith(','):
                poss_msg = poss_msg[:-1]
            if len(possibilities) > 1:
                poss_msg += " or"
            poss_msg += f" '{possibilities[-1]}'?"

        warn(f"Cannot create Connector of type '{type}'." + poss_msg, stack=False)
        return None

    if 'sql' not in types:
        from meerschaum.connectors.plugin import PluginConnector
        from meerschaum.connectors.valkey import ValkeyConnector
        with _locks['types']:
            types.update({
                'api': APIConnector,
                'sql': SQLConnector,
                'plugin': PluginConnector,
                'valkey': ValkeyConnector,
            })

    ### determine if we need to call the constructor
    if not refresh:
        ### see if any user-supplied arguments differ from the existing instance
        if label in connectors[type]:
            warning_message = None
            for attribute, value in kw.items():
                if attribute not in connectors[type][label].meta:
                    import inspect
                    cls = connectors[type][label].__class__
                    cls_init_signature = inspect.signature(cls)
                    cls_init_params = cls_init_signature.parameters
                    if attribute not in cls_init_params:
                        warning_message = (
                            f"Received new attribute '{attribute}' not present in connector " +
                            f"{connectors[type][label]}.\n"
                        )
                elif connectors[type][label].__dict__[attribute] != value:
                    warning_message = (
                        f"Mismatched values for attribute '{attribute}' in connector "
                        + f"'{connectors[type][label]}'.\n" +
                        f" - Keyword value: '{value}'\n" +
                        f" - Existing value: '{connectors[type][label].__dict__[attribute]}'\n"
                    )
            if warning_message is not None:
                warning_message += (
                    "\nSetting `refresh` to True and recreating connector with type:"
                    + f" '{type}' and label '{label}'."
                )
                refresh = True
                warn(warning_message)
        else: ### connector doesn't yet exist
            refresh = True

    ### only create an object if refresh is True
    ### (can be manually specified, otherwise determined above)
    if refresh:
        with _locks['connectors']:
            try:
                ### will raise an error if configuration is incorrect / missing
                conn = types[type](label=label, **kw)
                connectors[type][label] = conn
            except InvalidAttributesError as ie:
                warn(
                    f"Incorrect attributes for connector '{type}:{label}'.\n"
                    + str(ie),
                    stack = False,
                )
                conn = None
            except Exception as e:
                from meerschaum.utils.formatting import get_console
                console = get_console()
                if console:
                    console.print_exception()
                warn(
                    f"Exception when creating connector '{type}:{label}'.\n" + str(e),
                    stack = False,
                )
                conn = None
        if conn is None:
            return None

    return connectors[type][label]
Return existing connector or create new connection and store for reuse.
You can create new connectors if enough parameters are provided for the given type and flavor.
Parameters
- type (Optional[str], default None):
Connector type (sql, api, etc.).
Defaults to the type of the configured instance_connector.
- label (Optional[str], default None):
Connector label (e.g. main). Defaults to 'main'.
- refresh (bool, default False):
Refresh the Connector instance / construct new object. Defaults to False.
- kw (Any):
Other arguments to pass to the Connector constructor.
If the Connector has already been constructed and new arguments are provided,
refresh is set to True and the old Connector is replaced.
Returns
- A new Meerschaum connector (e.g.
meerschaum.connectors.api.APIConnector, meerschaum.connectors.sql.SQLConnector).
Examples
The following parameters would create a new
meerschaum.connectors.sql.SQLConnector that isn't in the configuration file.
>>> conn = get_connector(
...     type = 'sql',
...     label = 'newlabel',
...     flavor = 'sqlite',
...     database = '/file/path/to/database.db'
... )
>>>
def get_config(
    *keys: str,
    patch: bool = True,
    substitute: bool = True,
    sync_files: bool = True,
    write_missing: bool = True,
    as_tuple: bool = False,
    warn: bool = True,
    debug: bool = False
) -> Any:
    """
    Return the Meerschaum configuration dictionary.
    If positional arguments are provided, index by the keys.
    Raises a warning if invalid keys are provided.

    Parameters
    ----------
    keys: str:
        List of strings to index.

    patch: bool, default True
        If `True`, patch missing default keys into the config dictionary.
        Defaults to `True`.

    sync_files: bool, default True
        If `True`, sync files if needed.
        Defaults to `True`.

    write_missing: bool, default True
        If `True`, write default values when the main config files are missing.
        Defaults to `True`.

    substitute: bool, default True
        If `True`, substitute 'MRSM{}' values.
        Defaults to `True`.

    as_tuple: bool, default False
        If `True`, return a tuple of type (success, value).
        Defaults to `False`.

    Returns
    -------
    The value in the configuration dictionary, indexed by the provided keys.

    Examples
    --------
    >>> get_config('meerschaum', 'instance')
    'sql:main'
    >>> get_config('does', 'not', 'exist')
    UserWarning: Invalid keys in config: ('does', 'not', 'exist')
    """
    import json

    symlinks_key = STATIC_CONFIG['config']['symlinks_key']
    if debug:
        from meerschaum.utils.debug import dprint
        dprint(f"Indexing keys: {keys}", color=False)

    if len(keys) == 0:
        _rc = _config(
            substitute=substitute,
            sync_files=sync_files,
            write_missing=(write_missing and _allow_write_missing),
        )
        if as_tuple:
            return True, _rc
        return _rc

    ### Weird threading issues, only import if substitute is True.
    if substitute:
        from meerschaum.config._read_config import search_and_substitute_config
    ### Invalidate the cache if it was read before with substitute=False
    ### but there still exist substitutions.
    if (
        config is not None and substitute and keys[0] != symlinks_key
        and 'MRSM{' in json.dumps(config.get(keys[0]))
    ):
        try:
            _subbed = search_and_substitute_config({keys[0]: config[keys[0]]})
        except Exception:
            import traceback
            traceback.print_exc()
            _subbed = {keys[0]: config[keys[0]]}

        config[keys[0]] = _subbed[keys[0]]
        if symlinks_key in _subbed:
            if symlinks_key not in config:
                config[symlinks_key] = {}
            config[symlinks_key] = apply_patch_to_config(
                _subbed.get(symlinks_key, {}),
                config.get(symlinks_key, {}),
            )

    from meerschaum.config._sync import sync_files as _sync_files
    if config is None:
        _config(*keys, sync_files=sync_files)

    invalid_keys = False
    if keys[0] not in config and keys[0] != symlinks_key:
        single_key_config = read_config(
            keys=[keys[0]], substitute=substitute, write_missing=write_missing
        )
        if keys[0] not in single_key_config:
            invalid_keys = True
        else:
            config[keys[0]] = single_key_config.get(keys[0], None)
            if symlinks_key in single_key_config and keys[0] in single_key_config[symlinks_key]:
                if symlinks_key not in config:
                    config[symlinks_key] = {}
                config[symlinks_key][keys[0]] = single_key_config[symlinks_key][keys[0]]

            if sync_files:
                _sync_files(keys=[keys[0]])

    c = config
    if len(keys) > 0:
        for k in keys:
            try:
                c = c[k]
            except Exception:
                invalid_keys = True
                break
        if invalid_keys:
            ### Check if the keys are in the default configuration.
            from meerschaum.config._default import default_config
            in_default = True
            patched_default_config = (
                search_and_substitute_config(default_config)
                if substitute else copy.deepcopy(default_config)
            )
            _c = patched_default_config
            for k in keys:
                try:
                    _c = _c[k]
                except Exception:
                    in_default = False
            if in_default:
                c = _c
                invalid_keys = False
            warning_msg = f"Invalid keys in config: {keys}"
            if not in_default:
                try:
                    if warn:
                        from meerschaum.utils.warnings import warn as _warn
                        _warn(warning_msg, stacklevel=3, color=False)
                except Exception:
                    if warn:
                        print(warning_msg)
                if as_tuple:
                    return False, None
                return None

            ### Don't write keys that we haven't yet loaded into memory.
            not_loaded_keys = [k for k in patched_default_config if k not in config]
            for k in not_loaded_keys:
                patched_default_config.pop(k, None)

            set_config(
                apply_patch_to_config(
                    patched_default_config,
                    config,
                )
            )
            if patch and keys[0] != symlinks_key:
                if write_missing:
                    write_config(config, debug=debug)

    if as_tuple:
        return (not invalid_keys), c
    return c
Return the Meerschaum configuration dictionary. If positional arguments are provided, index by the keys. Raises a warning if invalid keys are provided.
Parameters
- keys (str):
List of strings to index.
- patch (bool, default True):
If True, patch missing default keys into the config dictionary. Defaults to True.
- sync_files (bool, default True):
If True, sync files if needed. Defaults to True.
- write_missing (bool, default True):
If True, write default values when the main config files are missing. Defaults to True.
- substitute (bool, default True):
If True, substitute 'MRSM{}' values. Defaults to True.
- as_tuple (bool, default False):
If True, return a tuple of type (success, value). Defaults to False.
Returns
- The value in the configuration dictionary, indexed by the provided keys.
Examples
>>> get_config('meerschaum', 'instance')
'sql:main'
>>> get_config('does', 'not', 'exist')
UserWarning: Invalid keys in config: ('does', 'not', 'exist')
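The core of the lookup, walking a nested dictionary by positional keys and reporting success as a (success, value) tuple in the spirit of as_tuple=True, might look like the minimal sketch below. index_config and demo_config are hypothetical names for illustration; the real function also patches in defaults, substitutes 'MRSM{}' values, and syncs files.

```python
from typing import Any, Dict, Tuple


def index_config(config: Dict[str, Any], *keys: str) -> Tuple[bool, Any]:
    """Hypothetical helper: index `config` by `keys`, returning (success, value)."""
    value: Any = config
    for key in keys:
        try:
            value = value[key]
        except (KeyError, IndexError, TypeError):
            # A missing key, or indexing into a non-container, is invalid.
            return False, None
    return True, value


# Assumed sample configuration, mirroring the example output above.
demo_config = {"meerschaum": {"instance": "sql:main"}}

found = index_config(demo_config, "meerschaum", "instance")
missing = index_config(demo_config, "does", "not", "exist")
whole = index_config(demo_config)
```

With no keys, the whole dictionary is returned, which matches the description that the keys are optional.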
class Pipe:
    """
    Access Meerschaum pipes via Pipe objects.

    Pipes are identified by the following:

    1. Connector keys (e.g. `'sql:main'`)
    2. Metric key (e.g. `'weather'`)
    3. Location (optional; e.g. `None`)

    A pipe's connector keys correspond to a data source, and when the pipe is synced,
    its `fetch` definition is evaluated and executed to produce new data.

    Alternatively, new data may be directly synced via `pipe.sync()`:

    ```
    >>> from meerschaum import Pipe
    >>> pipe = Pipe('csv', 'weather')
    >>>
    >>> import pandas as pd
    >>> df = pd.read_csv('weather.csv')
    >>> pipe.sync(df)
    ```
    """

    from ._fetch import (
        fetch,
        get_backtrack_interval,
    )
    from ._data import (
        get_data,
        get_backtrack_data,
        get_rowcount,
        get_data,
        get_doc,
        get_docs,
        get_value,
        _get_data_as_iterator,
        get_chunk_interval,
        get_chunk_bounds,
        get_chunk_bounds_batches,
        parse_date_bounds,
    )
    from ._register import register
    from ._attributes import (
        attributes,
        parameters,
        columns,
        indices,
        indexes,
        dtypes,
        autoincrement,
        autotime,
        upsert,
        static,
        tzinfo,
        enforce,
        null_indices,
        mixed_numerics,
        get_columns,
        get_columns_types,
        get_columns_indices,
        get_indices,
        get_parameters,
        get_dtypes,
        update_parameters,
        tags,
        get_id,
        id,
        get_val_column,
        parents,
        parent,
        children,
        child,
        reference,
        references,
        target,
        _target_legacy,
        guess_datetime,
        precision,
        get_precision,
    )
    from ._cache import (
        _get_cache_connector,
        _cache_value,
        _get_cached_value,
        _invalidate_cache,
        _get_cache_dir_path,
        _write_cache_key,
        _write_cache_file,
        _write_cache_conn_key,
        _read_cache_key,
        _read_cache_file,
        _read_cache_conn_key,
        _load_cache_keys,
        _load_cache_files,
        _load_cache_conn_keys,
        _get_cache_keys,
        _get_cache_file_keys,
        _get_cache_conn_keys,
        _clear_cache_key,
        _clear_cache_file,
        _clear_cache_conn_key,
    )
    from ._show import show
    from ._edit import edit, edit_definition, update
    from ._sync import (
        sync,
        get_sync_time,
        exists,
        filter_existing,
        _get_chunk_label,
        get_num_workers,
        _persist_new_special_columns,
    )
    from ._verify import (
        verify,
        get_bound_interval,
        get_bound_time,
    )
    from ._delete import delete
    from ._drop import drop, drop_indices
    from ._index import create_indices
    from ._clear import clear
    from ._deduplicate import deduplicate
    from ._bootstrap import bootstrap
    from ._dtypes import enforce_dtypes, infer_dtypes
    from ._copy import copy_to

    def __init__(
        self,
        connector: str = '',
        metric: str = '',
        location: Optional[str] = None,
        parameters: Optional[Dict[str, Any]] = None,
        columns: Union[Dict[str, str], List[str], None] = None,
        indices: Optional[Dict[str, Union[str, List[str]]]] = None,
        tags: Optional[List[str]] = None,
        target: Optional[str] = None,
        dtypes: Optional[Dict[str, str]] = None,
        instance: Optional[Union[str, InstanceConnector]] = None,
        upsert: Optional[bool] = None,
        autoincrement: Optional[bool] = None,
        autotime: Optional[bool] = None,
        precision: Union[str, Dict[str, Union[str, int]], None] = None,
        static: Optional[bool] = None,
        enforce: Optional[bool] = None,
        null_indices: Optional[bool] = None,
        mixed_numerics: Optional[bool] = None,
        temporary: bool = False,
        cache: Optional[bool] = None,
        cache_connector_keys: Optional[str] = None,
        references: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]] = None,
        reference: Union[str, Dict[str, Any], mrsm.Pipe, None] = None,
        parents: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]] = None,
        parent: Union[str, Dict[str, Any], mrsm.Pipe, None] = None,
        children: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]] = None,
        child: Union[str, Dict[str, Any], mrsm.Pipe, None] = None,
        mrsm_instance: Optional[Union[str, InstanceConnector]] = None,
        connector_keys: Optional[str] = None,
        metric_key: Optional[str] = None,
        location_key: Optional[str] = None,
        instance_keys: Optional[str] = None,
        indexes: Union[Dict[str, str], List[str], None] = None,
        debug: bool = False,
    ):
        """
        Parameters
        ----------
        connector: str
            Keys for the pipe's source connector, e.g. `'sql:main'`.

        metric: str
            Label for the pipe's contents, e.g. `'weather'`.

        location: str, default None
            Label for the pipe's location. Defaults to `None`.

        parameters: Optional[Dict[str, Any]], default None
            Optionally set a pipe's parameters from the constructor,
            e.g. columns and other attributes.
            You can edit these parameters with `edit pipes`.

        columns: Union[Dict[str, str], List[str], None], default None
            Set the `columns` dictionary of `parameters`.
            If `parameters` is also provided, this dictionary is added under the `'columns'` key.

        indices: Optional[Dict[str, Union[str, List[str]]]], default None
            Set the `indices` dictionary of `parameters`.
            If `parameters` is also provided, this dictionary is added under the `'indices'` key.

        tags: Optional[List[str]], default None
            A list of strings to be added under the `'tags'` key of `parameters`.
            You can select pipes with certain tags using `--tags`.

        dtypes: Optional[Dict[str, str]], default None
            Set the `dtypes` dictionary of `parameters`.
            If `parameters` is also provided, this dictionary is added under the `'dtypes'` key.

        mrsm_instance: Optional[Union[str, InstanceConnector]], default None
            Connector for the Meerschaum instance where the pipe resides.
            Defaults to the preconfigured default instance (`'sql:main'`).

        instance: Optional[Union[str, InstanceConnector]], default None
            Alias for `mrsm_instance`. If `mrsm_instance` is supplied, this value is ignored.

        upsert: Optional[bool], default None
            If `True`, set `upsert` to `True` in the parameters.

        autoincrement: Optional[bool], default None
            If `True`, set `autoincrement` in the parameters.

        autotime: Optional[bool], default None
            If `True`, set `autotime` in the parameters.

        precision: Union[str, Dict[str, Union[str, int]], None], default None
            If provided, set `precision` in the parameters.
            This may be either a string (the precision unit) or a dictionary in the form
            `{'unit': <unit>, 'interval': <interval>}`.
            Default is determined by the `datetime` column dtype
            (e.g. `datetime64[us]` is `microsecond` precision).

        static: Optional[bool], default None
            If `True`, set `static` in the parameters.

        enforce: Optional[bool], default None
            If `False`, skip data type enforcement.
            Default behavior is `True`.

        null_indices: Optional[bool], default None
            Set to `False` if there will be no null values in the index columns.
            Defaults to `True`.

        mixed_numerics: bool, default None
            If `True`, integer columns will be converted to `numeric` when floats are synced.
            Set to `False` to disable this behavior.
            Defaults to `True`.

        temporary: bool, default False
            If `True`, prevent instance tables (pipes, users, plugins) from being created.

        cache: Optional[bool], default None
            If `True`, cache the pipe's metadata to disk (in addition to in-memory caching).
            If `cache` is not explicitly `True`, it is set to `False` if `temporary` is `True`.
            Defaults to `True` (from `None`).

        cache_connector_keys: Optional[str], default None
            If provided, use the keys to a Valkey connector (e.g. `valkey:main`).

        references: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None
            If provided, inherit the parameters of the reference Pipe(s).
            May be equal to a string of the Pipe constructor, a dictionary of constructor keys,
            a Pipe itself, or a list of any of these values.

        parents: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None
            Set references for parent pipes. See `references` for values.

        children: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None
            Set references for child pipes. See `references` for values.

        """
        from meerschaum.utils.warnings import error, warn
        if (not connector and not connector_keys) or (not metric and not metric_key):
            error(
                "Please provide strings for the connector and metric\n "
                + "(first two positional arguments)."
            )

        ### Fall back to legacy `location_key` just in case.
        if not location:
            location = location_key

        if not connector:
            connector = connector_keys

        if not metric:
            metric = metric_key

        if location in ('[None]', 'None'):
            location = None

        from meerschaum._internal.static import STATIC_CONFIG
        negation_prefix = STATIC_CONFIG['system']['fetch_pipes_keys']['negation_prefix']
        for k in (connector, metric, location, *(tags or [])):
            if str(k).startswith(negation_prefix):
                error(f"A pipe's keys and tags cannot start with the prefix '{negation_prefix}'.")

        self._connector_keys = str(connector)
        self._connector_key = self.connector_keys ### Alias
        self._metric_key = metric
        self._location_key = location
        self.temporary = temporary
        self.cache = (
            cache
            if cache is not None
            else ((not temporary) and get_config('pipes', 'cache', 'enabled', warn=False))
        )
        self.cache_connector_keys = (
            str(cache_connector_keys)
            if cache_connector_keys is not None
            else None
        )
        self.debug = debug

        self._attributes: Dict[str, Any] = {
            'connector_keys': self._connector_keys,
            'metric_key': self._metric_key,
            'location_key': self._location_key,
            'parameters': {},
        }

        ### only set parameters if values are provided
        if isinstance(parameters, dict):
            self._attributes['parameters'] = parameters
        else:
            if parameters is not None:
                warn(f"The provided parameters are of invalid type '{type(parameters)}'.")
            self._attributes['parameters'] = {}

        columns = columns or self._attributes.get('parameters', {}).get('columns', None)
        if isinstance(columns, (list, tuple)):
            columns = {str(col): str(col) for col in columns}
        if isinstance(columns, dict):
            self._attributes['parameters']['columns'] = columns
        elif isinstance(columns, str) and 'Pipe(' in columns:
            pass
        elif columns is not None:
            warn(f"The provided columns are of invalid type '{type(columns)}'.")

        indices = (
            indices
            or indexes
            or self._attributes.get('parameters', {}).get('indices', None)
            or self._attributes.get('parameters', {}).get('indexes', None)
        )
        if isinstance(indices, dict):
            indices_key = (
                'indexes'
                if 'indexes' in self._attributes['parameters']
                else 'indices'
            )
            self._attributes['parameters'][indices_key] = indices

        if isinstance(tags, (list, tuple)):
            self._attributes['parameters']['tags'] = tags
        elif tags is not None:
            warn(f"The provided tags are of invalid type '{type(tags)}'.")

        if isinstance(target, str):
            self._attributes['parameters']['target'] = target
        elif target is not None:
            warn(f"The provided target is of invalid type '{type(target)}'.")

        if isinstance(dtypes, dict):
            self._attributes['parameters']['dtypes'] = dtypes
        elif dtypes is not None:
            warn(f"The provided dtypes are of invalid type '{type(dtypes)}'.")

        if isinstance(upsert, bool):
            self._attributes['parameters']['upsert'] = upsert

        if isinstance(autoincrement, bool):
            self._attributes['parameters']['autoincrement'] = autoincrement

        if isinstance(autotime, bool):
            self._attributes['parameters']['autotime'] = autotime

        if isinstance(precision, dict):
            self._attributes['parameters']['precision'] = precision
        elif isinstance(precision, str):
            self._attributes['parameters']['precision'] = {'unit': precision}

        if isinstance(static, bool):
            self._attributes['parameters']['static'] = static
            self._static = static

        if isinstance(enforce, bool):
            self._attributes['parameters']['enforce'] = enforce

        if isinstance(null_indices, bool):
            self._attributes['parameters']['null_indices'] = null_indices

        if isinstance(mixed_numerics, bool):
            self._attributes['parameters']['mixed_numerics'] = mixed_numerics

        ### NOTE: The parameters dictionary is {} by default.
        ### A Pipe may be registered without parameters, then edited,
        ### or a Pipe may be registered with parameters set in-memory first.
        _mrsm_instance = mrsm_instance if mrsm_instance is not None else (instance or instance_keys)
        if _mrsm_instance is None:
            _mrsm_instance = get_config('meerschaum', 'instance', patch=True)

        if not isinstance(_mrsm_instance, str):
            self._instance_connector = _mrsm_instance
            self._instance_keys = str(_mrsm_instance)
        else:
            self._instance_keys = _mrsm_instance

        if self._instance_keys == 'sql:memory':
            self.cache = False

        self._cache_locks = collections.defaultdict(lambda: threading.RLock())

        if references is not None or reference is not None:
            reference_vals = references if references is not None else reference
            self.references = reference_vals

        if parents is not None or parent is not None:
            parent_vals = parents if parents is not None else parent
            self.parents = parent_vals

        if children is not None or child is not None:
            children_vals = children if children is not None else child
            self.children = children_vals

    @property
    def metric_key(self) -> str:
        """
        Return the pipe's metric key.
        """
        return self._metric_key

    @property
    def metric(self) -> str:
        """
        Return the pipe's metric key.
        """
        return self._metric_key

    @property
    def location_key(self) -> Union[str, None]:
        """
        Return the pipe's location key.
        """
        return self._location_key

    @property
    def location(self) -> Union[str, None]:
        """
        Return the pipe's location key.
        """
        return self._location_key

    @property
    def meta(self):
        """
        Return the four keys needed to reconstruct this pipe.
        """
        return {
            'connector_keys': self.connector_keys,
            'metric_key': self.metric_key,
            'location_key': self.location_key,
            'instance_keys': self.instance_keys,
        }

    def keys(self) -> List[str]:
        """
        Return the ordered keys for this pipe.
        """
        return {
            key: val
            for key, val in self.meta.items()
            if key != 'instance'
        }

    @property
    def instance_keys(self) -> str:
        """
        Return the pipe's instance keys.
        """
        return self._instance_keys

    @property
    def instance(self) -> Union[InstanceConnector, str]:
        """
        Return the pipe's instance connector or keys.
        """
        conn = self.instance_connector
        if conn is None:
            return self.instance_keys
        return conn

    @property
    def instance_connector(self) -> Union[InstanceConnector, None]:
        """
        The instance connector on which this pipe resides.
        """
        if '_instance_connector' not in self.__dict__:
            from meerschaum.connectors.parse import parse_instance_keys
            conn = parse_instance_keys(self.instance_keys)
            if conn:
                self._instance_connector = conn
            else:
                return None
        return self._instance_connector

    @property
    def connector_keys(self) -> str:
        """
        Return the pipe's connector keys.
        """
        return self._connector_keys

    @property
    def connector_key(self) -> str:
        """
        Legacy: use `Pipe.connector_keys` instead.
        """
        return self.connector_keys

    @property
    def connector(self) -> Union['Connector', str]:
        """
        The connector to the data source.
        """
        if '_connector' not in self.__dict__:
            from meerschaum.connectors.parse import parse_instance_keys
            import warnings
            with warnings.catch_warnings():
                warnings.simplefilter('ignore')
                try:
                    conn = parse_instance_keys(self.connector_keys)
                except Exception:
                    conn = None
            if conn:
                self._connector = conn
            else:
                return self._connector_keys
        return self._connector

    def __str__(self, ansi: bool=False):
        return pipe_repr(self, ansi=ansi)

    def __eq__(self, other):
        try:
            return (
                isinstance(self, type(other))
                and self.connector_keys == other.connector_keys
                and self.metric_key == other.metric_key
                and self.location_key == other.location_key
                and self.instance_keys == other.instance_keys
            )
        except Exception:
            return False

    def __hash__(self):
        ### Using an esoteric separator to avoid collisions.
        sep = "[\"']"
        return hash(
            str(self.connector_keys) + sep
            + str(self.metric_key) + sep
            + str(self.location_key) + sep
            + str(self.instance_keys) + sep
        )

    def __repr__(self, ansi: bool=True, **kw) -> str:
        if not hasattr(sys, 'ps1'):
            ansi = False

        return pipe_repr(self, ansi=ansi, **kw)

    def __pt_repr__(self):
        from meerschaum.utils.packages import attempt_import
        prompt_toolkit_formatted_text = attempt_import('prompt_toolkit.formatted_text', lazy=False)
        return prompt_toolkit_formatted_text.ANSI(pipe_repr(self, ansi=True))

    def __getstate__(self) -> Dict[str, Any]:
        """
        Define the state dictionary (pickling).
        """
        return {
            'connector_keys': self.connector_keys,
            'metric_key': self.metric_key,
            'location_key': self.location_key,
            'parameters': self._attributes.get('parameters', None),
            'instance_keys': self.instance_keys,
        }

    def __setstate__(self, _state: Dict[str, Any]):
        """
        Read the state (unpickling).
        """
        self.__init__(**_state)

    def __getitem__(self, key: str) -> Any:
        """
        Index the pipe's attributes.
        If the `key` cannot be found, return `None`.
        """
        if key in self.attributes:
            return self.attributes.get(key, None)

        aliases = {
            'connector': 'connector_keys',
            'connector_key': 'connector_keys',
            'metric': 'metric_key',
            'location': 'location_key',
        }
        aliased_key = aliases.get(key, None)
        if aliased_key is not None:
            return self.attributes.get(aliased_key, None)

        property_aliases = {
            'instance': 'instance_keys',
            'instance_key': 'instance_keys',
        }
        aliased_key = property_aliases.get(key, None)
        if aliased_key is not None:
            key = aliased_key
        return getattr(self, key, None)

    def __copy__(self):
        """
        Return a shallow copy of the current pipe.
        """
        return mrsm.Pipe(
            self.connector_keys, self.metric_key, self.location_key,
            instance=self.instance_keys,
            parameters=self._attributes.get('parameters', None),
        )

    def __deepcopy__(self, memo):
        """
        Return a deep copy of the current pipe.
        """
        return self.__copy__()
Access Meerschaum pipes via Pipe objects.
Pipes are identified by the following:
- Connector keys (e.g. 'sql:main')
- Metric key (e.g. 'weather')
- Location (optional; e.g. None)
A pipe's connector keys correspond to a data source, and when the pipe is synced,
its fetch definition is evaluated and executed to produce new data.
Alternatively, new data may be directly synced via pipe.sync():
>>> from meerschaum import Pipe
>>> pipe = Pipe('csv', 'weather')
>>>
>>> import pandas as pd
>>> df = pd.read_csv('weather.csv')
>>> pipe.sync(df)
    def __init__(
        self,
        connector: str = '',
        metric: str = '',
        location: Optional[str] = None,
        parameters: Optional[Dict[str, Any]] = None,
        columns: Union[Dict[str, str], List[str], None] = None,
        indices: Optional[Dict[str, Union[str, List[str]]]] = None,
        tags: Optional[List[str]] = None,
        target: Optional[str] = None,
        dtypes: Optional[Dict[str, str]] = None,
        instance: Optional[Union[str, InstanceConnector]] = None,
        upsert: Optional[bool] = None,
        autoincrement: Optional[bool] = None,
        autotime: Optional[bool] = None,
        precision: Union[str, Dict[str, Union[str, int]], None] = None,
        static: Optional[bool] = None,
        enforce: Optional[bool] = None,
        null_indices: Optional[bool] = None,
        mixed_numerics: Optional[bool] = None,
        temporary: bool = False,
        cache: Optional[bool] = None,
        cache_connector_keys: Optional[str] = None,
        references: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]] = None,
        reference: Union[str, Dict[str, Any], mrsm.Pipe, None] = None,
        parents: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]] = None,
        parent: Union[str, Dict[str, Any], mrsm.Pipe, None] = None,
        children: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]] = None,
        child: Union[str, Dict[str, Any], mrsm.Pipe, None] = None,
        mrsm_instance: Optional[Union[str, InstanceConnector]] = None,
        connector_keys: Optional[str] = None,
        metric_key: Optional[str] = None,
        location_key: Optional[str] = None,
        instance_keys: Optional[str] = None,
        indexes: Union[Dict[str, str], List[str], None] = None,
        debug: bool = False,
    ):
        """
        Parameters
        ----------
        connector: str
            Keys for the pipe's source connector, e.g. `'sql:main'`.

        metric: str
            Label for the pipe's contents, e.g. `'weather'`.

        location: str, default None
            Label for the pipe's location. Defaults to `None`.

        parameters: Optional[Dict[str, Any]], default None
            Optionally set a pipe's parameters from the constructor,
            e.g. columns and other attributes.
            You can edit these parameters with `edit pipes`.

        columns: Union[Dict[str, str], List[str], None], default None
            Set the `columns` dictionary of `parameters`.
            If `parameters` is also provided, this dictionary is added under the `'columns'` key.

        indices: Optional[Dict[str, Union[str, List[str]]]], default None
            Set the `indices` dictionary of `parameters`.
            If `parameters` is also provided, this dictionary is added under the `'indices'` key.

        tags: Optional[List[str]], default None
            A list of strings to be added under the `'tags'` key of `parameters`.
            You can select pipes with certain tags using `--tags`.

        dtypes: Optional[Dict[str, str]], default None
            Set the `dtypes` dictionary of `parameters`.
            If `parameters` is also provided, this dictionary is added under the `'dtypes'` key.

        mrsm_instance: Optional[Union[str, InstanceConnector]], default None
            Connector for the Meerschaum instance where the pipe resides.
            Defaults to the preconfigured default instance (`'sql:main'`).

        instance: Optional[Union[str, InstanceConnector]], default None
            Alias for `mrsm_instance`. If `mrsm_instance` is supplied, this value is ignored.

        upsert: Optional[bool], default None
            If `True`, set `upsert` to `True` in the parameters.

        autoincrement: Optional[bool], default None
            If `True`, set `autoincrement` in the parameters.

        autotime: Optional[bool], default None
            If `True`, set `autotime` in the parameters.

        precision: Union[str, Dict[str, Union[str, int]], None], default None
            If provided, set `precision` in the parameters.
            This may be either a string (the precision unit) or a dictionary in the form
            `{'unit': <unit>, 'interval': <interval>}`.
            Default is determined by the `datetime` column dtype
            (e.g. `datetime64[us]` is `microsecond` precision).

        static: Optional[bool], default None
            If `True`, set `static` in the parameters.

        enforce: Optional[bool], default None
            If `False`, skip data type enforcement.
            Default behavior is `True`.

        null_indices: Optional[bool], default None
            Set to `False` if there will be no null values in the index columns.
            Defaults to `True`.

        mixed_numerics: Optional[bool], default None
            If `True`, integer columns will be converted to `numeric` when floats are synced.
            Set to `False` to disable this behavior.
            Defaults to `True`.

        temporary: bool, default False
            If `True`, prevent instance tables (pipes, users, plugins) from being created.

        cache: Optional[bool], default None
            If `True`, cache the pipe's metadata to disk (in addition to in-memory caching).
            If `cache` is not explicitly `True`, it is set to `False` if `temporary` is `True`.
            Defaults to `True` (from `None`).

        cache_connector_keys: Optional[str], default None
            If provided, use the keys to a Valkey connector (e.g. `valkey:main`).

        references: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None
            If provided, inherit the parameters of the reference Pipe(s).
            May be equal to a string of the Pipe constructor, a dictionary of constructor keys,
            a Pipe itself, or a list of any of these values.

        parents: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None
            Set references for parent pipes. See `references` for values.

        children: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None
            Set references for child pipes. See `references` for values.

        """
        from meerschaum.utils.warnings import error, warn
        if (not connector and not connector_keys) or (not metric and not metric_key):
            error(
                "Please provide strings for the connector and metric\n    "
                + "(first two positional arguments)."
            )

        ### Fall back to legacy `location_key` just in case.
        if not location:
            location = location_key

        if not connector:
            connector = connector_keys

        if not metric:
            metric = metric_key

        if location in ('[None]', 'None'):
            location = None

        from meerschaum._internal.static import STATIC_CONFIG
        negation_prefix = STATIC_CONFIG['system']['fetch_pipes_keys']['negation_prefix']
        for k in (connector, metric, location, *(tags or [])):
            if str(k).startswith(negation_prefix):
                error(f"A pipe's keys and tags cannot start with the prefix '{negation_prefix}'.")

        self._connector_keys = str(connector)
        self._connector_key = self.connector_keys ### Alias
        self._metric_key = metric
        self._location_key = location
        self.temporary = temporary
        self.cache = (
            cache
            if cache is not None
            else ((not temporary) and get_config('pipes', 'cache', 'enabled', warn=False))
        )
        self.cache_connector_keys = (
            str(cache_connector_keys)
            if cache_connector_keys is not None
            else None
        )
        self.debug = debug

        self._attributes: Dict[str, Any] = {
            'connector_keys': self._connector_keys,
            'metric_key': self._metric_key,
            'location_key': self._location_key,
            'parameters': {},
        }

        ### only set parameters if values are provided
        if isinstance(parameters, dict):
            self._attributes['parameters'] = parameters
        else:
            if parameters is not None:
                warn(f"The provided parameters are of invalid type '{type(parameters)}'.")
            self._attributes['parameters'] = {}

        columns = columns or self._attributes.get('parameters', {}).get('columns', None)
        if isinstance(columns, (list, tuple)):
            columns = {str(col): str(col) for col in columns}
        if isinstance(columns, dict):
            self._attributes['parameters']['columns'] = columns
        elif isinstance(columns, str) and 'Pipe(' in columns:
            pass
        elif columns is not None:
            warn(f"The provided columns are of invalid type '{type(columns)}'.")

        indices = (
            indices
            or indexes
            or self._attributes.get('parameters', {}).get('indices', None)
            or self._attributes.get('parameters', {}).get('indexes', None)
        )
        if isinstance(indices, dict):
            indices_key = (
                'indexes'
                if 'indexes' in self._attributes['parameters']
                else 'indices'
            )
            self._attributes['parameters'][indices_key] = indices

        if isinstance(tags, (list, tuple)):
            self._attributes['parameters']['tags'] = tags
        elif tags is not None:
            warn(f"The provided tags are of invalid type '{type(tags)}'.")

        if isinstance(target, str):
            self._attributes['parameters']['target'] = target
        elif target is not None:
            warn(f"The provided target is of invalid type '{type(target)}'.")

        if isinstance(dtypes, dict):
            self._attributes['parameters']['dtypes'] = dtypes
        elif dtypes is not None:
            warn(f"The provided dtypes are of invalid type '{type(dtypes)}'.")

        if isinstance(upsert, bool):
            self._attributes['parameters']['upsert'] = upsert

        if isinstance(autoincrement, bool):
            self._attributes['parameters']['autoincrement'] = autoincrement

        if isinstance(autotime, bool):
            self._attributes['parameters']['autotime'] = autotime

        if isinstance(precision, dict):
            self._attributes['parameters']['precision'] = precision
        elif isinstance(precision, str):
            self._attributes['parameters']['precision'] = {'unit': precision}

        if isinstance(static, bool):
            self._attributes['parameters']['static'] = static
            self._static = static

        if isinstance(enforce, bool):
            self._attributes['parameters']['enforce'] = enforce

        if isinstance(null_indices, bool):
            self._attributes['parameters']['null_indices'] = null_indices

        if isinstance(mixed_numerics, bool):
            self._attributes['parameters']['mixed_numerics'] = mixed_numerics

        ### NOTE: The parameters dictionary is {} by default.
        ###       A Pipe may be registered without parameters, then edited,
        ###       or a Pipe may be registered with parameters set in-memory first.
        _mrsm_instance = mrsm_instance if mrsm_instance is not None else (instance or instance_keys)
        if _mrsm_instance is None:
            _mrsm_instance = get_config('meerschaum', 'instance', patch=True)

        if not isinstance(_mrsm_instance, str):
            self._instance_connector = _mrsm_instance
            self._instance_keys = str(_mrsm_instance)
        else:
            self._instance_keys = _mrsm_instance

        if self._instance_keys == 'sql:memory':
            self.cache = False

        self._cache_locks = collections.defaultdict(lambda: threading.RLock())

        if references is not None or reference is not None:
            reference_vals = references if references is not None else reference
            self.references = reference_vals

        if parents is not None or parent is not None:
            parent_vals = parents if parents is not None else parent
            self.parents = parent_vals

        if children is not None or child is not None:
            children_vals = children if children is not None else child
            self.children = children_vals
Parameters
- connector (str): Keys for the pipe's source connector, e.g. 'sql:main'.
- metric (str): Label for the pipe's contents, e.g. 'weather'.
- location (str, default None): Label for the pipe's location. Defaults to None.
- parameters (Optional[Dict[str, Any]], default None): Optionally set a pipe's parameters from the constructor, e.g. columns and other attributes. You can edit these parameters with edit pipes.
- columns (Union[Dict[str, str], List[str], None], default None): Set the columns dictionary of parameters. If parameters is also provided, this dictionary is added under the 'columns' key. A list of column names is converted to a dictionary that maps each name to itself.
- indices (Optional[Dict[str, Union[str, List[str]]]], default None): Set the indices dictionary of parameters. If parameters is also provided, this dictionary is added under the 'indices' key.
- tags (Optional[List[str]], default None): A list of strings to be added under the 'tags' key of parameters. You can select pipes with certain tags using --tags.
- dtypes (Optional[Dict[str, str]], default None): Set the dtypes dictionary of parameters. If parameters is also provided, this dictionary is added under the 'dtypes' key.
- mrsm_instance (Optional[Union[str, InstanceConnector]], default None): Connector for the Meerschaum instance where the pipe resides. Defaults to the preconfigured default instance ('sql:main').
- instance (Optional[Union[str, InstanceConnector]], default None): Alias for mrsm_instance. If mrsm_instance is supplied, this value is ignored.
- upsert (Optional[bool], default None): If True, set upsert to True in the parameters.
- autoincrement (Optional[bool], default None): If True, set autoincrement in the parameters.
- autotime (Optional[bool], default None): If True, set autotime in the parameters.
- precision (Union[str, Dict[str, Union[str, int]], None], default None): If provided, set precision in the parameters. This may be either a string (the precision unit) or a dictionary in the form {'unit': <unit>, 'interval': <interval>}. A string is stored as {'unit': <unit>}. The default is determined by the datetime column dtype (e.g. datetime64[us] is microsecond precision).
- static (Optional[bool], default None): If True, set static in the parameters.
- enforce (Optional[bool], default None): If False, skip data type enforcement. Default behavior is True.
- null_indices (Optional[bool], default None): Set to False if there will be no null values in the index columns. Defaults to True.
- mixed_numerics (Optional[bool], default None): If True, integer columns will be converted to numeric when floats are synced. Set to False to disable this behavior. Defaults to True.
- temporary (bool, default False): If True, prevent instance tables (pipes, users, plugins) from being created.
- cache (Optional[bool], default None): If True, cache the pipe's metadata to disk (in addition to in-memory caching). When cache is left as None, it is set to False if temporary is True; otherwise it follows the configured pipes cache setting. Caching is also disabled when the instance keys are 'sql:memory'.
- cache_connector_keys (Optional[str], default None): If provided, use the keys to a Valkey connector (e.g. valkey:main).
- references (Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None): If provided, inherit the parameters of the reference Pipe(s). May be equal to a string of the Pipe constructor, a dictionary of constructor keys, a Pipe itself, or a list of any of these values.
- parents (Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None): Set references for parent pipes. See references for values.
- children (Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None): Set references for child pipes. See references for values.
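The constructor folds several keyword arguments into the parameters dictionary. The sketch below mirrors two of those normalizations as they appear in the constructor source: a list of columns becomes a dictionary mapping each name to itself, and a precision string becomes {'unit': <unit>}. The helper normalize_pipe_kwargs is hypothetical and not part of meerschaum; unlike the constructor, it copies the given parameters rather than mutating them.

```python
from typing import Any, Dict, List, Optional, Union


def normalize_pipe_kwargs(
    columns: Union[Dict[str, str], List[str], None] = None,
    precision: Union[str, Dict[str, Any], None] = None,
    parameters: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build a parameters dict the way the constructor does."""
    # Work on a copy so the caller's dictionary is left untouched.
    params: Dict[str, Any] = dict(parameters) if isinstance(parameters, dict) else {}

    # Fall back to any columns already present in `parameters`.
    columns = columns or params.get('columns')
    if isinstance(columns, (list, tuple)):
        # A list of names maps each column to itself.
        columns = {str(col): str(col) for col in columns}
    if isinstance(columns, dict):
        params['columns'] = columns

    # A bare string is treated as the precision unit.
    if isinstance(precision, dict):
        params['precision'] = precision
    elif isinstance(precision, str):
        params['precision'] = {'unit': precision}

    return params


params = normalize_pipe_kwargs(columns=['ts', 'id'], precision='second')
print(params)
```

With the real class, the equivalent call would be Pipe(..., columns=['ts', 'id'], precision='second'), after which the normalized values are stored under the pipe's parameters.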
    @property
    def metric_key(self) -> str:
        """
        Return the pipe's metric key.
        """
        return self._metric_key
Return the pipe's metric key.
    @property
    def metric(self) -> str:
        """
        Return the pipe's metric key.
        """
        return self._metric_key
Return the pipe's metric key.
    @property
    def location_key(self) -> Union[str, None]:
        """
        Return the pipe's location key.
        """
        return self._location_key
Return the pipe's location key.
    @property
    def location(self) -> Union[str, None]:
        """
        Return the pipe's location key.
        """
        return self._location_key
Return the pipe's location key.
    @property
    def meta(self):
        """
        Return the four keys needed to reconstruct this pipe.
        """
        return {
            'connector_keys': self.connector_keys,
            'metric_key': self.metric_key,
            'location_key': self.location_key,
            'instance_keys': self.instance_keys,
        }
Return the four keys needed to reconstruct this pipe.
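These four keys are also what the class compares in __eq__ and hashes in __hash__, so two pipes built from the same keys behave as the same dictionary key or set member. A minimal sketch of that identity rule, using a hypothetical stand-in dataclass (PipeKeys is not part of meerschaum, and its 'sql:main' default is only illustrative):

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class PipeKeys:
    """Hypothetical stand-in holding the four identifying keys of a pipe."""
    connector_keys: str
    metric_key: str
    location_key: Optional[str] = None
    instance_keys: str = 'sql:main'

    @property
    def meta(self) -> Dict[str, Optional[str]]:
        # Same shape as the dictionary returned by `Pipe.meta`.
        return {
            'connector_keys': self.connector_keys,
            'metric_key': self.metric_key,
            'location_key': self.location_key,
            'instance_keys': self.instance_keys,
        }


a = PipeKeys('sql:main', 'weather')
b = PipeKeys(**a.meta)  # Rebuild from the four keys.
c = PipeKeys('sql:main', 'weather', instance_keys='sql:other')

print(a == b)       # Same four keys.
print(a == c)       # Different instance keys.
print(len({a, b}))  # Equal objects collapse in a set.
```

A frozen dataclass derives equality and hashing from its fields, which stands in here for the hand-written __eq__ and __hash__ of the real class.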
    def keys(self) -> List[str]:
        """
        Return the ordered keys for this pipe.
        """
        return {
            key: val
            for key, val in self.meta.items()
            if key != 'instance'
        }
Return the ordered keys for this pipe.
    @property
    def instance_keys(self) -> str:
        """
        Return the pipe's instance keys.
        """
        return self._instance_keys
Return the pipe's instance keys.
    @property
    def instance(self) -> Union[InstanceConnector, str]:
        """
        Return the pipe's instance connector or keys.
        """
        conn = self.instance_connector
        if conn is None:
            return self.instance_keys
        return conn
Return the pipe's instance connector or keys.
    @property
    def instance_connector(self) -> Union[InstanceConnector, None]:
        """
        The instance connector on which this pipe resides.
        """
        if '_instance_connector' not in self.__dict__:
            from meerschaum.connectors.parse import parse_instance_keys
            conn = parse_instance_keys(self.instance_keys)
            if conn:
                self._instance_connector = conn
            else:
                return None
        return self._instance_connector
The instance connector on which this pipe resides.
    @property
    def connector_keys(self) -> str:
        """
        Return the pipe's connector keys.
        """
        return self._connector_keys
Return the pipe's connector keys.
    @property
    def connector_key(self) -> str:
        """
        Legacy: use `Pipe.connector_keys` instead.
        """
        return self.connector_keys
Legacy: use Pipe.connector_keys instead.
    @property
    def connector(self) -> Union['Connector', str]:
        """
        The connector to the data source.
        """
        if '_connector' not in self.__dict__:
            from meerschaum.connectors.parse import parse_instance_keys
            import warnings
            with warnings.catch_warnings():
                warnings.simplefilter('ignore')
                try:
                    conn = parse_instance_keys(self.connector_keys)
                except Exception:
                    conn = None
            if conn:
                self._connector = conn
            else:
                return self._connector_keys
        return self._connector
The connector to the data source.
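Note that this property may return the connector keys as a plain string when the keys cannot be resolved to a connector, and that a resolved connector is cached on the instance. The sketch below reproduces that resolve-once-or-fall-back pattern with a hypothetical holder class and a fake parser; neither is part of meerschaum.

```python
from typing import Any, Callable, List, Union


class LazyConnectorHolder:
    """Hypothetical stand-in showing the lazy lookup with a string fallback."""

    def __init__(self, connector_keys: str, parser: Callable[[str], Any]):
        self._connector_keys = connector_keys
        self._parser = parser

    @property
    def connector(self) -> Union[Any, str]:
        if '_connector' not in self.__dict__:
            try:
                conn = self._parser(self._connector_keys)
            except Exception:
                conn = None
            if conn:
                # Cache only successful lookups.
                self._connector = conn
            else:
                # Unresolvable keys are returned as-is.
                return self._connector_keys
        return self._connector


calls: List[str] = []


def fake_parser(keys: str) -> dict:
    """Illustrative parser: only 'sql:known' resolves."""
    calls.append(keys)
    if keys == 'sql:known':
        return {'keys': keys}
    raise ValueError(f"Unknown connector '{keys}'")


known = LazyConnectorHolder('sql:known', fake_parser)
unknown = LazyConnectorHolder('foo:missing', fake_parser)
print(known.connector)
print(unknown.connector)
```

Callers that need connector methods may therefore want to check isinstance(pipe.connector, str) before using the result.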
def fetch(
    self,
    begin: Union[datetime, int, str, None] = '',
    end: Union[datetime, int, None] = None,
    check_existing: bool = True,
    sync_chunks: bool = False,
    debug: bool = False,
    **kw: Any
) -> Union['pd.DataFrame', Iterator['pd.DataFrame'], None]:
    """
    Fetch a Pipe's latest data from its connector.

    Parameters
    ----------
    begin: Union[datetime, str, None], default ''
        If provided, only fetch data newer than or equal to `begin`.

    end: Optional[datetime], default None
        If provided, only fetch data older than or equal to `end`.

    check_existing: bool, default True
        If `False`, do not apply the backtrack interval.

    sync_chunks: bool, default False
        If `True` and the pipe's connector is of type `'sql'`, begin syncing chunks
        while the fetch loads them into memory.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `pd.DataFrame` of the newest unseen data.

    """
    if 'fetch' not in dir(self.connector):
        warn(f"No `fetch()` function defined for connector '{self.connector}'")
        return None

    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.misc import filter_arguments

    _chunk_hook = kw.pop('chunk_hook', None)
    kw['workers'] = self.get_num_workers(kw.get('workers', None))
    if sync_chunks and _chunk_hook is None:

        def _chunk_hook(chunk, **_kw) -> SuccessTuple:
            """
            Wrap `Pipe.sync()` with a custom chunk label prepended to the message.
            """
            from meerschaum.config._patch import apply_patch_to_config
            kwargs = apply_patch_to_config(kw, _kw)
            chunk_success, chunk_message = self.sync(chunk, **kwargs)
            chunk_label = self._get_chunk_label(chunk, self.columns.get('datetime', None))
            if chunk_label:
                chunk_message = '\n' + chunk_label + '\n' + chunk_message
            return chunk_success, chunk_message

    begin, end = self.parse_date_bounds(begin, end)

    with mrsm.Venv(get_connector_plugin(self.connector)):
        _args, _kwargs = filter_arguments(
            self.connector.fetch,
            self,
            begin=_determine_begin(
                self,
                begin,
                end,
                check_existing=check_existing,
                debug=debug,
            ),
            end=end,
            chunk_hook=_chunk_hook,
            debug=debug,
            **kw
        )
        df = self.connector.fetch(*_args, **_kwargs)
    return df
Fetch a Pipe's latest data from its connector.
Parameters
- begin (Union[datetime, str, None], default ''): If provided, only fetch data newer than or equal to begin.
- end (Optional[datetime], default None): If provided, only fetch data older than or equal to end.
- check_existing (bool, default True): If False, do not apply the backtrack interval.
- sync_chunks (bool, default False): If True and the pipe's connector is of type 'sql', begin syncing chunks while the fetch loads them into memory.
- debug (bool, default False): Verbosity toggle.
Returns
- A pd.DataFrame of the newest unseen data.
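In the listing above, the keyword arguments pass through meerschaum.utils.misc.filter_arguments before connector.fetch() is called, which suggests that a connector only needs to declare the arguments it actually uses. The sketch below illustrates that idea with inspect.signature. The filter_kwargs helper and MinimalConnector class are hypothetical stand-ins, and the real filter_arguments may behave differently.

```python
import inspect
from datetime import datetime
from typing import Any, Callable, Dict


def filter_kwargs(func: Callable[..., Any], **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical stand-in: keep only the keyword arguments that `func` accepts."""
    params = inspect.signature(func).parameters
    # A `**kw` catch-all accepts everything, so nothing is dropped.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return kwargs
    return {key: val for key, val in kwargs.items() if key in params}


class MinimalConnector:
    """Illustrative connector whose fetch() only declares `begin` and `end`."""

    def fetch(self, pipe, begin=None, end=None):
        return [{'ts': begin, 'id': 1}]


conn = MinimalConnector()
kwargs = filter_kwargs(
    conn.fetch,
    begin=datetime(2024, 1, 1),
    end=None,
    chunk_hook=None,
    debug=False,
    workers=1,
)
docs = conn.fetch('pipe-placeholder', **kwargs)
print(sorted(kwargs))
print(docs)
```

Here chunk_hook, debug, and workers are dropped because MinimalConnector.fetch() does not declare them.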
def get_backtrack_interval(
    self,
    check_existing: bool = True,
    debug: bool = False,
) -> Union[timedelta, int]:
    """
    Get the backtrack interval to use for this pipe.

    Parameters
    ----------
    check_existing: bool, default True
        If `False`, return a backtrack_interval of 0 minutes.

    Returns
    -------
    The backtrack interval (`timedelta` or `int`) to use with this pipe's `datetime` axis.
    """
    from meerschaum.utils.dtypes import MRSM_PRECISION_UNITS_SCALARS, MRSM_PRECISION_UNITS_ALIASES
    default_backtrack_minutes = get_config('pipes', 'parameters', 'fetch', 'backtrack_minutes')
    configured_backtrack_minutes = self.parameters.get('fetch', {}).get('backtrack_minutes', None)
    backtrack_minutes = (
        configured_backtrack_minutes
        if configured_backtrack_minutes is not None
        else default_backtrack_minutes
    ) if check_existing else 0

    dt_col = self.columns.get('datetime', None)
    if dt_col is None:
        return timedelta(minutes=backtrack_minutes)

    dt_dtype = self.dtypes.get(dt_col, 'datetime')
    if 'int' in dt_dtype.lower():
        if not self.parameters.get('precision', None):
            return backtrack_minutes
        precision_unit = self.precision.get('unit', None)
        true_unit = MRSM_PRECISION_UNITS_ALIASES.get(precision_unit, precision_unit)
        scalar = MRSM_PRECISION_UNITS_SCALARS.get(true_unit, None)
        if scalar is not None:
            return int(backtrack_minutes * 60 * scalar)
        return backtrack_minutes

    return timedelta(minutes=backtrack_minutes)
Get the backtrack interval to use for this pipe.
Parameters
- check_existing (bool, default True): If False, return a backtrack interval of 0 minutes.
Returns
- The backtrack interval (timedelta or int) to use with this pipe's datetime axis.
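The return type depends on the dtype of the datetime axis: a timedelta for timestamp axes, and an integer for integer axes, scaled to the configured precision unit when one is set. A simplified sketch of that branching follows. The backtrack_interval function is hypothetical, and the ASSUMED_UNITS_PER_SECOND table is an assumption standing in for meerschaum.utils.dtypes.MRSM_PRECISION_UNITS_SCALARS, whose actual contents may differ.

```python
from datetime import timedelta
from typing import Optional, Union

# Assumed units-per-second table (not copied from meerschaum).
ASSUMED_UNITS_PER_SECOND = {
    'second': 1,
    'millisecond': 1_000,
    'microsecond': 1_000_000,
}


def backtrack_interval(
    backtrack_minutes: int,
    dt_dtype: str = 'datetime',
    precision_unit: Optional[str] = None,
    check_existing: bool = True,
) -> Union[timedelta, int]:
    """Hypothetical, simplified version of the branching in the listing above."""
    minutes = backtrack_minutes if check_existing else 0

    # Timestamp axes use a timedelta.
    if 'int' not in dt_dtype.lower():
        return timedelta(minutes=minutes)

    # Integer axes without a known precision fall back to the raw minutes value.
    scalar = ASSUMED_UNITS_PER_SECOND.get(precision_unit)
    if scalar is None:
        return minutes

    # Otherwise convert minutes to the axis unit (minutes -> seconds -> unit).
    return int(minutes * 60 * scalar)


print(backtrack_interval(1440))
print(backtrack_interval(1440, dt_dtype='int64', precision_unit='millisecond'))
print(backtrack_interval(1440, dt_dtype='int64'))
print(backtrack_interval(1440, check_existing=False))
```

Unlike the real method, this sketch takes the minutes, dtype, and precision unit as arguments instead of reading them from the pipe's parameters and configuration.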
def get_data(
    self,
    select_columns: Optional[List[str]] = None,
    omit_columns: Optional[List[str]] = None,
    begin: Union[datetime, int, str, None] = None,
    end: Union[datetime, int, str, None] = None,
    params: Optional[Dict[str, Any]] = None,
    as_docs: bool = False,
    as_iterator: bool = False,
    as_chunks: bool = False,
    as_dask: bool = False,
    add_missing_columns: bool = False,
    chunk_interval: Union[timedelta, int, None] = None,
    order: Optional[str] = 'asc',
    limit: Optional[int] = None,
    fresh: bool = False,
    debug: bool = False,
    **kw: Any
) -> Union['pd.DataFrame', Iterator['pd.DataFrame'], None]:
    """
    Get a pipe's data from the instance connector.

    Parameters
    ----------
    select_columns: Optional[List[str]], default None
        If provided, only select these given columns.
        Otherwise select all available columns (i.e. `SELECT *`).

    omit_columns: Optional[List[str]], default None
        If provided, remove these columns from the selection.

    begin: Union[datetime, int, str, None], default None
        Lower bound datetime to begin searching for data (inclusive).
        Translates to a `WHERE` clause like `WHERE datetime >= begin`.
        Defaults to `None`.

    end: Union[datetime, int, str, None], default None
        Upper bound datetime to stop searching for data (exclusive).
        Translates to a `WHERE` clause like `WHERE datetime < end`.
        Defaults to `None`.

    params: Optional[Dict[str, Any]], default None
        Filter the retrieved data by a dictionary of parameters.
        See `meerschaum.utils.sql.build_where` for more details.

    as_docs: bool, default False
        If `True`, return a list of dictionaries rather than a DataFrame.
        Relies on `get_pipe_docs` from the instance connector if implemented.
        May be combined with `as_chunks` to return an `Iterator[List[Dict]]`
        chunked by time bounds (useful for large result sets without pandas overhead).

    as_iterator: bool, default False
        If `True`, return a generator of chunks of pipe data.
        When combined with `as_docs=True`, yields `List[Dict]` per chunk instead of DataFrames.

    as_chunks: bool, default False
        Alias for `as_iterator`.
        When combined with `as_docs=True`, yields `List[Dict]` per chunk instead of DataFrames.

    as_dask: bool, default False
        If `True`, return a `dask.DataFrame`
        (which may be loaded into a Pandas DataFrame with `df.compute()`).

    add_missing_columns: bool, default False
        If `True`, add any missing columns from `Pipe.dtypes` to the dataframe.

    chunk_interval: Union[timedelta, int, None], default None
        If `as_iterator`, then return chunks with `begin` and `end` separated by this interval.
        This may be set under `pipe.parameters['chunk_minutes']`.
        By default, use a timedelta of 1440 minutes (1 day).
        If `chunk_interval` is an integer and the `datetime` axis a timestamp,
        then use a timedelta with the number of minutes configured to this value.
        If the `datetime` axis is an integer, default to the configured chunksize.
        If `chunk_interval` is a `timedelta` and the `datetime` axis an integer,
        use the number of minutes in the `timedelta`.

    order: Optional[str], default 'asc'
        If `order` is not `None`, sort the resulting dataframe by indices.

    limit: Optional[int], default None
        If provided, cap the dataframe to this many rows.

    fresh: bool, default False
        If `True`, skip local cache and directly query the instance connector.

    debug: bool, default False
        Verbosity toggle.
        Defaults to `False`.

    Returns
    -------
    A `pd.DataFrame` of the pipe's data (default).
    A `List[Dict]` if `as_docs=True`.
    An `Iterator[pd.DataFrame]` if `as_chunks=True` (or `as_iterator=True`).
    An `Iterator[List[Dict]]` if both `as_docs=True` and `as_chunks=True`.

    """
    from meerschaum.utils.warnings import warn
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.dtypes import to_pandas_dtype
    from meerschaum.utils.dataframe import add_missing_cols_to_df, df_is_chunk_generator
    from meerschaum.utils.packages import attempt_import
    from meerschaum.utils.warnings import dprint
    dd = attempt_import('dask.dataframe') if as_dask else None
    dask = attempt_import('dask') if as_dask else None
    _ = attempt_import('partd', lazy=False) if as_dask else None

    if select_columns == '*':
        select_columns = None
    elif isinstance(select_columns, str):
        select_columns = [select_columns]

    if isinstance(omit_columns, str):
        omit_columns = [omit_columns]

    begin, end = self.parse_date_bounds(begin, end, debug=debug)
    as_iterator = as_iterator or as_chunks
    dt_col = self.columns.get('datetime', None)

    def _sort_df(_df):
        if df_is_chunk_generator(_df):
            return _df
        indices = [] if dt_col not in _df.columns else [dt_col]
        non_dt_cols = [
            col
            for col_ix, col in self.columns.items()
            if col_ix != 'datetime' and col in _df.columns
        ]
        indices.extend(non_dt_cols)
        if 'dask' not in _df.__module__:
            _df.sort_values(
                by=indices,
                inplace=True,
                ascending=(str(order).lower() == 'asc'),
            )
            _df.reset_index(drop=True, inplace=True)
        else:
            _df = _df.sort_values(
                by=indices,
                ascending=(str(order).lower() == 'asc'),
            )
            _df = _df.reset_index(drop=True)
        if limit is not None and len(_df) > limit:
            return _df.head(limit)
        return _df

    if as_iterator or as_chunks:
        df = self._get_data_as_iterator(
            select_columns=select_columns,
            omit_columns=omit_columns,
            begin=begin,
            end=end,
            params=params,
            chunk_interval=chunk_interval,
            limit=limit,
            order=order,
            as_docs=as_docs,
            fresh=fresh,
            debug=debug,
        )
        if as_docs:
            return df
        return _sort_df(df)

    if as_dask:
        from multiprocessing.pool import ThreadPool
        dask_pool = ThreadPool(self.get_num_workers())
        dask.config.set(pool=dask_pool)
        chunk_interval = self.get_chunk_interval(chunk_interval, debug=debug)
        bounds = self.get_chunk_bounds(
            begin=begin,
            end=end,
            bounded=False,
            chunk_interval=chunk_interval,
            debug=debug,
        )
        dask_chunks = [
            dask.delayed(self.get_data)(
                select_columns=select_columns,
                omit_columns=omit_columns,
                begin=chunk_begin,
                end=chunk_end,
                params=params,
                chunk_interval=chunk_interval,
                order=order,
                limit=limit,
                fresh=fresh,
                add_missing_columns=True,
                debug=debug,
            )
            for (chunk_begin, chunk_end) in bounds
        ]
        dask_meta = {
            col: to_pandas_dtype(typ)
            for col, typ in self.get_dtypes(refresh=True, infer=True, debug=debug).items()
        }
        if debug:
            dprint(f"Dask meta:\n{dask_meta}")
        return _sort_df(dd.from_delayed(dask_chunks, meta=dask_meta))

    if not self.exists(debug=debug):
        return [] if as_docs else None

    if as_docs:
        with Venv(get_connector_plugin(self.instance_connector)):
            docs = self.instance_connector.get_pipe_docs(
                pipe=self,
                select_columns=select_columns,
                omit_columns=omit_columns,
                begin=begin,
                end=end,
                params=params,
                limit=limit,
                order=order,
                debug=debug,
                **kw
            )
            return docs if docs is not None else []

    with Venv(get_connector_plugin(self.instance_connector)):
        df = self.instance_connector.get_pipe_data(
            pipe=self,
            select_columns=select_columns,
            omit_columns=omit_columns,
            begin=begin,
            end=end,
            params=params,
            limit=limit,
            order=order,
            debug=debug,
            **kw
        )
        if df is None:
            return df

        if not select_columns:
            select_columns = [col for col in df.columns]

        pipe_dtypes = self.get_dtypes(refresh=False, debug=debug)
        cols_to_omit = [
            col
            for col in df.columns
            if (
                col in (omit_columns or [])
                or
                col not in (select_columns or [])
            )
        ]
        cols_to_add = [
            col
            for col in select_columns
            if col not in df.columns
        ] + ([
            col
            for col in pipe_dtypes
            if col not in df.columns
        ] if add_missing_columns else [])
        if cols_to_omit:
            warn(
                (
                    f"Received {len(cols_to_omit)} omitted column"
                    + ('s' if len(cols_to_omit) != 1 else '')
                    + f" for {self}. "
                    + "Consider adding `select_columns` and `omit_columns` support to "
                    + f"'{self.instance_connector.type}' connectors to improve performance."
                ),
                stack=False,
            )
            _cols_to_select = [col for col in df.columns if col not in cols_to_omit]
            df = df[_cols_to_select]

        if cols_to_add:
            if not add_missing_columns:
                from meerschaum.utils.misc import items_str
                warn(
                    f"Will add columns {items_str(cols_to_add)} as nulls to dataframe.",
                    stack=False,
                )

            df = add_missing_cols_to_df(
                df,
                {
                    col: pipe_dtypes.get(col, 'string')
                    for col in cols_to_add
                },
            )

        enforced_df = self.enforce_dtypes(
            df,
            dtypes=pipe_dtypes,
            debug=debug,
        )

        if order:
            return _sort_df(enforced_df)
        return enforced_df
Get a pipe's data from the instance connector.
Parameters
- select_columns (Optional[List[str]], default None): If provided, only select these given columns. Otherwise select all available columns (i.e. SELECT *).
- omit_columns (Optional[List[str]], default None): If provided, remove these columns from the selection.
- begin (Union[datetime, int, str, None], default None): Lower bound datetime to begin searching for data (inclusive). Translates to a WHERE clause like WHERE datetime >= begin.
- end (Union[datetime, int, str, None], default None): Upper bound datetime to stop searching for data (exclusive). Translates to a WHERE clause like WHERE datetime < end.
- params (Optional[Dict[str, Any]], default None): Filter the retrieved data by a dictionary of parameters. See meerschaum.utils.sql.build_where for more details.
- as_docs (bool, default False): If True, return a list of dictionaries rather than a DataFrame. Relies on get_pipe_docs from the instance connector if implemented. May be combined with as_chunks to return an Iterator[List[Dict]] chunked by time bounds (useful for large result sets without pandas overhead).
- as_iterator (bool, default False): If True, return a generator of chunks of pipe data. When combined with as_docs=True, yields List[Dict] per chunk instead of DataFrames.
- as_chunks (bool, default False): Alias for as_iterator. When combined with as_docs=True, yields List[Dict] per chunk instead of DataFrames.
- as_dask (bool, default False): If True, return a dask.DataFrame (which may be loaded into a Pandas DataFrame with df.compute()).
- add_missing_columns (bool, default False): If True, add any missing columns from Pipe.dtypes to the dataframe.
- chunk_interval (Union[timedelta, int, None], default None): If as_iterator, then return chunks with begin and end separated by this interval. This may be set under pipe.parameters['verify']['chunk_minutes']. By default, use a timedelta of 1440 minutes (1 day). If chunk_interval is an integer and the datetime axis is a timestamp, then use a timedelta with the number of minutes configured to this value. If the datetime axis is an integer, default to the configured chunksize. If chunk_interval is a timedelta and the datetime axis is an integer, use the number of minutes in the timedelta.
- order (Optional[str], default 'asc'): If order is not None, sort the resulting dataframe by indices.
- limit (Optional[int], default None): If provided, cap the dataframe to this many rows.
- fresh (bool, default False): If True, skip local cache and directly query the instance connector.
- debug (bool, default False): Verbosity toggle.
Returns
- A pd.DataFrame of the pipe's data (default).
- A List[Dict] if as_docs=True.
- An Iterator[pd.DataFrame] if as_chunks=True (or as_iterator=True).
- An Iterator[List[Dict]] if both as_docs=True and as_chunks=True.
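To make the interplay of select_columns, omit_columns, and add_missing_columns concrete, here is a minimal standard-library sketch that applies the same idea to a list of documents. The helper filter_docs and its add_missing argument are hypothetical names invented for illustration; the real method works on DataFrames and enforces dtypes as well.

```python
from typing import Any, Dict, List, Optional


def filter_docs(
    docs: List[Dict[str, Any]],
    select_columns: Optional[List[str]] = None,
    omit_columns: Optional[List[str]] = None,
    add_missing: Optional[List[str]] = None,
) -> List[Dict[str, Any]]:
    """Hypothetical helper: keep selected columns, drop omitted ones, null-fill the rest."""
    omit = set(omit_columns or [])
    filtered = []
    for doc in docs:
        # With no explicit selection, every column present is a candidate.
        keep = select_columns or list(doc.keys())
        # Selected columns absent from the document come back as None.
        row = {col: doc.get(col) for col in keep if col not in omit}
        # Loosely mirrors add_missing_columns=True: absent columns become nulls.
        for col in add_missing or []:
            if col not in omit:
                row.setdefault(col, None)
        filtered.append(row)
    return filtered


docs = [{'ts': '2024-01-01', 'id': 1, 'vl': 97}]
result = filter_docs(docs, omit_columns=['vl'], add_missing=['note'])
print(result)
```

Connectors that push the selection down into the query avoid fetching the omitted columns at all, which is what the warning in the source above is encouraging.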
def get_backtrack_data(
    self,
    backtrack_minutes: Optional[int] = None,
    begin: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    limit: Optional[int] = None,
    fresh: bool = False,
    debug: bool = False,
    **kw: Any
) -> Optional['pd.DataFrame']:
    """
    Get the most recent data from the instance connector as a Pandas DataFrame.

    Parameters
    ----------
    backtrack_minutes: Optional[int], default None
        How many minutes from `begin` to select from.
        If `None`, use `pipe.parameters['fetch']['backtrack_minutes']`.

    begin: Optional[datetime], default None
        The starting point to search for data.
        If begin is `None` (default), use the most recent observed datetime
        (AKA sync_time).

        ```
        E.g. begin = 02:00

        Search this region.           Ignore this, even if there's data.
        /  /  /  /  /  /  /  /  /  |
        -----|----------|----------|----------|----------|----------|
        00:00      01:00      02:00      03:00      04:00      05:00

        ```

    params: Optional[Dict[str, Any]], default None
        The standard Meerschaum `params` query dictionary.

    limit: Optional[int], default None
        If provided, cap the number of rows to be returned.

    fresh: bool, default False
        If `True`, ignore local cache and pull directly from the instance connector.
        Only comes into effect if a pipe was created with `cache=True`.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `pd.DataFrame` for the pipe's data corresponding to the provided parameters. Backtrack data
    is a convenient way to get a pipe's data "backtracked" from the most recent datetime.
    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    if not self.exists(debug=debug):
        return None

    begin = self.parse_date_bounds(begin, debug=debug)

    backtrack_interval = self.get_backtrack_interval(debug=debug)
    if backtrack_minutes is None:
        backtrack_minutes = (
            (backtrack_interval.total_seconds() / 60)
            if isinstance(backtrack_interval, timedelta)
            else backtrack_interval
        )

    if hasattr(self.instance_connector, 'get_backtrack_data'):
        with Venv(get_connector_plugin(self.instance_connector)):
            return self.enforce_dtypes(
                self.instance_connector.get_backtrack_data(
                    pipe=self,
                    begin=begin,
                    backtrack_minutes=backtrack_minutes,
                    params=params,
                    limit=limit,
                    debug=debug,
                    **kw
                ),
                debug=debug,
            )

    if begin is None:
        begin = self.get_sync_time(params=params, debug=debug)

    backtrack_interval = (
        timedelta(minutes=backtrack_minutes)
        if isinstance(begin, datetime)
        else backtrack_minutes
    )
    if begin is not None:
        begin = begin - backtrack_interval

    kw['order'] = kw.get('order', 'desc') or 'desc'
    return self.get_data(
        begin=begin,
        params=params,
        debug=debug,
        limit=limit,
        **kw
    )
Get the most recent data from the instance connector as a Pandas DataFrame.
Parameters
- backtrack_minutes (Optional[int], default None): How many minutes from begin to select from. If None, use pipe.parameters['fetch']['backtrack_minutes'].
- begin (Optional[datetime], default None): The starting point to search for data. If begin is None (default), use the most recent observed datetime (AKA sync_time).

    E.g. begin = 02:00

    Search this region.           Ignore this, even if there's data.
    /  /  /  /  /  /  /  /  /  |
    -----|----------|----------|----------|----------|----------|
    00:00      01:00      02:00      03:00      04:00      05:00

- params (Optional[Dict[str, Any]], default None): The standard Meerschaum params query dictionary.
- limit (Optional[int], default None): If provided, cap the number of rows to be returned.
- fresh (bool, default False): If True, ignore local cache and pull directly from the instance connector. Only comes into effect if a pipe was created with cache=True.
- debug (bool, default False): Verbosity toggle.
Returns
- A pd.DataFrame for the pipe's data corresponding to the provided parameters. Backtrack data is a convenient way to get a pipe's data "backtracked" from the most recent datetime.
- Returns None if the pipe does not exist.
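The fallback path in the source above subtracts the backtrack interval from the starting point, using a timedelta for datetime axes and a plain integer otherwise. The sketch below isolates that arithmetic; backtrack_begin is a hypothetical helper name, not part of the Meerschaum API.

```python
from datetime import datetime, timedelta, timezone
from typing import Union


def backtrack_begin(
    sync_time: Union[datetime, int],
    backtrack_minutes: int,
) -> Union[datetime, int]:
    """Hypothetical helper: compute the lower bound of the backtrack window."""
    # Datetime axes subtract a timedelta; integer axes subtract the raw number.
    interval = (
        timedelta(minutes=backtrack_minutes)
        if isinstance(sync_time, datetime)
        else backtrack_minutes
    )
    return sync_time - interval


dt_begin = backtrack_begin(datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc), 60)
int_begin = backtrack_begin(1000, 60)
print(dt_begin)   # 2024-01-01 01:00:00+00:00
print(int_begin)  # 940
```

The resulting value is then used as the begin bound of an ordinary data query, sorted in descending order by default.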
def get_rowcount(
    self,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    remote: bool = False,
    debug: bool = False
) -> int:
    """
    Get a Pipe's instance or remote rowcount.

    Parameters
    ----------
    begin: Optional[datetime], default None
        Count rows where datetime > begin.

    end: Optional[datetime], default None
        Count rows where datetime < end.

    remote: bool, default False
        Count rows from a pipe's remote source.
        **NOTE**: This is experimental!

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    An `int` of the number of rows in the pipe corresponding to the provided parameters.
    Returns 0 if the pipe does not exist.
    """
    from meerschaum.utils.warnings import warn
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.misc import filter_keywords

    begin, end = self.parse_date_bounds(begin, end, debug=debug)
    connector = self.instance_connector if not remote else self.connector
    try:
        with Venv(get_connector_plugin(connector)):
            if not hasattr(connector, 'get_pipe_rowcount'):
                warn(
                    f"Connectors of type '{connector.type}' "
                    "do not implement `get_pipe_rowcount()`.",
                    stack=False,
                )
                return 0
            kwargs = filter_keywords(
                connector.get_pipe_rowcount,
                begin=begin,
                end=end,
                params=params,
                remote=remote,
                debug=debug,
            )
            if remote and 'remote' not in kwargs:
                warn(
                    f"Connectors of type '{connector.type}' do not support remote rowcounts.",
                    stack=False,
                )
                return 0
            rowcount = connector.get_pipe_rowcount(
                self,
                begin=begin,
                end=end,
                params=params,
                remote=remote,
                debug=debug,
            )
            if rowcount is None:
                return 0
            return rowcount
    except AttributeError as e:
        warn(e)
        if remote:
            return 0
    warn(f"Failed to get a rowcount for {self}.")
    return 0
Get a Pipe's instance or remote rowcount.
Parameters
- begin (Optional[datetime], default None): Count rows where datetime > begin.
- end (Optional[datetime], default None): Count rows where datetime < end.
- remote (bool, default False): Count rows from a pipe's remote source. NOTE: This is experimental!
- debug (bool, default False): Verbosity toggle.
Returns
- An int of the number of rows in the pipe corresponding to the provided parameters.
- Returns 0 if the pipe does not exist.
def get_doc(self, **kwargs) -> Union[Dict[str, Any], None]:
    """
    Convenience function to return a single row as a dictionary (or `None`) from `Pipe.get_data()`.
    Keyword arguments are passed to `Pipe.get_data()`.
    """
    from meerschaum.utils.warnings import warn
    kwargs['limit'] = 1
    kwargs['as_docs'] = True
    try:
        docs = self.get_data(**kwargs)
        if not docs:
            return None
        return docs[0]
    except Exception as e:
        warn(f"Failed to read value from {self}:\n{e}", stack=False)
        return None
Convenience function to return a single row as a dictionary (or None) from Pipe.get_data().
Keyword arguments are passed to Pipe.get_data().
def get_docs(self, **kwargs) -> list[dict[str, Any]]:
    """
    Convenience method to return a pipe's data as a list of dictionaries.
    Relies on `get_pipe_docs` from the instance connector if implemented.
    """
    kwargs['as_docs'] = True
    return self.get_data(**kwargs)
Convenience method to return a pipe's data as a list of dictionaries.
Relies on get_pipe_docs from the instance connector if implemented.
def get_value(
    self,
    column: str,
    params: Optional[Dict[str, Any]] = None,
    **kwargs: Any
) -> Any:
    """
    Convenience function to return a single value (or `None`) from `Pipe.get_data()`.
    Keyword arguments are passed to `Pipe.get_data()`.
    """
    from meerschaum.utils.warnings import warn
    kwargs['select_columns'] = [column]
    kwargs['limit'] = 1
    kwargs['as_docs'] = True
    try:
        docs = self.get_data(params=params, **kwargs)
        if not docs:
            return None
        if column not in docs[0]:
            raise ValueError(f"Column '{column}' was not included in the result set.")
        return docs[0][column]
    except Exception as e:
        warn(f"Failed to read value from {self}:\n{e}", stack=False)
        return None
Convenience function to return a single value (or None) from Pipe.get_data().
Keyword arguments are passed to Pipe.get_data().
def get_chunk_interval(
    self,
    chunk_interval: Union[timedelta, int, None] = None,
    debug: bool = False,
) -> Union[timedelta, int]:
    """
    Get the chunk interval to use for this pipe.

    Parameters
    ----------
    chunk_interval: Union[timedelta, int, None], default None
        If provided, coerce this value into the correct type.
        For example, if the datetime axis is an integer, then
        return the number of minutes.

    Returns
    -------
    The chunk interval (`timedelta` or `int`) to use with this pipe's `datetime` axis.
    """
    from meerschaum.utils.dtypes import MRSM_PRECISION_UNITS_SCALARS, MRSM_PRECISION_UNITS_ALIASES
    default_chunk_minutes = get_config('pipes', 'parameters', 'verify', 'chunk_minutes')
    configured_chunk_minutes = self.parameters.get('verify', {}).get('chunk_minutes', None)
    chunk_minutes = (
        (configured_chunk_minutes or default_chunk_minutes)
        if chunk_interval is None
        else (
            chunk_interval
            if isinstance(chunk_interval, int)
            else int(chunk_interval.total_seconds() / 60)
        )
    )

    dt_col = self.columns.get('datetime', None)
    if dt_col is None:
        return timedelta(minutes=chunk_minutes)

    dt_dtype = self.dtypes.get(dt_col, 'datetime')
    if 'int' in dt_dtype.lower():
        if chunk_interval is not None or not self.parameters.get('precision', None):
            return chunk_minutes
        precision_unit = self.precision.get('unit', None)
        true_unit = MRSM_PRECISION_UNITS_ALIASES.get(precision_unit, precision_unit)
        scalar = MRSM_PRECISION_UNITS_SCALARS.get(true_unit, None)
        if scalar is not None:
            return int(chunk_minutes * 60 * scalar)
        return chunk_minutes

    return timedelta(minutes=chunk_minutes)
Get the chunk interval to use for this pipe.
Parameters
- chunk_interval (Union[timedelta, int, None], default None): If provided, coerce this value into the correct type. For example, if the datetime axis is an integer, then return the number of minutes.
Returns
- The chunk interval (timedelta or int) to use with this pipe's datetime axis.
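The coercion rule can be sketched without a pipe. The function below is a simplified stand-in: coerce_chunk_interval and the datetime_axis_is_int flag are hypothetical, the 1440-minute default is taken from the get_data parameter description, and the precision-unit scaling that the real method applies to integer axes is deliberately left out.

```python
from datetime import timedelta
from typing import Union


def coerce_chunk_interval(
    chunk_interval: Union[timedelta, int, None],
    datetime_axis_is_int: bool,
    default_minutes: int = 1440,
) -> Union[timedelta, int]:
    """Hypothetical helper: return minutes for integer axes, a timedelta otherwise."""
    if chunk_interval is None:
        minutes = default_minutes
    elif isinstance(chunk_interval, int):
        minutes = chunk_interval
    else:
        # Convert a timedelta into whole minutes.
        minutes = int(chunk_interval.total_seconds() / 60)
    return minutes if datetime_axis_is_int else timedelta(minutes=minutes)


print(coerce_chunk_interval(timedelta(hours=2), datetime_axis_is_int=True))  # 120
print(coerce_chunk_interval(90, datetime_axis_is_int=False))                 # 1:30:00
print(coerce_chunk_interval(None, datetime_axis_is_int=False))               # 1 day, 0:00:00
```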
def get_chunk_bounds(
    self,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    bounded: bool = False,
    chunk_interval: Union[timedelta, int, None] = None,
    debug: bool = False,
) -> List[
    Tuple[
        Union[datetime, int, None],
        Union[datetime, int, None],
    ]
]:
    """
    Return a list of datetime bounds for iterating over the pipe's `datetime` axis.

    Parameters
    ----------
    begin: Union[datetime, int, None], default None
        If provided, do not select less than this value.
        Otherwise the first chunk will be unbounded.

    end: Union[datetime, int, None], default None
        If provided, do not select greater than or equal to this value.
        Otherwise the last chunk will be unbounded.

    bounded: bool, default False
        If `True`, do not include `None` in the first chunk.

    chunk_interval: Union[timedelta, int, None], default None
        If provided, use this interval for the size of chunk boundaries.
        The default value for this pipe may be set
        under `pipe.parameters['verify']['chunk_minutes']`.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A list of chunk bounds (datetimes or integers).
    If unbounded, the first and last chunks will include `None`.
    """
    from datetime import timedelta
    from meerschaum.utils.dtypes import are_dtypes_equal
    from meerschaum.utils.misc import interval_str
    include_less_than_begin = not bounded and begin is None
    include_greater_than_end = not bounded and end is None
    if begin is None:
        begin = self.get_sync_time(newest=False, debug=debug)
    consolidate_end_chunk = False
    if end is None:
        end = self.get_sync_time(newest=True, debug=debug)
        if end is not None and hasattr(end, 'tzinfo'):
            end += timedelta(minutes=1)
            consolidate_end_chunk = True
        elif are_dtypes_equal(str(type(end)), 'int'):
            end += 1
            consolidate_end_chunk = True

    if begin is None and end is None:
        return [(None, None)]

    begin, end = self.parse_date_bounds(begin, end, debug=debug)

    if begin and end:
        if begin >= end:
            return (
                [(begin, begin)]
                if bounded
                else [(begin, None)]
            )
        if end <= begin:
            return (
                [(end, end)]
                if bounded
                else [(None, begin)]
            )

    ### Set the chunk interval under `pipe.parameters['verify']['chunk_minutes']`.
    chunk_interval = self.get_chunk_interval(chunk_interval, debug=debug)

    ### Build a list of tuples containing the chunk boundaries
    ### so that we can sync multiple chunks in parallel.
    ### Run `verify pipes --workers 1` to sync chunks in series.
    chunk_bounds = []
    begin_cursor = begin
    num_chunks = 0
    max_chunks = 1_000_000
    while begin_cursor < end:
        end_cursor = begin_cursor + chunk_interval
        chunk_bounds.append((begin_cursor, end_cursor))
        begin_cursor = end_cursor
        num_chunks += 1
        if num_chunks >= max_chunks:
            raise ValueError(
                f"Too many chunks of size '{interval_str(chunk_interval)}' "
                f"between '{begin}' and '{end}'."
            )

    if num_chunks > 1 and consolidate_end_chunk:
        last_bounds, second_last_bounds = chunk_bounds[-1], chunk_bounds[-2]
        chunk_bounds = chunk_bounds[:-2]
        chunk_bounds.append((second_last_bounds[0], last_bounds[1]))

    ### The chunk interval might be too large.
    if not chunk_bounds and end >= begin:
        chunk_bounds = [(begin, end)]

    ### Truncate the last chunk to the end timestamp.
    if chunk_bounds[-1][1] > end:
        chunk_bounds[-1] = (chunk_bounds[-1][0], end)

    ### Pop the last chunk if its bounds are equal.
    if chunk_bounds[-1][0] == chunk_bounds[-1][1]:
        chunk_bounds = chunk_bounds[:-1]

    if include_less_than_begin:
        chunk_bounds = [(None, begin)] + chunk_bounds
    if include_greater_than_end:
        chunk_bounds = chunk_bounds + [(end, None)]

    return chunk_bounds
Return a list of datetime bounds for iterating over the pipe's datetime axis.
Parameters
- begin (Union[datetime, int, None], default None): If provided, do not select less than this value. Otherwise the first chunk will be unbounded.
- end (Union[datetime, int, None], default None): If provided, do not select greater than or equal to this value. Otherwise the last chunk will be unbounded.
- bounded (bool, default False): If True, do not include None in the first or last chunks.
- chunk_interval (Union[timedelta, int, None], default None): If provided, use this interval for the size of chunk boundaries. The default value for this pipe may be set under pipe.parameters['verify']['chunk_minutes'].
- debug (bool, default False): Verbosity toggle.
Returns
- A list of chunk bounds (datetimes or integers).
- If unbounded, the first and last chunks will include None.
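The core loop, which walks a cursor from begin to end in steps of the chunk interval and truncates the final chunk, can be sketched in isolation. This is a simplified stand-in rather than the library's implementation: simple_chunk_bounds is a hypothetical name, both bounds must be provided, and the open-ended (None) chunks, sync-time lookups, and end-chunk consolidation from the source above are omitted.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def simple_chunk_bounds(
    begin: datetime,
    end: datetime,
    chunk_interval: timedelta,
) -> List[Tuple[datetime, datetime]]:
    """Hypothetical helper: split [begin, end) into consecutive half-open chunks."""
    bounds = []
    cursor = begin
    while cursor < end:
        # Truncate the last chunk so it never extends past `end`.
        upper = min(cursor + chunk_interval, end)
        bounds.append((cursor, upper))
        cursor = upper
    return bounds


chunks = simple_chunk_bounds(
    datetime(2024, 1, 1),
    datetime(2024, 1, 3, 12),
    timedelta(days=1),
)
for chunk_begin, chunk_end in chunks:
    print(chunk_begin, '->', chunk_end)
```

Because each chunk's upper bound is the next chunk's lower bound, the chunks tile the range without gaps or overlap when the lower bound is inclusive and the upper bound exclusive.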
def get_chunk_bounds_batches(
    self,
    chunk_bounds: List[Tuple[Union[datetime, int, None], Union[datetime, int, None]]],
    batchsize: Optional[int] = None,
    workers: Optional[int] = None,
    debug: bool = False,
) -> List[
    Tuple[
        Tuple[
            Union[datetime, int, None],
            Union[datetime, int, None],
        ], ...
    ]
]:
    """
    Return a list of tuples of chunk bounds of size `batchsize`.

    Parameters
    ----------
    chunk_bounds: List[Tuple[Union[datetime, int, None], Union[datetime, int, None]]]
        A list of chunk_bounds (see `Pipe.get_chunk_bounds()`).

    batchsize: Optional[int], default None
        How many chunks to include in a batch. Defaults to `Pipe.get_num_workers()`.

    workers: Optional[int], default None
        If `batchsize` is `None`, use this as the desired number of workers.
        Passed to `Pipe.get_num_workers()`.

    Returns
    -------
    A list of tuples of chunk bound tuples.
    """
    from meerschaum.utils.misc import iterate_chunks

    if batchsize is None:
        batchsize = self.get_num_workers(workers=workers)

    return [
        tuple(
            _batch_chunk_bounds
            for _batch_chunk_bounds in batch
            if _batch_chunk_bounds is not None
        )
        for batch in iterate_chunks(chunk_bounds, batchsize)
        if batch
    ]
Return a list of tuples of chunk bounds of size batchsize.
Parameters
- chunk_bounds (List[Tuple[Union[datetime, int, None], Union[datetime, int, None]]]): A list of chunk_bounds (see Pipe.get_chunk_bounds()).
- batchsize (Optional[int], default None): How many chunks to include in a batch. Defaults to Pipe.get_num_workers().
- workers (Optional[int], default None): If batchsize is None, use this as the desired number of workers. Passed to Pipe.get_num_workers().
Returns
- A list of tuples of chunk bound tuples.
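Batching is plain fixed-size grouping. The sketch below uses list slicing in place of the library's iterate_chunks utility; batch_bounds is a hypothetical name used only for illustration.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar('T')


def batch_bounds(chunk_bounds: Sequence[T], batchsize: int) -> List[Tuple[T, ...]]:
    """Hypothetical helper: group chunk bounds into tuples of at most `batchsize`."""
    if batchsize < 1:
        raise ValueError("batchsize must be a positive integer.")
    return [
        tuple(chunk_bounds[i:i + batchsize])
        for i in range(0, len(chunk_bounds), batchsize)
    ]


batches = batch_bounds([(0, 10), (10, 20), (20, 30)], batchsize=2)
print(batches)
```

The last batch may be shorter than batchsize when the number of chunks is not an even multiple.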
def parse_date_bounds(self, *dt_vals: Union[datetime, int, None], debug: bool = False) -> Union[
    datetime,
    int,
    str,
    None,
    Tuple[Union[datetime, int, str, None]]
]:
    """
    Given a date bound (begin, end), coerce a timezone if necessary.
    """
    from meerschaum.utils.misc import is_int
    from meerschaum.utils.dtypes import coerce_timezone, MRSM_PD_DTYPES
    from meerschaum.utils.warnings import warn
    dateutil_parser = mrsm.attempt_import('dateutil.parser')

    _columns = None
    _dtypes = None

    def _get_coercion_info():
        nonlocal _columns, _dtypes
        if _columns is None:
            _columns = self.get_parameters(debug=debug).get('columns', {}) or {}
        if _dtypes is None:
            _dtypes = self.get_dtypes(debug=debug)

    def _parse_date_bound(dt_val):
        if dt_val is None:
            return None

        if isinstance(dt_val, int):
            return dt_val

        if dt_val == '':
            return ''

        if is_int(dt_val):
            return int(dt_val)

        if isinstance(dt_val, str):
            try:
                dt_val = dateutil_parser.parse(dt_val)
            except Exception as e:
                warn(f"Could not parse '{dt_val}' as datetime:\n{e}")
                return None

        _get_coercion_info()
        dt_col = _columns.get('datetime', None)
        dt_typ = str(_dtypes.get(dt_col, 'datetime'))
        if dt_typ == 'datetime':
            dt_typ = MRSM_PD_DTYPES['datetime']
        return coerce_timezone(dt_val, strip_utc=('utc' not in dt_typ.lower()))

    bounds = tuple(_parse_date_bound(dt_val) for dt_val in dt_vals)
    if len(bounds) == 1:
        return bounds[0]
    return bounds
Given a date bound (begin, end), coerce a timezone if necessary.
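The parsing rules in the source above (pass None and integers through, turn digit strings into integers, parse other strings as datetimes, then normalize the timezone) can be approximated with the standard library. This sketch swaps in datetime.fromisoformat for the dateutil parser, so it only accepts ISO 8601 strings, and it assumes naive datetimes are UTC; the real method instead decides whether to strip or keep UTC from the pipe's datetime dtype. parse_bound is a hypothetical name.

```python
from datetime import datetime, timezone
from typing import Union


def parse_bound(value: Union[datetime, int, str, None]) -> Union[datetime, int, str, None]:
    """Hypothetical helper: coerce a single begin/end bound."""
    if value is None or isinstance(value, int):
        return value
    if isinstance(value, str):
        if value == '':
            return ''
        try:
            # Digit strings are treated as integer bounds.
            return int(value)
        except ValueError:
            # Assumption: ISO 8601 input (the library uses dateutil instead).
            value = datetime.fromisoformat(value)
    # Assumption: treat naive datetimes as UTC.
    if value.tzinfo is None:
        return value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)


print(parse_bound('42'))
print(parse_bound('2024-01-01'))
print(parse_bound(None))
```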
def register(
    self,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Register a new Pipe along with its attributes.

    Parameters
    ----------
    debug: bool, default False
        Verbosity toggle.

    kw: Any
        Keyword arguments to pass to `instance_connector.register_pipe()`.

    Returns
    -------
    A `SuccessTuple` of success, message.
    """
    if self.temporary:
        return False, "Cannot register pipes created with `temporary=True` (read-only)."

    from meerschaum.utils.formatting import get_console
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin, custom_types
    from meerschaum.config._patch import apply_patch_to_config

    import warnings
    with warnings.catch_warnings():
        warnings.simplefilter('ignore')
        try:
            _conn = self.connector
        except Exception:
            _conn = None

    if isinstance(_conn, str):
        _conn = None

    if (
        _conn is not None
        and
        (_conn.type == 'plugin' or _conn.type in custom_types)
        and
        getattr(_conn, 'register', None) is not None
    ):
        try:
            with Venv(get_connector_plugin(_conn), debug=debug):
                params = self.connector.register(self)
        except Exception:
            get_console().print_exception()
            params = None
        params = {} if params is None else params
        if not isinstance(params, dict):
            from meerschaum.utils.warnings import warn
            warn(
                f"Invalid parameters returned from `register()` in connector {self.connector}:\n"
                + f"{params}"
            )
        else:
            self.parameters = apply_patch_to_config(params, self.parameters)

    if not self.parameters:
        cols = self.columns if self.columns else {'datetime': None, 'id': None}
        self.parameters = {
            'columns': cols,
        }

    with Venv(get_connector_plugin(self.instance_connector)):
        return self.instance_connector.register_pipe(self, debug=debug, **kw)
Register a new Pipe along with its attributes.
Parameters
- debug (bool, default False): Verbosity toggle.
- kw (Any): Keyword arguments to pass to instance_connector.register_pipe().
Returns
- A SuccessTuple of success, message.
@property
def attributes(self) -> Dict[str, Any]:
    """
    Return a dictionary of a pipe's keys and parameters.
    These values are reflected directly from the pipes table of the instance.
    """
    from meerschaum.config import get_config
    from meerschaum.config._patch import apply_patch_to_config
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.dtypes import get_current_timestamp

    timeout_seconds = get_config('pipes', 'attributes', 'local_cache_timeout_seconds')

    now = get_current_timestamp('ms', as_int=True) / 1000
    _attributes_sync_time = self._get_cached_value('_attributes_sync_time', debug=self.debug)
    timed_out = (
        _attributes_sync_time is None
        or
        (timeout_seconds is not None and (now - _attributes_sync_time) >= timeout_seconds)
    )
    if not self.temporary and timed_out:
        self._cache_value('_attributes_sync_time', now, memory_only=True, debug=self.debug)
        local_attributes = self._get_cached_value('attributes', debug=self.debug) or {}
        with Venv(get_connector_plugin(self.instance_connector)):
            instance_attributes = self.instance_connector.get_pipe_attributes(self)

        self._cache_value(
            'attributes',
            apply_patch_to_config(instance_attributes, local_attributes),
            memory_only=True,
            debug=self.debug,
        )

    return self._attributes
Return a dictionary of a pipe's keys and parameters. These values are reflected directly from the pipes table of the instance.
@property
def parameters(self) -> Optional[Dict[str, Any]]:
    """
    Return the parameters dictionary of the pipe.
    """
    return self.get_parameters(debug=self.debug)
Return the parameters dictionary of the pipe.
@property
def columns(self) -> Union[Dict[str, str], None]:
    """
    Return the `columns` dictionary defined in `meerschaum.Pipe.parameters`.
    """
    cols = self.parameters.get('columns', {})
    if not isinstance(cols, dict):
        return {}
    return {col_ix: col for col_ix, col in cols.items() if col and col_ix}
Return the columns dictionary defined in meerschaum.Pipe.parameters.
@property
def indices(self) -> Union[Dict[str, Union[str, List[str]]], None]:
    """
    Return the `indices` dictionary defined in `meerschaum.Pipe.parameters`.
    """
    _parameters = self.get_parameters(debug=self.debug)
    indices_key = (
        'indexes'
        if 'indexes' in _parameters
        else 'indices'
    )

    _indices = _parameters.get(indices_key, {})
    _columns = self.columns
    dt_col = _columns.get('datetime', None)
    if not isinstance(_indices, dict):
        _indices = {}
    unique_cols = list(set((
        [dt_col]
        if dt_col
        else []
    ) + [
        col
        for col_ix, col in _columns.items()
        if col and col_ix != 'datetime'
    ]))
    return {
        **({'unique': unique_cols} if len(unique_cols) > 1 else {}),
        **{col_ix: col for col_ix, col in _columns.items() if col},
        **_indices
    }
Return the indices dictionary defined in meerschaum.Pipe.parameters.
@property
def indexes(self) -> Union[Dict[str, Union[str, List[str]]], None]:
    """
    Alias for `meerschaum.Pipe.indices`.
    """
    return self.indices
Alias for meerschaum.Pipe.indices.
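As the source for indices shows, the returned dictionary is a merge of three layers: a derived unique index over all index columns (only when there is more than one), the columns mapping itself, and any explicitly configured indices, which take precedence. A minimal sketch of that merge follows; build_indices is a hypothetical name, and the unique columns are sorted here for deterministic output, whereas the source uses an unordered set.

```python
from typing import Any, Dict, Optional


def build_indices(
    columns: Dict[str, Optional[str]],
    indices: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: merge derived and explicit index definitions."""
    # Drop empty keys and unset column names, as the `columns` property does.
    cols = {ix: col for ix, col in columns.items() if ix and col}
    unique_cols = sorted(set(cols.values()))
    return {
        # A composite unique index only makes sense with multiple columns.
        **({'unique': unique_cols} if len(unique_cols) > 1 else {}),
        **cols,
        # Explicit indices override anything derived above.
        **(indices or {}),
    }


merged = build_indices(
    {'datetime': 'ts', 'id': 'id'},
    {'custom': ['a', 'b']},
)
print(merged)
```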
@property
def dtypes(self) -> Dict[str, Any]:
    """
    If defined, return the `dtypes` dictionary defined in `meerschaum.Pipe.parameters`.
    """
    return self.get_dtypes(refresh=False, debug=self.debug)
If defined, return the dtypes dictionary defined in meerschaum.Pipe.parameters.
@property
def autoincrement(self) -> bool:
    """
    Return the `autoincrement` parameter for the pipe.
    """
    return self.parameters.get('autoincrement', False)
Return the autoincrement parameter for the pipe.
@property
def autotime(self) -> bool:
    """
    Return the `autotime` parameter for the pipe.
    """
    return self.parameters.get('autotime', False)
Return the autotime parameter for the pipe.
@property
def upsert(self) -> bool:
    """
    Return whether `upsert` is set for the pipe.
    """
    return self.parameters.get('upsert', False)
Return whether upsert is set for the pipe.
@property
def static(self) -> bool:
    """
    Return whether `static` is set for the pipe.
    """
    return self.parameters.get('static', False)
Return whether static is set for the pipe.
@property
def tzinfo(self) -> Union[None, timezone]:
    """
    Return `timezone.utc` if the pipe is timezone-aware.
    """
    _tzinfo = self._get_cached_value('tzinfo', debug=self.debug)
    if _tzinfo is not None:
        return _tzinfo if _tzinfo != 'None' else None

    _tzinfo = None
    dt_col = self.columns.get('datetime', None)
    dt_typ = str(self.dtypes.get(dt_col, 'datetime')) if dt_col else None
    if self.autotime:
        ts_col = mrsm.get_config('pipes', 'autotime', 'column_name_if_datetime_missing')
        ts_typ = self.dtypes.get(ts_col, 'datetime')
        dt_typ = ts_typ

    if dt_typ and 'utc' in dt_typ.lower() or dt_typ == 'datetime':
        _tzinfo = timezone.utc

    self._cache_value('tzinfo', (_tzinfo if _tzinfo is not None else 'None'), debug=self.debug)
    return _tzinfo
Return timezone.utc if the pipe is timezone-aware.
@property
def enforce(self) -> bool:
    """
    Return the `enforce` parameter for the pipe.
    """
    return self.parameters.get('enforce', True)
Return the enforce parameter for the pipe.
@property
def null_indices(self) -> bool:
    """
    Return the `null_indices` parameter for the pipe.
    """
    return self.parameters.get('null_indices', True)
Return the null_indices parameter for the pipe.
@property
def mixed_numerics(self) -> bool:
    """
    Return the `mixed_numerics` parameter for the pipe.
    """
    return self.parameters.get('mixed_numerics', True)
Return the mixed_numerics parameter for the pipe.
def get_columns(self, *args: str, error: bool = False) -> Union[str, Tuple[str]]:
    """
    Check if the requested columns are defined.

    Parameters
    ----------
    *args: str
        The column names to be retrieved.

    error: bool, default False
        If `True`, raise an `Exception` if the specified column is not defined.

    Returns
    -------
    A tuple of the same size of `args` or a `str` if `args` is a single argument.

    Examples
    --------
    >>> pipe = mrsm.Pipe('test', 'test')
    >>> pipe.columns = {'datetime': 'dt', 'id': 'id'}
    >>> pipe.get_columns('datetime', 'id')
    ('dt', 'id')
    >>> pipe.get_columns('value', error=True)
    Exception: 🛑 Missing 'value' column for Pipe('test', 'test').
    """
    from meerschaum.utils.warnings import error as _error
    if not args:
        args = tuple(self.columns.keys())
    col_names = []
    for col in args:
        col_name = None
        try:
            col_name = self.columns[col]
            if col_name is None and error:
                _error(f"Please define the name of the '{col}' column for {self}.")
        except Exception:
            col_name = None
        if col_name is None and error:
            _error(f"Missing '{col}'" + f" column for {self}.")
        col_names.append(col_name)
    if len(col_names) == 1:
        return col_names[0]
    return tuple(col_names)
Check if the requested columns are defined.
Parameters
- *args (str): The column names to be retrieved.
- error (bool, default False): If True, raise an Exception if the specified column is not defined.
Returns
- A tuple of the same size of args or a str if args is a single argument.
Examples
>>> pipe = mrsm.Pipe('test', 'test')
>>> pipe.columns = {'datetime': 'dt', 'id': 'id'}
>>> pipe.get_columns('datetime', 'id')
('dt', 'id')
>>> pipe.get_columns('value', error=True)
Exception: 🛑 Missing 'value' column for Pipe('test', 'test').
def get_columns_types(
    self,
    refresh: bool = False,
    debug: bool = False,
) -> Union[Dict[str, str], None]:
    """
    Get a dictionary of a pipe's column names and their types.

    Parameters
    ----------
    refresh: bool, default False
        If `True`, invalidate the cache and fetch directly from the instance connector.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A dictionary of column names (`str`) to column types (`str`).

    Examples
    --------
    >>> pipe.get_columns_types()
    {
      'dt': 'TIMESTAMP WITH TIMEZONE',
      'id': 'BIGINT',
      'val': 'DOUBLE PRECISION',
    }
    >>>
    """
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.dtypes import get_current_timestamp

    now = get_current_timestamp('ms', as_int=True) / 1000
    cache_seconds = (
        mrsm.get_config('pipes', 'static', 'static_schema_cache_seconds')
        if self.static
        else mrsm.get_config('pipes', 'dtypes', 'columns_types_cache_seconds')
    )
    if refresh:
        self._clear_cache_key('_columns_types_timestamp', debug=debug)
        self._clear_cache_key('_columns_types', debug=debug)

    _columns_types = self._get_cached_value('_columns_types', debug=debug)
    if _columns_types:
        columns_types_timestamp = self._get_cached_value('_columns_types_timestamp', debug=debug)
        if columns_types_timestamp is not None:
            delta = now - columns_types_timestamp
            if delta < cache_seconds:
                if debug:
                    dprint(
                        f"Returning cached `columns_types` for {self} "
                        f"({round(delta, 2)} seconds old)."
                    )
                return _columns_types

    with mrsm.Venv(get_connector_plugin(self.instance_connector)):
        _columns_types = (
            self.instance_connector.get_pipe_columns_types(self, debug=debug)
            if hasattr(self.instance_connector, 'get_pipe_columns_types')
            else None
        )

    self._cache_value('_columns_types', _columns_types, debug=debug)
    self._cache_value('_columns_types_timestamp', now, debug=debug)
    return _columns_types or {}
Get a dictionary of a pipe's column names and their types.
Parameters
- refresh (bool, default False): If True, invalidate the cache and fetch directly from the instance connector.
- debug (bool, default False): Verbosity toggle.
Returns
- A dictionary of column names (str) to column types (str).
Examples
>>> pipe.get_columns_types()
{
'dt': 'TIMESTAMP WITH TIMEZONE',
'id': 'BIGINT',
'val': 'DOUBLE PRECISION',
}
>>>
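The source listing above reuses a cached result while it is younger than a configured number of seconds, and `refresh=True` discards it. A minimal standalone sketch of that time-to-live pattern follows; `TTLCache`, `fake_fetch`, and the injected clock are hypothetical names for illustration and are not part of the meerschaum API.

```python
import time
from typing import Callable, Dict, Optional


class TTLCache:
    """Hypothetical stand-in for a cached value stored alongside its timestamp."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.time):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._value: Optional[Dict[str, str]] = None
        self._timestamp: Optional[float] = None

    def get(
        self,
        fetch: Callable[[], Dict[str, str]],
        refresh: bool = False,
    ) -> Dict[str, str]:
        now = self.clock()
        if refresh:
            # Like `refresh=True`: drop both the value and its timestamp.
            self._value, self._timestamp = None, None

        # An empty or missing value counts as a cache miss.
        if self._value and self._timestamp is not None:
            if now - self._timestamp < self.ttl_seconds:
                return self._value

        # Miss or stale entry: fetch again and record when we did so.
        self._value = fetch()
        self._timestamp = now
        return self._value or {}


calls = []


def fake_fetch() -> Dict[str, str]:
    # Stand-in for asking the instance connector for the column types.
    calls.append(1)
    return {'dt': 'TIMESTAMP', 'id': 'BIGINT'}


fake_now = [1000.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])

cache.get(fake_fetch)                # first call fetches
cache.get(fake_fetch)                # served from the cache
fake_now[0] += 120
cache.get(fake_fetch)                # entry is stale, fetches again
cache.get(fake_fetch, refresh=True)  # forced refresh
print(len(calls))  # prints 3
```

Injecting the clock keeps the sketch testable without sleeping; the real method reads the current timestamp itself.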
603def get_columns_indices( 604 self, 605 debug: bool = False, 606 refresh: bool = False, 607) -> Dict[str, List[Dict[str, str]]]: 608 """ 609 Return a dictionary mapping columns to index information. 610 """ 611 from meerschaum.connectors import get_connector_plugin 612 from meerschaum.utils.dtypes import get_current_timestamp 613 614 now = get_current_timestamp('ms', as_int=True) / 1000 615 cache_seconds = ( 616 mrsm.get_config('pipes', 'static', 'static_schema_cache_seconds') 617 if self.static 618 else mrsm.get_config('pipes', 'dtypes', 'columns_types_cache_seconds') 619 ) 620 if refresh: 621 self._clear_cache_key('_columns_indices_timestamp', debug=debug) 622 self._clear_cache_key('_columns_indices', debug=debug) 623 624 _columns_indices = self._get_cached_value('_columns_indices', debug=debug) 625 626 if _columns_indices: 627 columns_indices_timestamp = self._get_cached_value('_columns_indices_timestamp', debug=debug) 628 if columns_indices_timestamp is not None: 629 delta = now - columns_indices_timestamp 630 if delta < cache_seconds: 631 if debug: 632 dprint( 633 f"Returning cached `columns_indices` for {self} " 634 f"({round(delta, 2)} seconds old)." 635 ) 636 return _columns_indices 637 638 with mrsm.Venv(get_connector_plugin(self.instance_connector)): 639 _columns_indices = ( 640 self.instance_connector.get_pipe_columns_indices(self, debug=debug) 641 if hasattr(self.instance_connector, 'get_pipe_columns_indices') 642 else None 643 ) 644 645 self._cache_value('_columns_indices', _columns_indices, debug=debug) 646 self._cache_value('_columns_indices_timestamp', now, debug=debug) 647 return {k: v for k, v in _columns_indices.items() if k and v} or {}
Return a dictionary mapping columns to index information.
def get_indices(self) -> Dict[str, str]:
    """
    Return a dictionary mapping index keys to their names in the database.

    Returns
    -------
    A dictionary of index keys to index names.
    """
    from meerschaum.connectors import get_connector_plugin
    with mrsm.Venv(get_connector_plugin(self.instance_connector)):
        if hasattr(self.instance_connector, 'get_pipe_index_names'):
            result = self.instance_connector.get_pipe_index_names(self)
        else:
            result = {}

    return result
Return a dictionary mapping index keys to their names in the database.
Returns
- A dictionary of index keys to index names.
59def get_parameters( 60 self, 61 apply_symlinks: bool = True, 62 refresh: bool = False, 63 debug: bool = False, 64 _visited: 'Optional[set[mrsm.Pipe]]' = None, 65) -> Dict[str, Any]: 66 """ 67 Return the `parameters` dictionary of the pipe. 68 69 Parameters 70 ---------- 71 apply_symlinks: bool, default True 72 If `True`, resolve references to parameters from other pipes. 73 74 refresh: bool, default False 75 If `True`, pull the latest attributes for the pipe. 76 77 Returns 78 ------- 79 The pipe's parameters dictionary. 80 """ 81 from meerschaum.config._patch import apply_patch_to_config 82 from meerschaum.config._read_config import search_and_substitute_config 83 84 if _visited is None: 85 _visited = {self} 86 87 if refresh: 88 _ = self._invalidate_cache(hard=True) 89 90 raw_parameters = self.attributes.get('parameters', {}) 91 if not apply_symlinks: 92 return raw_parameters 93 94 parameters = {} 95 for ref_pipe in self.references: 96 try: 97 if ref_pipe in _visited: 98 warn(f"Circular reference detected in {self}: chain involves {ref_pipe}.") 99 return search_and_substitute_config(raw_parameters) 100 101 _visited.add(ref_pipe) 102 if refresh: 103 _ = _cached_base_params.pop(ref_pipe, None) 104 base_params = _cached_base_params.get(ref_pipe, None) 105 if base_params is None: 106 base_params = ref_pipe.get_parameters( 107 apply_symlinks=apply_symlinks, 108 _visited=_visited, 109 debug=debug, 110 ) 111 _cached_base_params[ref_pipe] = base_params 112 if debug: 113 dprint(f"base_params from {ref_pipe} for {self}:") 114 mrsm.pprint(base_params) 115 else: 116 if debug: 117 dprint(f"Using cached base_params from {ref_pipe} for {self}") 118 except Exception as e: 119 warn(f"Failed to resolve reference pipe for {self}: {e}") 120 base_params = {} 121 122 parameters = apply_patch_to_config(parameters, base_params) 123 124 parameters = apply_patch_to_config(parameters, raw_parameters) 125 126 from meerschaum.utils.pipes import replace_pipes_syntax 127 self._symlinks = {} 
128 129 def recursive_replace(obj: Any, path: tuple) -> Any: 130 if isinstance(obj, dict): 131 return {k: recursive_replace(v, path + (k,)) for k, v in obj.items()} 132 if isinstance(obj, list): 133 return [recursive_replace(elem, path + (i,)) for i, elem in enumerate(obj)] 134 if isinstance(obj, str): 135 substituted_val = replace_pipes_syntax(obj, _pipe=self) 136 if substituted_val != obj: 137 self._symlinks[path] = { 138 'original': obj, 139 'substituted': substituted_val, 140 } 141 return substituted_val 142 return obj 143 144 return search_and_substitute_config(recursive_replace(parameters, tuple()))
Return the parameters dictionary of the pipe.
Parameters
- apply_symlinks (bool, default True): If True, resolve references to parameters from other pipes.
- refresh (bool, default False): If True, pull the latest attributes for the pipe.
Returns
- The pipe's parameters dictionary.
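When symlinks are applied, the source above layers parameters: each referenced pipe's parameters are patched into an empty dictionary, and the pipe's own raw parameters are patched on top, so local values win. The sketch below illustrates that ordering only. `deep_patch` is a hypothetical recursive merge standing in for `apply_patch_to_config`, whose exact behavior may differ, and the string substitution step is omitted.

```python
from typing import Any, Dict, List


def deep_patch(base: Dict[str, Any], patch: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical merge: nested dicts are merged, other values are replaced."""
    merged = dict(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_patch(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_parameters(
    raw_parameters: Dict[str, Any],
    reference_parameters: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Layer the references first, then the pipe's own parameters."""
    parameters: Dict[str, Any] = {}
    for base_params in reference_parameters:
        parameters = deep_patch(parameters, base_params)
    return deep_patch(parameters, raw_parameters)


# Parameters inherited from a (hypothetical) referenced pipe.
inherited = {'columns': {'datetime': 'ts', 'id': 'id'}, 'tags': ['production']}
# The pipe's own parameters override only the keys they define.
own = {'columns': {'id': 'station'}}

result = resolve_parameters(own, [inherited])
print(result)
# {'columns': {'datetime': 'ts', 'id': 'station'}, 'tags': ['production']}
```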
297def get_dtypes( 298 self, 299 infer: bool = True, 300 refresh: bool = False, 301 debug: bool = False, 302) -> Dict[str, Any]: 303 """ 304 If defined, return the `dtypes` dictionary defined in `meerschaum.Pipe.parameters`. 305 306 Parameters 307 ---------- 308 infer: bool, default True 309 If `True`, include the implicit existing dtypes. 310 Else only return the explicitly configured dtypes (e.g. `Pipe.parameters['dtypes']`). 311 312 refresh: bool, default False 313 If `True`, invalidate any cache and return the latest known dtypes. 314 315 Returns 316 ------- 317 A dictionary mapping column names to dtypes. 318 """ 319 from meerschaum.config._patch import apply_patch_to_config 320 from meerschaum.utils.dtypes import MRSM_ALIAS_DTYPES 321 parameters = self.get_parameters(refresh=refresh, debug=debug) 322 configured_dtypes = parameters.get('dtypes', {}) 323 if debug: 324 dprint(f"Configured dtypes for {self}:") 325 mrsm.pprint(configured_dtypes) 326 327 remote_dtypes = ( 328 self.infer_dtypes(persist=False, refresh=refresh, debug=debug) 329 if infer 330 else {} 331 ) 332 if debug and infer: 333 dprint(f"Remote dtypes for {self}:") 334 mrsm.pprint(remote_dtypes) 335 336 patched_dtypes = apply_patch_to_config((remote_dtypes or {}), (configured_dtypes or {})) 337 338 dt_col = parameters.get('columns', {}).get('datetime', None) 339 primary_col = parameters.get('columns', {}).get('primary', None) 340 _dtypes = { 341 col: MRSM_ALIAS_DTYPES.get(typ, typ) 342 for col, typ in patched_dtypes.items() 343 if col and typ 344 } 345 if dt_col and dt_col not in configured_dtypes: 346 _dtypes[dt_col] = 'datetime' 347 if primary_col and parameters.get('autoincrement', False) and primary_col not in _dtypes: 348 _dtypes[primary_col] = 'int' 349 350 return _dtypes
If defined, return the dtypes dictionary defined in meerschaum.Pipe.parameters.
Parameters
- infer (bool, default True): If True, include the implicit existing dtypes. Otherwise, return only the explicitly configured dtypes (e.g. Pipe.parameters['dtypes']).
- refresh (bool, default False): If True, invalidate any cache and return the latest known dtypes.
Returns
- A dictionary mapping column names to dtypes.
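The precedence rules in the source above can be read as: configured dtypes override inferred ones, the datetime column falls back to 'datetime' unless it is explicitly configured, and an autoincrementing primary column defaults to 'int'. A simplified sketch follows, assuming flat dictionaries and skipping the alias lookup; `resolve_dtypes` is a hypothetical helper, not a meerschaum function.

```python
from typing import Any, Dict


def resolve_dtypes(
    remote: Dict[str, Any],
    configured: Dict[str, Any],
    columns: Dict[str, str],
    autoincrement: bool = False,
) -> Dict[str, Any]:
    # Configured dtypes take precedence over the inferred (remote) ones.
    patched = {**remote, **configured}
    dtypes = {col: typ for col, typ in patched.items() if col and typ}

    dt_col = columns.get('datetime')
    primary_col = columns.get('primary')

    # The datetime axis is reported as 'datetime' unless configured otherwise.
    if dt_col and dt_col not in configured:
        dtypes[dt_col] = 'datetime'

    # An autoincrementing primary key defaults to an integer.
    if primary_col and autoincrement and primary_col not in dtypes:
        dtypes[primary_col] = 'int'

    return dtypes


dtypes = resolve_dtypes(
    remote={'ts': 'datetime64[ns, UTC]', 'id': 'int64', 'vl': 'float64'},
    configured={'vl': 'numeric'},
    columns={'datetime': 'ts', 'primary': 'row_id'},
    autoincrement=True,
)
print(dtypes)
# {'ts': 'datetime', 'id': 'int64', 'vl': 'numeric', 'row_id': 'int'}
```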
def update_parameters(
    self,
    parameters_patch: Dict[str, Any],
    persist: bool = True,
    debug: bool = False,
) -> mrsm.SuccessTuple:
    """
    Apply a patch to a pipe's `parameters` dictionary.

    Parameters
    ----------
    parameters_patch: Dict[str, Any]
        The patch to be applied to `Pipe.parameters`.

    persist: bool, default True
        If `True`, call `Pipe.edit()` to persist the new parameters.
    """
    from meerschaum.config import apply_patch_to_config
    if 'parameters' not in self._attributes:
        self._attributes['parameters'] = {}

    self._attributes['parameters'] = apply_patch_to_config(
        self._attributes['parameters'],
        parameters_patch,
    )

    if self.temporary:
        persist = False

    if not persist:
        return True, "Success"

    return self.edit(debug=debug)
Apply a patch to a pipe's parameters dictionary.
Parameters
- parameters_patch (Dict[str, Any]): The patch to be applied to Pipe.parameters.
- persist (bool, default True): If True, call Pipe.edit() to persist the new parameters.
def get_id(self, **kw: Any) -> Union[int, str, None]:
    """
    Fetch a pipe's ID from its instance connector.
    If the pipe is not registered, return `None`.
    """
    if self.temporary:
        return None

    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    with Venv(get_connector_plugin(self.instance_connector)):
        if hasattr(self.instance_connector, 'get_pipe_id'):
            return self.instance_connector.get_pipe_id(self, **kw)

    return None
Fetch a pipe's ID from its instance connector.
If the pipe is not registered, return None.
@property
def id(self) -> Union[int, str, uuid.UUID, None]:
    """
    Fetch and cache a pipe's ID.
    """
    _id = self._get_cached_value('_id', debug=self.debug)
    if _id is None:
        _id = self.get_id(debug=self.debug)
        if _id is not None:
            self._cache_value('_id', _id, debug=self.debug)
    return _id
Fetch and cache a pipe's ID.
def get_val_column(self, debug: bool = False) -> Union[str, None]:
    """
    Return the name of the value column if it's defined, otherwise make an educated guess.
    If not set in the `columns` dictionary, return the first numeric column that is not
    an ID or datetime column.
    If none may be found, return `None`.

    Parameters
    ----------
    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    Either a string or `None`.
    """
    if debug:
        dprint('Attempting to determine the value column...')
    try:
        val_name = self.get_columns('value')
    except Exception:
        val_name = None
    if val_name is not None:
        if debug:
            dprint(f"Value column: {val_name}")
        return val_name

    cols = self.columns
    if cols is None:
        if debug:
            dprint('No columns could be determined. Returning...')
        return None
    try:
        dt_name = self.get_columns('datetime', error=False)
    except Exception:
        dt_name = None
    try:
        id_name = self.get_columns('id', error=False)
    except Exception:
        id_name = None

    if debug:
        dprint(f"dt_name: {dt_name}")
        dprint(f"id_name: {id_name}")

    cols_types = self.get_columns_types(debug=debug)
    if cols_types is None:
        return None
    if debug:
        dprint(f"cols_types: {cols_types}")
    if dt_name is not None:
        cols_types.pop(dt_name, None)
    if id_name is not None:
        cols_types.pop(id_name, None)

    candidates = []
    candidate_keywords = {'float', 'double', 'precision', 'int', 'numeric',}
    for search_term in candidate_keywords:
        for col, typ in cols_types.items():
            if search_term in typ.lower():
                candidates.append(col)
                break
    if not candidates:
        if debug:
            dprint("No value column could be determined.")
        return None

    return candidates[0]
Return the name of the value column if it's defined, otherwise make an educated guess.
If not set in the columns dictionary, return the first numeric column that is not
an ID or datetime column.
If none may be found, return None.
Parameters
- debug (bool, default False): Verbosity toggle.
Returns
- Either a string or None.
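The guessing heuristic described above can be sketched as: drop the datetime and ID columns, then pick a column whose database type contains a numeric keyword. The helper below is a simplified illustration under that reading. `guess_value_column` is a hypothetical name, and it scans in column order, whereas the source iterates over a set of keywords, so the two may disagree when several columns qualify.

```python
from typing import Dict, Optional

# Keywords taken from the source listing; matching is a substring test.
NUMERIC_KEYWORDS = ('float', 'double', 'precision', 'int', 'numeric')


def guess_value_column(
    cols_types: Dict[str, str],
    dt_name: Optional[str] = None,
    id_name: Optional[str] = None,
) -> Optional[str]:
    for col, typ in cols_types.items():
        # The datetime and ID columns are never candidates.
        if col in (dt_name, id_name):
            continue
        if any(keyword in typ.lower() for keyword in NUMERIC_KEYWORDS):
            return col
    return None


cols_types = {
    'dt': 'TIMESTAMP WITH TIMEZONE',
    'id': 'BIGINT',
    'val': 'DOUBLE PRECISION',
}
print(guess_value_column(cols_types, dt_name='dt', id_name='id'))  # prints val
```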
@property
def parents(self) -> List[mrsm.Pipe]:
    """
    Return a list of `meerschaum.Pipe` objects to be designated as parents.
    """
    _cached_parents = self.__dict__.get('_parents', None)
    if _cached_parents is not None:
        return _cached_parents

    from meerschaum.utils.pipes import get_pipe_from_value
    base_params = self.get_parameters()
    key = 'parents' if 'parents' in base_params else 'parent'
    parents_refs = base_params.get(key, None) or []
    if isinstance(parents_refs, str) or isinstance(parents_refs, dict):
        parents_refs = [parents_refs]

    if not parents_refs:
        return []

    self._parents = [get_pipe_from_value(val, _pipe=self) for val in parents_refs]
    return self._parents
Return a list of meerschaum.Pipe objects to be designated as parents.
@property
def parent(self) -> Union[mrsm.Pipe, None]:
    """
    Return the first pipe in `self.parents` or `None`.
    """
    _parents = self.parents
    if not _parents:
        return None

    return _parents[0]
Return the first pipe in self.parents or None.
@property
def children(self) -> List[mrsm.Pipe]:
    """
    Return a list of `meerschaum.Pipe` objects to be designated as children.
    """
    _cached_children = self.__dict__.get('_children', None)
    if _cached_children is not None:
        return _cached_children

    from meerschaum.utils.pipes import get_pipe_from_value
    base_params = self.get_parameters()
    key = 'children' if 'children' in base_params else 'child'
    children_refs = base_params.get(key, None) or []
    if isinstance(children_refs, str) or isinstance(children_refs, dict):
        children_refs = [children_refs]

    if not children_refs:
        return []

    self._children = [get_pipe_from_value(val, _pipe=self) for val in children_refs]
    return self._children
Return a list of meerschaum.Pipe objects to be designated as children.
@property
def child(self) -> mrsm.Pipe | None:
    """
    Return the first pipe in `self.children` or None.
    """
    _children = self.children
    if not _children:
        return None

    return _children[0]
Return the first pipe in self.children or None.
@property
def reference(self) -> mrsm.Pipe | None:
    """
    Return the first pipe in `self.references` or None.
    """
    _references = self.references
    if not _references:
        return None

    return _references[0]
Return the first pipe in self.references or None.
@property
def references(self) -> List[mrsm.Pipe]:
    """
    Return a list of `meerschaum.Pipe` objects to be designated as references.
    """
    _cached_references = self.__dict__.get('_references', None)
    if _cached_references is not None:
        return _cached_references

    from meerschaum.utils.pipes import get_pipe_from_value
    base_params = self.get_parameters(apply_symlinks=False)
    key = 'references' if 'references' in base_params else 'reference'
    refs = base_params.get(key, None) or []
    if isinstance(refs, str) or isinstance(refs, dict):
        refs = [refs]

    if not refs:
        return []

    self._refs = [get_pipe_from_value(val, _pipe=self) for val in refs]
    return self._refs
Return a list of meerschaum.Pipe objects to be designated as references.
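The parents, children, and references properties share one normalization step: prefer the plural key, fall back to the singular key, and wrap a lone string or dictionary in a list. The singular properties then return the first item or None. A standalone sketch of that step follows; `normalize_refs` and `first_or_none` are hypothetical helpers, the reference values are placeholders, and the conversion of each value into a pipe is left out.

```python
from typing import Any, Dict, List, Optional


def normalize_refs(
    parameters: Dict[str, Any],
    plural_key: str,
    singular_key: str,
) -> List[Any]:
    # The plural key wins whenever it is present in the parameters.
    key = plural_key if plural_key in parameters else singular_key
    refs = parameters.get(key, None) or []
    # A single string or dictionary is treated as a one-item list.
    if isinstance(refs, (str, dict)):
        refs = [refs]
    return list(refs)


def first_or_none(items: List[Any]) -> Optional[Any]:
    return items[0] if items else None


# 'pipe-a' and friends are placeholder values, not real pipe keys.
print(normalize_refs({'parent': 'pipe-a'}, 'parents', 'parent'))
# ['pipe-a']
print(normalize_refs({'parents': ['pipe-a', 'pipe-b'], 'parent': 'pipe-c'}, 'parents', 'parent'))
# ['pipe-a', 'pipe-b']
print(first_or_none(normalize_refs({}, 'children', 'child')))
# None
```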
958@property 959def target(self) -> str: 960 """ 961 The target table name. 962 You can set the target name under on of the following keys 963 (checked in this order): 964 - `target` 965 - `target_name` 966 - `target_table` 967 - `target_table_name` 968 """ 969 target_val = self.parameters.get('target', None) 970 if not target_val: 971 default_target = self._target_legacy() 972 default_targets = {default_target} 973 potential_keys = ('target_name', 'target_table', 'target_table_name') 974 _target = None 975 for k in potential_keys: 976 if k in self.parameters: 977 _target = self.parameters[k] 978 break 979 980 _target = _target or default_target 981 982 if self.instance_connector.type == 'sql': 983 from meerschaum.utils.sql import truncate_item_name 984 truncated_target = truncate_item_name(_target, self.instance_connector.flavor) 985 default_targets.add(truncated_target) 986 warned_target = self.__dict__.get('_warned_target', False) 987 if truncated_target != _target and not warned_target: 988 if self.instance_connector.flavor not in ('oracle', 'mysql', 'mariadb'): 989 warn( 990 f"The target '{_target}' is too long for '{self.instance_connector.flavor}', " 991 + f"will use {truncated_target} instead." 992 ) 993 self.__dict__['_warned_target'] = True 994 _target = truncated_target 995 996 if _target in default_targets: 997 return _target 998 999 self.target = _target 1000 return _target 1001 1002 return target_val
The target table name. You can set the target name under one of the following keys (checked in this order):
- target
- target_name
- target_table
- target_table_name
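The lookup order can be illustrated with a small standalone function. This is only a sketch of the key precedence: `resolve_target` is a hypothetical helper, `default_table` is a placeholder for the pipe's default name, and the name truncation applied for SQL flavors in the source is omitted.

```python
from typing import Any, Dict

FALLBACK_KEYS = ('target_name', 'target_table', 'target_table_name')


def resolve_target(parameters: Dict[str, Any], default_target: str) -> str:
    # `target` is checked first and wins whenever it is set.
    target = parameters.get('target')
    if target:
        return target

    # Otherwise use the first fallback key that is present.
    for key in FALLBACK_KEYS:
        if key in parameters:
            return parameters[key] or default_target

    return default_target


print(resolve_target({'target_table': 'b', 'target_name': 'a'}, 'default_table'))  # prints a
print(resolve_target({'target': 'x', 'target_name': 'a'}, 'default_table'))        # prints x
print(resolve_target({}, 'default_table'))                                         # prints default_table
```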
def guess_datetime(self) -> Union[str, None]:
    """
    Try to determine a pipe's datetime column.
    """
    _dtypes = self.dtypes

    ### Abort if the user explicitly disallows a datetime index.
    if 'datetime' in _dtypes:
        if _dtypes['datetime'] is None:
            return None

    from meerschaum.utils.dtypes import are_dtypes_equal
    dt_cols = [
        col
        for col, typ in _dtypes.items()
        if are_dtypes_equal(typ, 'datetime')
    ]
    if not dt_cols:
        return None
    return dt_cols[0]
Try to determine a pipe's datetime column.
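In short, the guess is the first column whose dtype is datetime-like, unless the dtypes dictionary maps the key `datetime` to None, which disables guessing. The sketch below mirrors that flow with a hypothetical `guess_datetime_column` helper. The `startswith('datetime')` test is an assumed stand-in for `are_dtypes_equal(typ, 'datetime')`, which likely accepts more spellings.

```python
from typing import Any, Dict, Optional


def guess_datetime_column(dtypes: Dict[str, Any]) -> Optional[str]:
    # An explicit `'datetime': None` entry means "no datetime index".
    if 'datetime' in dtypes and dtypes['datetime'] is None:
        return None

    for col, typ in dtypes.items():
        # Assumed stand-in for the library's dtype comparison.
        if isinstance(typ, str) and typ.lower().startswith('datetime'):
            return col

    return None


print(guess_datetime_column({'id': 'int', 'ts': 'datetime64[ns, UTC]', 'updated': 'datetime'}))  # prints ts
print(guess_datetime_column({'datetime': None, 'ts': 'datetime'}))  # prints None
```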
@property
def precision(self) -> Dict[str, Union[str, int]]:
    """
    Return the configured or detected precision.
    """
    return self.get_precision(debug=self.debug)
Return the configured or detected precision.
1100def get_precision(self, debug: bool = False) -> Dict[str, Union[str, int]]: 1101 """ 1102 Return the timestamp precision unit and interval for the `datetime` axis. 1103 """ 1104 from meerschaum.utils.dtypes import ( 1105 MRSM_PRECISION_UNITS_SCALARS, 1106 MRSM_PRECISION_UNITS_ALIASES, 1107 MRSM_PD_DTYPES, 1108 are_dtypes_equal, 1109 ) 1110 from meerschaum._internal.static import STATIC_CONFIG 1111 1112 _precision = self._get_cached_value('precision', debug=debug) 1113 if _precision: 1114 if debug: 1115 dprint(f"Returning cached precision: {_precision}") 1116 return _precision 1117 1118 parameters = self.parameters 1119 _precision = parameters.get('precision', {}) 1120 if isinstance(_precision, str): 1121 _precision = {'unit': _precision} 1122 default_precision_unit = STATIC_CONFIG['dtypes']['datetime']['default_precision_unit'] 1123 1124 if not _precision: 1125 1126 dt_col = parameters.get('columns', {}).get('datetime', None) 1127 if not dt_col and self.autotime: 1128 dt_col = mrsm.get_config('pipes', 'autotime', 'column_name_if_datetime_missing') 1129 if not dt_col: 1130 if debug: 1131 dprint(f"No datetime axis, returning default precision '{default_precision_unit}'.") 1132 return {'unit': default_precision_unit} 1133 1134 dt_typ = self.dtypes.get(dt_col, 'datetime') 1135 if are_dtypes_equal(dt_typ, 'datetime'): 1136 if dt_typ == 'datetime': 1137 dt_typ = MRSM_PD_DTYPES['datetime'] 1138 if debug: 1139 dprint(f"Datetime type is `datetime`, assuming {dt_typ} precision.") 1140 1141 _precision = { 1142 'unit': ( 1143 dt_typ 1144 .split('[', maxsplit=1)[-1] 1145 .split(',', maxsplit=1)[0] 1146 .split(' ', maxsplit=1)[0] 1147 ).rstrip(']') 1148 } 1149 1150 if debug: 1151 dprint(f"Extracted precision '{_precision['unit']}' from type '{dt_typ}'.") 1152 1153 elif are_dtypes_equal(dt_typ, 'int'): 1154 _precision = { 1155 'unit': ( 1156 'second' 1157 if '32' in dt_typ 1158 else default_precision_unit 1159 ) 1160 } 1161 elif are_dtypes_equal(dt_typ, 'date'): 1162 if 
debug: 1163 dprint("Datetime axis is 'date', falling back to 'day' precision.") 1164 _precision = {'unit': 'day'} 1165 1166 precision_unit = _precision.get('unit', default_precision_unit) 1167 precision_interval = _precision.get('interval', None) 1168 true_precision_unit = MRSM_PRECISION_UNITS_ALIASES.get(precision_unit, precision_unit) 1169 if true_precision_unit is None: 1170 if debug: 1171 dprint(f"No precision could be determined, falling back to '{default_precision_unit}'.") 1172 true_precision_unit = default_precision_unit 1173 1174 if true_precision_unit not in MRSM_PRECISION_UNITS_SCALARS: 1175 from meerschaum.utils.misc import items_str 1176 raise ValueError( 1177 f"Invalid precision unit '{true_precision_unit}'.\n" 1178 "Accepted values are " 1179 f"{items_str(list(MRSM_PRECISION_UNITS_SCALARS) + list(MRSM_PRECISION_UNITS_ALIASES))}." 1180 ) 1181 1182 _precision = {'unit': true_precision_unit} 1183 if precision_interval: 1184 _precision['interval'] = precision_interval 1185 self._cache_value('precision', _precision, debug=debug) 1186 return self._precision
Return the timestamp precision unit and interval for the datetime axis.
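When no precision is configured and the datetime axis has a datetime dtype, the source derives the unit from the dtype string by taking the text inside the brackets, up to the first comma or space. The same chain of string operations is isolated below so it can be tried on sample dtype strings; `extract_precision_unit` is a hypothetical wrapper name.

```python
def extract_precision_unit(dt_typ: str) -> str:
    """Pull the unit out of a dtype string such as 'datetime64[ns, UTC]'."""
    return (
        dt_typ
        .split('[', maxsplit=1)[-1]   # keep what follows the opening bracket
        .split(',', maxsplit=1)[0]    # drop a trailing ', <timezone>]'
        .split(' ', maxsplit=1)[0]    # drop anything after a space
    ).rstrip(']')                     # drop the closing bracket, if still there


print(extract_precision_unit('datetime64[ns, UTC]'))  # prints ns
print(extract_precision_unit('datetime64[us]'))       # prints us
```

The extracted unit is then validated against the accepted precision units, and an unknown unit raises a ValueError in the source.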
12def show( 13 self, 14 nopretty: bool = False, 15 debug: bool = False, 16 **kw 17) -> SuccessTuple: 18 """ 19 Show attributes of a Pipe. 20 21 Parameters 22 ---------- 23 nopretty: bool, default False 24 If `True`, simply print the JSON of the pipe's attributes. 25 26 debug: bool, default False 27 Verbosity toggle. 28 29 Returns 30 ------- 31 A `SuccessTuple` of success, message. 32 33 """ 34 import json 35 from meerschaum.utils.formatting import ( 36 pprint, make_header, ANSI, highlight_pipes, fill_ansi, get_console, 37 ) 38 from meerschaum.utils.packages import import_rich, attempt_import 39 from meerschaum.utils.warnings import info 40 attributes_json = json.dumps(self.attributes) 41 if not nopretty: 42 _to_print = f"Attributes for {self}:" 43 if ANSI: 44 _to_print = fill_ansi(highlight_pipes(make_header(_to_print)), 'magenta') 45 print(_to_print) 46 rich = import_rich() 47 rich_json = attempt_import('rich.json') 48 get_console().print(rich_json.JSON(attributes_json)) 49 else: 50 print(_to_print) 51 else: 52 print(attributes_json) 53 54 return True, "Success"
Show attributes of a Pipe.
Parameters
- nopretty (bool, default False): If True, simply print the JSON of the pipe's attributes.
- debug (bool, default False): Verbosity toggle.
Returns
- A SuccessTuple of success, message.
21def edit( 22 self, 23 patch: bool = False, 24 interactive: bool = False, 25 debug: bool = False, 26 **kw: Any 27) -> SuccessTuple: 28 """ 29 Edit a Pipe's configuration. 30 31 Parameters 32 ---------- 33 patch: bool, default False 34 If `patch` is True, update parameters by cascading rather than overwriting. 35 interactive: bool, default False 36 If `True`, open an editor for the user to make changes to the pipe's YAML file. 37 debug: bool, default False 38 Verbosity toggle. 39 40 Returns 41 ------- 42 A `SuccessTuple` of success, message. 43 44 """ 45 from meerschaum.utils.venv import Venv 46 from meerschaum.connectors import get_connector_plugin 47 48 if self.temporary: 49 return False, "Cannot edit pipes created with `temporary=True` (read-only)." 50 51 self._invalidate_cache(hard=True, debug=debug) 52 53 if hasattr(self, '_symlinks'): 54 from meerschaum.utils.misc import get_val_from_dict_path, set_val_in_dict_path 55 for path, vals in self._symlinks.items(): 56 current_val = get_val_from_dict_path(self.parameters, path) 57 if current_val == vals['substituted']: 58 set_val_in_dict_path(self.parameters, path, vals['original']) 59 60 if not interactive: 61 with Venv(get_connector_plugin(self.instance_connector)): 62 return self.instance_connector.edit_pipe(self, patch=patch, debug=debug, **kw) 63 64 import meerschaum.config.paths as paths 65 from meerschaum.utils.misc import edit_file 66 parameters_filename = str(self) + '.yaml' 67 parameters_path = paths.PIPES_CACHE_RESOURCES_PATH / parameters_filename 68 69 from meerschaum.utils.yaml import yaml 70 71 edit_text = f"Edit the parameters for {self}" 72 edit_top = '#' * (len(edit_text) + 4) 73 edit_header = edit_top + f'\n# {edit_text} #\n' + edit_top + '\n\n' 74 75 from meerschaum.config import get_config 76 parameters = dict(get_config('pipes', 'parameters', patch=True)) 77 from meerschaum.config._patch import apply_patch_to_config 78 raw_parameters = self.attributes.get('parameters', {}) 79 parameters = 
apply_patch_to_config(parameters, raw_parameters) 80 81 ### write parameters to yaml file 82 with open(parameters_path, 'w+') as f: 83 f.write(edit_header) 84 yaml.dump(parameters, stream=f, sort_keys=False) 85 86 ### only quit editing if yaml is valid 87 editing = True 88 while editing: 89 edit_file(parameters_path) 90 try: 91 with open(parameters_path, 'r') as f: 92 file_parameters = yaml.load(f.read()) 93 except Exception as e: 94 from meerschaum.utils.warnings import warn 95 warn(f"Invalid format defined for '{self}':\n\n{e}") 96 input(f"Press [Enter] to correct the configuration for '{self}': ") 97 else: 98 editing = False 99 100 self.parameters = file_parameters 101 102 if debug: 103 from meerschaum.utils.formatting import pprint 104 pprint(self.parameters) 105 106 with Venv(get_connector_plugin(self.instance_connector)): 107 return self.instance_connector.edit_pipe(self, patch=patch, debug=debug, **kw)
Edit a Pipe's configuration.
Parameters
- patch (bool, default False): If patch is True, update parameters by cascading rather than overwriting.
- interactive (bool, default False): If True, open an editor for the user to make changes to the pipe's YAML file.
- debug (bool, default False): Verbosity toggle.
Returns
- A SuccessTuple of success, message.
110def edit_definition( 111 self, 112 yes: bool = False, 113 noask: bool = False, 114 force: bool = False, 115 debug : bool = False, 116 **kw : Any 117) -> SuccessTuple: 118 """ 119 Edit a pipe's definition file and update its configuration. 120 **NOTE:** This function is interactive and should not be used in automated scripts! 121 122 Returns 123 ------- 124 A `SuccessTuple` of success, message. 125 126 """ 127 if self.temporary: 128 return False, "Cannot edit pipes created with `temporary=True` (read-only)." 129 130 from meerschaum.connectors import instance_types 131 if (self.connector is None or isinstance(self.connector, str)) or self.connector.type not in instance_types: 132 return self.edit(interactive=True, debug=debug, **kw) 133 134 import json 135 from meerschaum.utils.warnings import info, warn 136 from meerschaum.utils.debug import dprint 137 from meerschaum.config._patch import apply_patch_to_config 138 from meerschaum.utils.misc import edit_file 139 140 _parameters = self.parameters 141 if 'fetch' not in _parameters: 142 _parameters['fetch'] = {} 143 144 def _edit_api(): 145 from meerschaum.utils.prompt import prompt, yes_no 146 info( 147 f"Please enter the keys of the source pipe from '{self.connector}'.\n" + 148 "Type 'None' for None, or empty when there is no default. Press [CTRL+C] to skip." 
149 ) 150 151 _keys = { 'connector_keys' : None, 'metric_key' : None, 'location_key' : None } 152 for k in _keys: 153 _keys[k] = _parameters['fetch'].get(k, None) 154 155 for k, v in _keys.items(): 156 try: 157 _keys[k] = prompt(k.capitalize().replace('_', ' ') + ':', icon=True, default=v) 158 except KeyboardInterrupt: 159 continue 160 if _keys[k] in ('', 'None', '\'None\'', '[None]'): 161 _keys[k] = None 162 163 _parameters['fetch'] = apply_patch_to_config(_parameters['fetch'], _keys) 164 165 info("You may optionally specify additional filter parameters as JSON.") 166 print(" Parameters are translated into a 'WHERE x AND y' clause, and lists are IN clauses.") 167 print(" For example, the following JSON would correspond to 'WHERE x = 1 AND y IN (2, 3)':") 168 print(json.dumps({'x': 1, 'y': [2, 3]}, indent=2, separators=(',', ': '))) 169 if force or yes_no( 170 "Would you like to add additional filter parameters?", 171 yes=yes, noask=noask 172 ): 173 import meerschaum.config.paths as paths 174 definition_filename = str(self) + '.json' 175 definition_path = paths.PIPES_CACHE_RESOURCES_PATH / definition_filename 176 try: 177 definition_path.touch() 178 with open(definition_path, 'w+') as f: 179 json.dump(_parameters.get('fetch', {}).get('params', {}), f, indent=2) 180 except Exception as e: 181 return False, f"Failed writing file '{definition_path}':\n" + str(e) 182 183 _params = None 184 while True: 185 edit_file(definition_path) 186 try: 187 with open(definition_path, 'r') as f: 188 _params = json.load(f) 189 except Exception as e: 190 warn(f'Failed to read parameters JSON:\n{e}', stack=False) 191 if force or yes_no( 192 "Would you like to try again?\n " 193 + "If not, the parameters JSON file will be ignored.", 194 noask=noask, yes=yes 195 ): 196 continue 197 _params = None 198 break 199 if _params is not None: 200 if 'fetch' not in _parameters: 201 _parameters['fetch'] = {} 202 _parameters['fetch']['params'] = _params 203 204 self.parameters = _parameters 205 
return True, "Success" 206 207 def _edit_sql(): 208 import textwrap 209 import meerschaum.config.paths as paths 210 from meerschaum.utils.misc import edit_file 211 definition_filename = str(self) + '.sql' 212 definition_path = paths.PIPES_CACHE_RESOURCES_PATH / definition_filename 213 214 sql_definition = _parameters['fetch'].get('definition', None) 215 if sql_definition is None: 216 sql_definition = '' 217 sql_definition = textwrap.dedent(sql_definition).lstrip() 218 219 try: 220 definition_path.touch() 221 with open(definition_path, 'w+') as f: 222 f.write(sql_definition) 223 except Exception as e: 224 return False, f"Failed writing file '{definition_path}':\n" + str(e) 225 226 edit_file(definition_path) 227 try: 228 with open(definition_path, 'r', encoding='utf-8') as f: 229 file_definition = f.read() 230 except Exception as e: 231 return False, f"Failed reading file '{definition_path}':\n" + str(e) 232 233 if sql_definition == file_definition: 234 return False, f"No changes made to definition for {self}." 235 236 if ' ' not in file_definition: 237 return False, f"Invalid SQL definition for {self}." 238 239 if debug: 240 dprint("Read SQL definition:\n\n" + file_definition) 241 _parameters['fetch']['definition'] = file_definition 242 self.parameters = _parameters 243 return True, "Success" 244 245 locals()['_edit_' + str(self.connector.type)]() 246 return self.edit(interactive=False, debug=debug, **kw)
Edit a pipe's definition file and update its configuration. NOTE: This function is interactive and should not be used in automated scripts!
Returns
- A SuccessTuple of success, message.
def update(self, *args, **kw) -> SuccessTuple:
    """
    Update a pipe's parameters in its instance.
    """
    kw['interactive'] = False
    return self.edit(*args, **kw)
Update a pipe's parameters in its instance.
def sync(
    self,
    df: Union[
        pd.DataFrame,
        Dict[str, List[Any]],
        List[Dict[str, Any]],
        str,
        InferFetch
    ] = InferFetch,
    begin: Union[datetime, int, str, None] = '',
    end: Union[datetime, int, None] = None,
    force: bool = False,
    retries: int = 10,
    min_seconds: int = 1,
    check_existing: bool = True,
    enforce_dtypes: bool = True,
    blocking: bool = True,
    workers: Optional[int] = None,
    callback: Optional[Callable[[Tuple[bool, str]], Any]] = None,
    error_callback: Optional[Callable[[Exception], Any]] = None,
    chunksize: Optional[int] = -1,
    sync_chunks: bool = True,
    debug: bool = False,
    _inplace: bool = True,
    **kw: Any
) -> SuccessTuple:
    """
    Fetch new data from the source and update the pipe's table with new data.

    Get new remote data via fetch, get existing data in the same time period,
    and merge the two, only keeping the unseen data.

    Parameters
    ----------
    df: Union[pd.DataFrame, Dict[str, List[Any]], List[Dict[str, Any]], str, InferFetch], default InferFetch
        An optional DataFrame to sync into the pipe.
        If omitted, the data are inferred from the pipe's connector
        (via its `sync()` or `fetch()` method); passing `None` explicitly returns a failure.
        If `df` is a string, it will be parsed via `meerschaum.utils.dataframe.parse_simple_lines()`.

    begin: Union[datetime, int, str, None], default ''
        Optionally specify the earliest datetime to search for data.

    end: Union[datetime, int, str, None], default None
        Optionally specify the latest datetime to search for data.

    force: bool, default False
        If `True`, keep trying to sync until `retries` attempts.

    retries: int, default 10
        If `force`, how many attempts to try syncing before declaring failure.

    min_seconds: Union[int, float], default 1
        If `force`, how many seconds to sleep between retries. Defaults to `1`.

    check_existing: bool, default True
        If `True`, pull and diff with existing data from the pipe.

    enforce_dtypes: bool, default True
        If `True`, enforce dtypes on incoming data.
        Set this to `False` if the incoming rows are expected to be of the correct dtypes.

    blocking: bool, default True
        If `True`, wait for sync to finish and return its result, otherwise
        sync asynchronously and return success immediately. Defaults to `True`.
        Only intended for specific scenarios.

    workers: Optional[int], default None
        If provided and the instance connector is thread-safe
        (`pipe.instance_connector.IS_THREAD_SAFE is True`),
        limit concurrent sync to this many threads.

    callback: Optional[Callable[[Tuple[bool, str]], Any]], default None
        Callback function which expects a SuccessTuple as input.
        Only applies when `blocking=False`.

    error_callback: Optional[Callable[[Exception], Any]], default None
        Callback function which expects an Exception as input.
        Only applies when `blocking=False`.

    chunksize: int, default -1
        Specify the number of rows to sync per chunk.
        If `-1`, resort to system configuration (default is `900`).
        A `chunksize` of `None` will sync all rows in one transaction.

    sync_chunks: bool, default True
        If possible, sync chunks while fetching them into memory.

    debug: bool, default False
        Verbosity toggle. Defaults to False.

    Returns
    -------
    A `SuccessTuple` of success (`bool`) and message (`str`).
    """
    from meerschaum.utils.debug import dprint, _checkpoint
    from meerschaum.utils.formatting import get_console
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.misc import df_is_chunk_generator, filter_keywords, filter_arguments
    from meerschaum.utils.pool import get_pool
    from meerschaum.config import get_config
    from meerschaum.utils.dtypes import are_dtypes_equal, get_current_timestamp

    if (callback is not None or error_callback is not None) and blocking:
        warn("Callback functions are only executed when blocking = False. Ignoring...")

    _checkpoint(_total=2, **kw)

    if chunksize == 0:
        chunksize = None
        sync_chunks = False

    begin, end = self.parse_date_bounds(begin, end)
    kw.update({
        'begin': begin,
        'end': end,
        'force': force,
        'retries': retries,
        'min_seconds': min_seconds,
        'check_existing': check_existing,
        'blocking': blocking,
        'workers': workers,
        'callback': callback,
        'error_callback': error_callback,
        'sync_chunks': sync_chunks,
        'chunksize': chunksize,
        'safe_copy': True,
    })

    self._invalidate_cache(debug=debug)
    self._cache_value('sync_ts', get_current_timestamp('ms'), debug=debug)

    def _sync(
        p: mrsm.Pipe,
        df: Union[
            'pd.DataFrame',
            Dict[str, List[Any]],
            List[Dict[str, Any]],
            str,
            InferFetch
        ] = InferFetch,
    ) -> SuccessTuple:
        if df is None:
            p._invalidate_cache(debug=debug)
            return (
                False,
                f"You passed `None` instead of data into `sync()` for {p}.\n"
                + "Omit the DataFrame to infer fetching.",
            )
        ### Ensure that Pipe is registered.
        if not p.temporary and p.id is None:
            ### NOTE: This may trigger an interactive session for plugins!
            register_success, register_msg = p.register(debug=debug)
            if not register_success:
                if 'already' not in register_msg:
                    p._invalidate_cache(debug=debug)
                    return register_success, register_msg

        if isinstance(df, str):
            from meerschaum.utils.dataframe import parse_simple_lines
            df = parse_simple_lines(df)

        ### If connector is a plugin with a `sync()` method, return that instead.
        ### If the plugin does not have a `sync()` method but does have a `fetch()` method,
        ### use that instead.
        ### NOTE: The DataFrame must be omitted for the plugin sync method to apply.
        ### If a DataFrame is provided, continue as expected.
        if hasattr(df, 'MRSM_INFER_FETCH'):
            try:
                if isinstance(p.connector, str):
                    if ':' not in p.connector_keys:
                        return True, f"{p} does not support fetching; nothing to do."

                    msg = f"{p} does not have a valid connector."
                    if p.connector_keys.startswith('plugin:'):
                        msg += f"\n    Perhaps {p.connector_keys} has a syntax error?"
                    p._invalidate_cache(debug=debug)
                    return False, msg
            except Exception:
                p._invalidate_cache(debug=debug)
                return False, f"Unable to create the connector for {p}."

            ### Sync in place if possible.
            if (
                str(self.connector) == str(self.instance_connector)
                and
                hasattr(self.instance_connector, 'sync_pipe_inplace')
                and
                _inplace
                and
                get_config('system', 'experimental', 'inplace_sync')
            ):
                with Venv(get_connector_plugin(self.instance_connector)):
                    p._invalidate_cache(debug=debug)
                    _args, _kwargs = filter_arguments(
                        p.instance_connector.sync_pipe_inplace,
                        p,
                        debug=debug,
                        **kw
                    )
                    return self.instance_connector.sync_pipe_inplace(
                        *_args,
                        **_kwargs
                    )

            ### Activate and invoke `sync(pipe)` for plugin connectors with `sync` methods.
            try:
                if getattr(p.connector, 'sync', None) is not None:
                    with Venv(get_connector_plugin(p.connector), debug=debug):
                        _args, _kwargs = filter_arguments(
                            p.connector.sync,
                            p,
                            debug=debug,
                            **kw
                        )
                        return_tuple = p.connector.sync(*_args, **_kwargs)
                    p._invalidate_cache(debug=debug)
                    if not isinstance(return_tuple, tuple):
                        return_tuple = (
                            False,
                            f"Plugin '{p.connector.label}' returned non-tuple value: {return_tuple}"
                        )
                    return return_tuple

            except Exception as e:
                get_console().print_exception()
                msg = f"Failed to sync {p} with exception: '" + str(e) + "'"
                if debug:
                    error(msg, silent=False)
                p._invalidate_cache(debug=debug)
                return False, msg

            ### Fetch the dataframe from the connector's `fetch()` method.
            try:
                with Venv(get_connector_plugin(p.connector), debug=debug):
                    df = p.fetch(
                        **filter_keywords(
                            p.fetch,
                            debug=debug,
                            **kw
                        )
                    )
                    kw['safe_copy'] = False
            except Exception as e:
                get_console().print_exception(
                    suppress=[
                        'meerschaum/core/Pipe/_sync.py',
                        'meerschaum/core/Pipe/_fetch.py',
                    ]
                )
                msg = f"Failed to fetch data from {p.connector}:\n    {e}"
                df = None

            if df is None:
                p._invalidate_cache(debug=debug)
                return False, f"No data were fetched for {p}."

            if isinstance(df, list):
                if len(df) == 0:
                    return True, f"No new rows were returned for {p}."

                ### May be a chunk hook results list.
                if isinstance(df[0], tuple):
                    success = all([_success for _success, _ in df])
                    message = '\n'.join([_message for _, _message in df])
                    return success, message

            if df is True:
                p._invalidate_cache(debug=debug)
                return True, f"{p} is being synced in parallel."

        ### CHECKPOINT: Retrieved the DataFrame.
        _checkpoint(**kw)

        ### Allow for dataframe generators or iterables.
        if df_is_chunk_generator(df):
            kw['workers'] = p.get_num_workers(kw.get('workers', None))
            dt_col = p.columns.get('datetime', None)
            pool = get_pool(workers=kw.get('workers', 1))
            if debug:
                dprint(f"Received {type(df)}. Attempting to sync first chunk...")

            try:
                chunk = next(df)
            except StopIteration:
                return True, "Received an empty generator; nothing to do."

            chunk_success, chunk_msg = _sync(p, chunk)
            chunk_msg = '\n' + self._get_chunk_label(chunk, dt_col) + '\n' + chunk_msg
            if not chunk_success:
                return chunk_success, f"Unable to sync initial chunk for {p}:\n{chunk_msg}"
            if debug:
                dprint("Successfully synced the first chunk, attempting the rest...")

            def _process_chunk(_chunk):
                _chunk_attempts = 0
                _max_chunk_attempts = 3
                while _chunk_attempts < _max_chunk_attempts:
                    try:
                        _chunk_success, _chunk_msg = _sync(p, _chunk)
                    except Exception as e:
                        _chunk_success, _chunk_msg = False, str(e)
                    if _chunk_success:
                        break
                    _chunk_attempts += 1
                    _sleep_seconds = _chunk_attempts ** 2
                    warn(
                        (
                            f"Failed to sync chunk to {self} "
                            + f"(attempt {_chunk_attempts} / {_max_chunk_attempts}).\n"
                            + f"Sleeping for {_sleep_seconds} second"
                            + ('s' if _sleep_seconds != 1 else '')
                            + f":\n{_chunk_msg}"
                        ),
                        stack=False,
                    )
                    time.sleep(_sleep_seconds)

                num_rows_str = (
                    f"{num_rows:,} rows"
                    if (num_rows := len(_chunk)) != 1
                    else f"{num_rows} row"
                )
                _chunk_msg = (
                    (
                        "Synced"
                        if _chunk_success
                        else "Failed to sync"
                    ) + f" a chunk ({num_rows_str}) to {p}:\n"
                    + self._get_chunk_label(_chunk, dt_col)
                    + '\n'
                    + _chunk_msg
                )

                mrsm.pprint((_chunk_success, _chunk_msg), calm=True)
                return _chunk_success, _chunk_msg

            results = sorted(
                [(chunk_success, chunk_msg)] + (
                    list(pool.imap(_process_chunk, df))
                    if (
                        not df_is_chunk_generator(chunk)  # Handle nested generators.
                        and kw.get('workers', 1) != 1
                    )
                    else list(
                        _process_chunk(_child_chunks)
                        for _child_chunks in df
                    )
                )
            )
            chunk_messages = [chunk_msg for _, chunk_msg in results]
            success_bools = [chunk_success for chunk_success, _ in results]
            num_successes = len([chunk_success for chunk_success, _ in results if chunk_success])
            num_failures = len([chunk_success for chunk_success, _ in results if not chunk_success])
            success = all(success_bools)
            msg = (
                'Synced '
                + f'{len(chunk_messages):,} chunk'
                + ('s' if len(chunk_messages) != 1 else '')
                + f' to {p}\n({num_successes} succeeded, {num_failures} failed):\n\n'
                + '\n\n'.join(chunk_messages).lstrip().rstrip()
            ).lstrip().rstrip()
            return success, msg

        ### Cast to a dataframe and ensure datatypes are what we expect.
        dtypes = p.get_dtypes(debug=debug)
        df = p.enforce_dtypes(
            df,
            chunksize=chunksize,
            enforce=enforce_dtypes,
            dtypes=dtypes,
            debug=debug,
        )
        if p.autotime:
            dt_col = p.columns.get('datetime', None)
            ts_col = dt_col or mrsm.get_config(
                'pipes', 'autotime', 'column_name_if_datetime_missing'
            )
            ts_typ = dtypes.get(ts_col, 'datetime') if ts_col else 'datetime'
            if ts_col and hasattr(df, 'columns') and ts_col not in df.columns:
                precision = p.get_precision(debug=debug)
                now = get_current_timestamp(
                    precision_unit=precision.get(
                        'unit',
                        STATIC_CONFIG['dtypes']['datetime']['default_precision_unit']
                    ),
                    precision_interval=precision.get('interval', 1),
                    round_to=(precision.get('round_to', 'down')),
                    as_int=(are_dtypes_equal(ts_typ, 'int')),
                )
                if debug:
                    dprint(f"Adding current timestamp to dataframe synced to {p}: {now}")

                df[ts_col] = now
                kw['check_existing'] = dt_col is not None

        ### Capture special columns.
        capture_success, capture_msg = self._persist_new_special_columns(
            df,
            dtypes=dtypes,
            debug=debug,
        )
        if not capture_success:
            warn(f"Failed to capture new special columns for {self}:\n{capture_msg}")

        if debug:
            dprint(
                "DataFrame to sync:\n"
                + (
                    str(df)[:255]
                    + '...'
                    if len(str(df)) >= 256
                    else str(df)
                ),
                **kw
            )

        ### if force, continue to sync until success
        return_tuple = False, f"Did not sync {p}."
        run = True
        _retries = 1
        while run:
            with Venv(get_connector_plugin(self.instance_connector)):
                return_tuple = p.instance_connector.sync_pipe(
                    pipe=p,
                    df=df,
                    debug=debug,
                    **kw
                )
            _retries += 1
            run = (not return_tuple[0]) and force and _retries <= retries
            if run and debug:
                dprint(f"Syncing failed for {p}. Attempt ( {_retries} / {retries} )", **kw)
                dprint(f"Sleeping for {min_seconds} seconds...", **kw)
                time.sleep(min_seconds)
            if _retries > retries:
                warn(
                    f"Unable to sync {p} within {retries} attempt" +
                    ("s" if retries != 1 else "") + "!"
                )

        ### CHECKPOINT: Finished syncing.
        _checkpoint(**kw)
        p._invalidate_cache(debug=debug)
        return return_tuple

    if blocking:
        return _sync(self, df=df)

    from meerschaum.utils.threading import Thread
    def default_callback(result_tuple: SuccessTuple):
        dprint(f"Asynchronous result from {self}: {result_tuple}", **kw)

    def default_error_callback(x: Exception):
        dprint(f"Error received for {self}: {x}", **kw)

    if callback is None and debug:
        callback = default_callback
    if error_callback is None and debug:
        error_callback = default_error_callback
    try:
        thread = Thread(
            target=_sync,
            args=(self,),
            kwargs={'df': df},
            daemon=False,
            callback=callback,
            error_callback=error_callback,
        )
        thread.start()
    except Exception as e:
        self._invalidate_cache(debug=debug)
        return False, str(e)

    self._invalidate_cache(debug=debug)
    return True, f"Spawned asynchronous sync for {self}."
Fetch new data from the source and update the pipe's table with new data.
Get new remote data via fetch, get existing data in the same time period, and merge the two, only keeping the unseen data.
Parameters
- df (Union[pd.DataFrame, Dict[str, List[Any]], List[Dict[str, Any]], str, InferFetch], default InferFetch): An optional DataFrame to sync into the pipe. If omitted, the data are inferred from the pipe's connector (via its sync() or fetch() method); passing None explicitly returns a failure. If df is a string, it will be parsed via meerschaum.utils.dataframe.parse_simple_lines().
- begin (Union[datetime, int, str, None], default ''): Optionally specify the earliest datetime to search for data.
- end (Union[datetime, int, str, None], default None): Optionally specify the latest datetime to search for data.
- force (bool, default False): If True, keep trying to sync until retries attempts.
- retries (int, default 10): If force, how many attempts to try syncing before declaring failure.
- min_seconds (Union[int, float], default 1): If force, how many seconds to sleep between retries.
- check_existing (bool, default True): If True, pull and diff with existing data from the pipe.
- enforce_dtypes (bool, default True): If True, enforce dtypes on incoming data. Set this to False if the incoming rows are expected to be of the correct dtypes.
- blocking (bool, default True): If True, wait for sync to finish and return its result, otherwise sync asynchronously and return success immediately. Only intended for specific scenarios.
- workers (Optional[int], default None): If provided and the instance connector is thread-safe (pipe.instance_connector.IS_THREAD_SAFE is True), limit concurrent sync to this many threads.
- callback (Optional[Callable[[Tuple[bool, str]], Any]], default None): Callback function which expects a SuccessTuple as input. Only applies when blocking=False.
- error_callback (Optional[Callable[[Exception], Any]], default None): Callback function which expects an Exception as input. Only applies when blocking=False.
- chunksize (int, default -1): Specify the number of rows to sync per chunk. If -1, resort to system configuration (default is 900). A chunksize of None will sync all rows in one transaction.
- sync_chunks (bool, default True): If possible, sync chunks while fetching them into memory.
- debug (bool, default False): Verbosity toggle.
Returns
- A SuccessTuple of success (bool) and message (str).
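The interplay of force, retries, and min_seconds is easier to see in isolation. The following is a minimal, standalone sketch of the retry behavior described above, not the library's implementation: sync_with_retries, attempt_sync, and the injectable sleep parameter are hypothetical names introduced for illustration.

```python
import time
from typing import Callable, Tuple

SuccessTuple = Tuple[bool, str]


def sync_with_retries(
    attempt_sync: Callable[[], SuccessTuple],
    force: bool = False,
    retries: int = 10,
    min_seconds: float = 1,
    sleep: Callable[[float], None] = time.sleep,
) -> SuccessTuple:
    """Call `attempt_sync` once, or up to `retries` times when `force` is set."""
    result: SuccessTuple = (False, "Did not sync.")
    attempts = 0
    while True:
        result = attempt_sync()
        attempts += 1
        # Stop on success, when retrying was not requested, or when out of attempts.
        if result[0] or not force or attempts >= retries:
            break
        sleep(min_seconds)
    return result


calls = {'count': 0}


def flaky_sync() -> SuccessTuple:
    # Hypothetical sync attempt which fails twice before succeeding.
    calls['count'] += 1
    if calls['count'] < 3:
        return False, f"Attempt {calls['count']} failed."
    return True, "Success"


# Skip real sleeping so the example finishes instantly.
result = sync_with_retries(flaky_sync, force=True, retries=5, sleep=lambda seconds: None)
```

Without force=True, the first failure is returned immediately, which is why retries and min_seconds have no effect on their own.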
def get_sync_time(
    self,
    params: Optional[Dict[str, Any]] = None,
    newest: bool = True,
    apply_backtrack_interval: bool = False,
    remote: bool = False,
    round_down: bool = False,
    debug: bool = False
) -> Union['datetime', int, None]:
    """
    Get the most recent datetime value for a Pipe.

    Parameters
    ----------
    params: Optional[Dict[str, Any]], default None
        Dictionary to build a WHERE clause for a specific column.
        See `meerschaum.utils.sql.build_where`.

    newest: bool, default True
        If `True`, get the most recent datetime (honoring `params`).
        If `False`, get the oldest datetime (`ASC` instead of `DESC`).

    apply_backtrack_interval: bool, default False
        If `True`, subtract the backtrack interval from the sync time.

    remote: bool, default False
        If `True` and the instance connector supports it, return the sync time
        for the remote table definition.

    round_down: bool, default False
        If `True`, round down the datetime value to the nearest minute.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `datetime` or int, if the pipe exists, otherwise `None`.

    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.misc import filter_keywords
    from meerschaum.utils.dtypes import round_time
    from meerschaum.utils.warnings import warn

    if not self.columns.get('datetime', None):
        return None

    connector = self.instance_connector if not remote else self.connector
    if isinstance(connector, str) or connector is None:
        return None

    with Venv(get_connector_plugin(connector)):
        if not hasattr(connector, 'get_sync_time'):
            warn(
                f"Connectors of type '{connector.type}' "
                "do not implement `get_sync_time()`.",
                stack=False,
            )
            return None
        sync_time = connector.get_sync_time(
            self,
            **filter_keywords(
                connector.get_sync_time,
                params=params,
                newest=newest,
                remote=remote,
                debug=debug,
            )
        )

    if round_down and isinstance(sync_time, datetime):
        sync_time = round_time(sync_time, timedelta(minutes=1))

    if apply_backtrack_interval and sync_time is not None:
        backtrack_interval = self.get_backtrack_interval(debug=debug)
        try:
            sync_time -= backtrack_interval
        except Exception as e:
            warn(f"Failed to apply backtrack interval:\n{e}")

    return self.parse_date_bounds(sync_time)
Get the most recent datetime value for a Pipe.
Parameters
- params (Optional[Dict[str, Any]], default None): Dictionary to build a WHERE clause for a specific column. See meerschaum.utils.sql.build_where.
- newest (bool, default True): If True, get the most recent datetime (honoring params). If False, get the oldest datetime (ASC instead of DESC).
- apply_backtrack_interval (bool, default False): If True, subtract the backtrack interval from the sync time.
- remote (bool, default False): If True and the instance connector supports it, return the sync time for the remote table definition.
- round_down (bool, default False): If True, round down the datetime value to the nearest minute.
- debug (bool, default False): Verbosity toggle.
Returns
- A datetime or int, if the pipe exists, otherwise None.
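To make the order of the round_down and apply_backtrack_interval adjustments concrete, here is a small standalone sketch. The helper adjust_sync_time is hypothetical, and truncating seconds with datetime.replace() is an assumed stand-in for the library's round_time() helper; the sketch only mirrors the order of operations (round first, then subtract) and handles both datetime and integer axes.

```python
from datetime import datetime, timedelta
from typing import Union

SyncTime = Union[datetime, int, None]


def adjust_sync_time(
    sync_time: SyncTime,
    backtrack_interval: Union[timedelta, int, None] = None,
    round_down: bool = False,
) -> SyncTime:
    """Round a sync time down to the minute, then subtract a backtrack interval."""
    if sync_time is None:
        return None
    if round_down and isinstance(sync_time, datetime):
        # Assumed stand-in for rounding down to the nearest minute.
        sync_time = sync_time.replace(second=0, microsecond=0)
    if backtrack_interval is not None:
        # A `timedelta` for datetime axes, an `int` for integer axes.
        sync_time = sync_time - backtrack_interval
    return sync_time


newest = datetime(2024, 1, 1, 12, 34, 56)
begin = adjust_sync_time(newest, timedelta(minutes=5), round_down=True)
int_begin = adjust_sync_time(100, 10)
```

Rounding before subtracting means the resulting lower bound always falls on a minute boundary for datetime axes.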
def exists(
    self,
    debug: bool = False
) -> bool:
    """
    See if a Pipe's table exists.

    Parameters
    ----------
    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `bool` corresponding to whether a pipe's underlying table exists.

    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.debug import dprint
    from meerschaum.utils.dtypes import get_current_timestamp
    now = get_current_timestamp('ms', as_int=True) / 1000
    cache_seconds = mrsm.get_config('pipes', 'sync', 'exists_cache_seconds')

    _exists = self._get_cached_value('_exists', debug=debug)
    if _exists:
        exists_timestamp = self._get_cached_value('_exists_timestamp', debug=debug)
        if exists_timestamp is not None:
            delta = now - exists_timestamp
            if delta < cache_seconds:
                if debug:
                    dprint(f"Returning cached `exists` for {self} ({round(delta, 2)} seconds old).")
                return _exists

    with Venv(get_connector_plugin(self.instance_connector)):
        _exists = (
            self.instance_connector.pipe_exists(pipe=self, debug=debug)
            if hasattr(self.instance_connector, 'pipe_exists')
            else False
        )

    self._cache_value('_exists', _exists, debug=debug)
    self._cache_value('_exists_timestamp', now, debug=debug)
    return _exists
See if a Pipe's table exists.
Parameters
- debug (bool, default False): Verbosity toggle.
Returns
- A bool corresponding to whether a pipe's underlying table exists.
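The source above reuses a positive result for a short window before asking the instance connector again. The sketch below shows that time-bounded caching pattern in isolation; CachedExistsCheck, its check callable, the 5-second window, and the injectable clock are all assumptions made for illustration and are not part of the meerschaum API.

```python
import time
from typing import Callable, List, Optional


class CachedExistsCheck:
    """Cache a truthy `exists` result for `cache_seconds` before re-checking."""

    def __init__(
        self,
        check: Callable[[], bool],
        cache_seconds: float = 5.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._check = check
        self._cache_seconds = cache_seconds
        self._clock = clock
        self._exists: Optional[bool] = None
        self._timestamp: Optional[float] = None

    def exists(self) -> bool:
        now = self._clock()
        # Only a truthy cached value is reused; a missing table is always re-checked.
        if self._exists and self._timestamp is not None:
            if now - self._timestamp < self._cache_seconds:
                return self._exists
        self._exists = bool(self._check())
        self._timestamp = now
        return self._exists


lookups: List[int] = []


def table_exists() -> bool:
    # Hypothetical expensive lookup; count how often it actually runs.
    lookups.append(1)
    return True


# A fake clock returning 0s, 1s, then 10s makes the expiry deterministic.
ticks = iter([0.0, 1.0, 10.0])
cached = CachedExistsCheck(table_exists, cache_seconds=5.0, clock=lambda: next(ticks))
answers = [cached.exists(), cached.exists(), cached.exists()]
```

Caching only the positive answer keeps a freshly created table from being reported as missing for the length of the window.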
def filter_existing(
    self,
    df: 'pd.DataFrame',
    safe_copy: bool = True,
    date_bound_only: bool = False,
    include_unchanged_columns: bool = False,
    enforce_dtypes: bool = False,
    chunksize: Optional[int] = -1,
    debug: bool = False,
    **kw
) -> Tuple['pd.DataFrame', 'pd.DataFrame', 'pd.DataFrame']:
    """
    Inspect a dataframe and filter out rows which already exist in the pipe.

    Parameters
    ----------
    df: 'pd.DataFrame'
        The dataframe to inspect and filter.

    safe_copy: bool, default True
        If `True`, create a copy before comparing and modifying the dataframes.
        Setting to `False` may mutate the DataFrames.
        See `meerschaum.utils.dataframe.filter_unseen_df`.

    date_bound_only: bool, default False
        If `True`, only use the datetime index to fetch the sample dataframe.

    include_unchanged_columns: bool, default False
        If `True`, include the backtrack columns which haven't changed in the update dataframe.
        This is useful if you can't update individual keys.

    enforce_dtypes: bool, default False
        If `True`, ensure the given and intermediate dataframes are enforced to the correct dtypes.
        Setting `enforce_dtypes=True` may impact performance.

    chunksize: Optional[int], default -1
        The `chunksize` used when fetching existing data.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A tuple of three pandas DataFrames: unseen, update, and delta.
    """
    from meerschaum.utils.warnings import warn
    from meerschaum.utils.debug import dprint
    from meerschaum.utils.packages import attempt_import, import_pandas
    from meerschaum.utils.dataframe import (
        filter_unseen_df,
        add_missing_cols_to_df,
        get_unhashable_cols,
    )
    from meerschaum.utils.dtypes import (
        to_pandas_dtype,
        none_if_null,
        to_datetime,
        are_dtypes_equal,
        value_is_null,
        round_time,
    )
    from meerschaum.config import get_config
    pd = import_pandas()
    pandas = attempt_import('pandas')
    if enforce_dtypes or 'dataframe' not in str(type(df)).lower():
        df = self.enforce_dtypes(df, chunksize=chunksize, debug=debug)
    is_dask = hasattr(df, '__module__') and 'dask' in df.__module__
    if is_dask:
        dd = attempt_import('dask.dataframe')
        merge = dd.merge
        NA = pandas.NA
    else:
        merge = pd.merge
        NA = pd.NA

    parameters = self.parameters
    pipe_columns = self.columns
    primary_key = pipe_columns.get('primary', None)
    dt_col = pipe_columns.get('datetime', None)
    dt_type = parameters.get('dtypes', {}).get(dt_col, 'datetime') if dt_col else None
    autoincrement = parameters.get('autoincrement', False)
    autotime = parameters.get('autotime', False)

    if primary_key and autoincrement and df is not None and primary_key in df.columns:
        if safe_copy:
            df = df.copy()
            safe_copy = False
        if df[primary_key].isnull().all():
            del df[primary_key]
            _ = self.columns.pop(primary_key, None)

    if dt_col and autotime and df is not None and dt_col in df.columns:
        if safe_copy:
            df = df.copy()
            safe_copy = False
        if df[dt_col].isnull().all():
            del df[dt_col]
            _ = self.columns.pop(dt_col, None)

    def get_empty_df():
        empty_df = pd.DataFrame([])
        dtypes = dict(df.dtypes) if df is not None else {}
        dtypes.update(self.dtypes) if self.enforce else {}
        pd_dtypes = {
            col: to_pandas_dtype(str(typ))
            for col, typ in dtypes.items()
        }
        return add_missing_cols_to_df(empty_df, pd_dtypes)

    if df is None:
        empty_df = get_empty_df()
        return empty_df, empty_df, empty_df

    if (df.empty if not is_dask else len(df) == 0):
        return df, df, df

    ### begin is the oldest data in the new dataframe
    begin, end = None, None

    if autoincrement and primary_key == dt_col and dt_col not in df.columns:
        if enforce_dtypes:
            df = self.enforce_dtypes(df, chunksize=chunksize, debug=debug)
        return df, get_empty_df(), df

    if autotime and dt_col and dt_col not in df.columns:
        if enforce_dtypes:
            df = self.enforce_dtypes(df, chunksize=chunksize, debug=debug)
        return df, get_empty_df(), df

    try:
        min_dt_val = df[dt_col].min(skipna=True) if dt_col and dt_col in df.columns else None
        if is_dask and min_dt_val is not None:
            min_dt_val = min_dt_val.compute()
        min_dt = (
            to_datetime(min_dt_val, as_pydatetime=True)
            if min_dt_val is not None and are_dtypes_equal(dt_type, 'datetime')
            else min_dt_val
        )
    except Exception:
        min_dt = None

    if not are_dtypes_equal('datetime', str(type(min_dt))) or value_is_null(min_dt):
        if not are_dtypes_equal('int', str(type(min_dt))):
            min_dt = None

    if isinstance(min_dt, datetime):
        rounded_min_dt = round_time(min_dt, to='down')
        try:
            begin = rounded_min_dt - timedelta(minutes=1)
        except OverflowError:
            begin = rounded_min_dt
    elif dt_type and 'int' in dt_type.lower():
        begin = min_dt
    elif dt_col is None:
        begin = None

    ### end is the newest data in the new dataframe
    try:
        max_dt_val = df[dt_col].max(skipna=True) if dt_col and dt_col in df.columns else None
        if is_dask and max_dt_val is not None:
            max_dt_val = max_dt_val.compute()
        max_dt = (
            to_datetime(max_dt_val, as_pydatetime=True)
            if max_dt_val is not None and 'datetime' in str(dt_type)
            else max_dt_val
        )
    except Exception:
        import traceback
        traceback.print_exc()
        max_dt = None

    if not are_dtypes_equal('datetime', str(type(max_dt))) or value_is_null(max_dt):
        if not are_dtypes_equal('int', str(type(max_dt))):
            max_dt = None

    if isinstance(max_dt, datetime):
        end = (
            round_time(
                max_dt,
                to='down'
            ) + timedelta(minutes=1)
        )
    elif dt_type and 'int' in dt_type.lower() and max_dt is not None:
        end = max_dt + 1

    if max_dt is not None and min_dt is not None and min_dt > max_dt:
        warn("Detected minimum datetime greater than maximum datetime.")

    if begin is not None and end is not None and begin > end:
        if isinstance(begin, datetime):
            begin = end - timedelta(minutes=1)
        ### We might be using integers for the datetime axis.
        else:
            begin = end - 1

    unique_index_vals = {
        col: df[col].unique()
        for col in (pipe_columns.values() if not primary_key else [primary_key])
        if col in df.columns and col != dt_col
    } if not date_bound_only else {}
    unique_index_lens = {
        col: len(unique_vals)
        for col, unique_vals in unique_index_vals.items()
    } if not date_bound_only else {}
    filter_params_index_limit = get_config('pipes', 'sync', 'filter_params_index_limit')
    _ = kw.pop('params', None)
    params = {
        col: [
            none_if_null(val)
            for val in unique_vals
        ]
        for col, unique_vals in unique_index_vals.items()
        if unique_index_lens[col] <= filter_params_index_limit
    } if not date_bound_only else {}

    if debug:
        dprint(
            (
                f"Looking at data between '{begin}' and '{end}' with index value lengths:\n"
                f"{json.dumps(unique_index_lens, indent=4)}\n"
            ),
            **kw
        )

    backtrack_df = self.get_data(
        begin=begin,
        end=end,
        chunksize=chunksize,
        params=params,
        debug=debug,
        **kw
    )
    if backtrack_df is None:
        if debug:
            dprint(f"No backtrack data was found for {self}.")
        return df, get_empty_df(), df

    if enforce_dtypes:
        backtrack_df = self.enforce_dtypes(backtrack_df, chunksize=chunksize, debug=debug)

    if debug:
        dprint(f"Existing data for {self}:\n" + str(backtrack_df), **kw)
        dprint(f"Existing dtypes for {self}:\n" + str(backtrack_df.dtypes))

    ### Separate new rows from changed ones.
    on_cols = [
        col
        for col_key, col in pipe_columns.items()
        if (
            col
            and
            col_key != 'value'
            and col in backtrack_df.columns
        )
    ] if not primary_key else [primary_key]

    self_dtypes = self.get_dtypes(debug=debug) if self.enforce else {}
    on_cols_dtypes = {
        col: to_pandas_dtype(typ)
        for col, typ in self_dtypes.items()
        if col in on_cols
    }

    ### Detect changes between the old target and new source dataframes.
    delta_df = add_missing_cols_to_df(
        filter_unseen_df(
            backtrack_df,
            df,
            dtypes={
                col: to_pandas_dtype(typ)
                for col, typ in self_dtypes.items()
            },
            safe_copy=safe_copy,
            coerce_mixed_numerics=(not self.static),
            debug=debug
        ),
        on_cols_dtypes,
    )
    if enforce_dtypes:
        delta_df = self.enforce_dtypes(delta_df, chunksize=chunksize, debug=debug)

    ### Cast dicts or lists to strings so we can merge.
    serializer = functools.partial(json.dumps, sort_keys=True, separators=(',', ':'), default=str)

    def deserializer(x):
        return json.loads(x) if isinstance(x, str) else x

    unhashable_delta_cols = get_unhashable_cols(delta_df)
    unhashable_backtrack_cols = get_unhashable_cols(backtrack_df)
    for col in unhashable_delta_cols:
        delta_df[col] = delta_df[col].apply(serializer)
    for col in unhashable_backtrack_cols:
        backtrack_df[col] = backtrack_df[col].apply(serializer)
    casted_cols = set(unhashable_delta_cols + unhashable_backtrack_cols)

    joined_df = merge(
        delta_df.infer_objects().fillna(NA),
        backtrack_df.infer_objects().fillna(NA),
        how='left',
        on=on_cols,
        indicator=True,
        suffixes=('', '_old'),
    ) if on_cols else delta_df
    for col in casted_cols:
        if col in joined_df.columns:
            joined_df[col] = joined_df[col].apply(deserializer)
        if col in delta_df.columns:
            delta_df[col] = delta_df[col].apply(deserializer)

    ### Determine which rows are completely new.
    new_rows_mask = (joined_df['_merge'] == 'left_only') if on_cols else None
    cols = list(delta_df.columns)

    unseen_df = (
        joined_df
        .where(new_rows_mask)
        .dropna(how='all')[cols]
        .reset_index(drop=True)
    ) if on_cols else delta_df

    ### Rows that have already been inserted but values have changed.
    update_df = (
        joined_df
        .where(~new_rows_mask)
        .dropna(how='all')[cols]
        .reset_index(drop=True)
    ) if on_cols else get_empty_df()

    if include_unchanged_columns and on_cols:
        unchanged_backtrack_cols = [
            col
            for col in backtrack_df.columns
            if col in on_cols or col not in update_df.columns
        ]
        if enforce_dtypes:
            update_df = self.enforce_dtypes(update_df, chunksize=chunksize, debug=debug)
        update_df = merge(
            backtrack_df[unchanged_backtrack_cols],
            update_df,
            how='inner',
            on=on_cols,
        )

    return unseen_df, update_df, delta_df
Inspect a dataframe and filter out rows which already exist in the pipe.
Parameters
- df ('pd.DataFrame'): The dataframe to inspect and filter.
- safe_copy (bool, default True): If True, create a copy before comparing and modifying the dataframes. Setting to False may mutate the DataFrames. See meerschaum.utils.dataframe.filter_unseen_df.
- date_bound_only (bool, default False): If True, only use the datetime index to fetch the sample dataframe.
- include_unchanged_columns (bool, default False): If True, include the backtrack columns which haven't changed in the update dataframe. This is useful if you can't update individual keys.
- enforce_dtypes (bool, default False): If True, ensure the given and intermediate dataframes are enforced to the correct dtypes. Setting enforce_dtypes=True may impact performance.
- chunksize (Optional[int], default -1): The chunksize used when fetching existing data.
- debug (bool, default False): Verbosity toggle.
Returns
- A tuple of three pandas DataFrames: unseen, update, and delta.
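The relationship between the three returned frames can be sketched with plain lists of dictionaries. This is a simplified illustration under stated assumptions, not the pandas-based implementation: split_incoming_rows and its arguments are hypothetical, rows are compared by exact equality, and dtype enforcement, date bounds, and unhashable values are ignored.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def split_incoming_rows(
    existing: List[Row],
    incoming: List[Row],
    on_cols: Sequence[str],
) -> Tuple[List[Row], List[Row], List[Row]]:
    """Return `(unseen, update, delta)` for `incoming` rows relative to `existing`."""

    def key(row: Row) -> Tuple[Any, ...]:
        # The index columns identify a row (e.g. the datetime and id columns).
        return tuple(row[col] for col in on_cols)

    # delta: incoming rows that are not already stored verbatim.
    delta = [row for row in incoming if row not in existing]
    existing_keys = {key(row) for row in existing}
    # unseen: delta rows whose index values have never been stored.
    unseen = [row for row in delta if key(row) not in existing_keys]
    # update: delta rows whose index values exist but whose other values changed.
    update = [row for row in delta if key(row) in existing_keys]
    return unseen, update, delta


existing_rows = [
    {'ts': '2024-01-01', 'id': 1, 'vl': 97},
    {'ts': '2024-01-01', 'id': 2, 'vl': 18},
]
incoming_rows = [
    {'ts': '2024-01-01', 'id': 1, 'vl': 97},  # unchanged
    {'ts': '2024-01-01', 'id': 2, 'vl': 20},  # changed value
    {'ts': '2024-01-01', 'id': 3, 'vl': 96},  # brand new
]
unseen, update, delta = split_incoming_rows(existing_rows, incoming_rows, ['ts', 'id'])
```

In this sketch delta is always the union of unseen and update, which matches how the source derives both frames from the delta dataframe.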
def get_num_workers(self, workers: Optional[int] = None) -> int:
    """
    Get the number of workers to use for concurrent syncs.

    Parameters
    ----------
    workers: Optional[int], default None
        The number of workers passed via `--workers`.

    Returns
    -------
    The number of workers, capped for safety.
    """
    is_thread_safe = getattr(self.instance_connector, 'IS_THREAD_SAFE', False)
    if not is_thread_safe:
        return 1

    engine_pool_size = (
        self.instance_connector.engine.pool.size()
        if self.instance_connector.type == 'sql'
        else None
    )
    current_num_threads = threading.active_count()
    current_num_connections = (
        self.instance_connector.engine.pool.checkedout()
        if engine_pool_size is not None
        else current_num_threads
    )
    desired_workers = (
        min(workers or engine_pool_size, engine_pool_size)
        if engine_pool_size is not None
        else workers
    )
    if desired_workers is None:
        desired_workers = (multiprocessing.cpu_count() if is_thread_safe else 1)

    return max(
        (desired_workers - current_num_connections),
        1,
    )
Get the number of workers to use for concurrent syncs.
Parameters
- workers (Optional[int], default None): The number of workers passed via --workers.
Returns
- The number of workers, capped for safety.
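The capping rule can be sketched as a standalone function. This is a minimal illustration under stated assumptions, not the library's implementation: `cap_workers` and its parameters are hypothetical names, and the pool size, busy count, and CPU count are passed in as plain integers instead of being read from a connector.

```python
from typing import Optional


def cap_workers(
    requested: Optional[int],
    thread_safe: bool,
    pool_size: Optional[int],
    busy: int,
    cpu_count: int,
) -> int:
    """Return how many workers a sync may use (hypothetical helper)."""
    # Connectors that are not thread-safe always run in series.
    if not thread_safe:
        return 1

    if pool_size is not None:
        # Never ask for more workers than the connection pool can serve.
        desired = min(requested or pool_size, pool_size)
    else:
        # Without a pool, fall back to the CPU count when nothing was requested.
        desired = requested if requested is not None else cpu_count

    # Subtract what is already in use, but always allow at least one worker.
    return max(desired - busy, 1)


print(cap_workers(None, True, pool_size=5, busy=2, cpu_count=8))
print(cap_workers(10, True, pool_size=5, busy=0, cpu_count=8))
```

With a pool of five connections and two already checked out, the sketch yields three workers; a request for ten workers against the same idle pool is capped at five.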
def verify(
    self,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    chunk_interval: Union[timedelta, int, None] = None,
    bounded: Optional[bool] = None,
    deduplicate: bool = False,
    workers: Optional[int] = None,
    batchsize: Optional[int] = None,
    skip_chunks_with_greater_rowcounts: bool = False,
    check_rowcounts_only: bool = False,
    debug: bool = False,
    **kwargs: Any
) -> SuccessTuple:
    """
    Verify the contents of the pipe by resyncing its interval.

    Parameters
    ----------
    begin: Union[datetime, int, None], default None
        If specified, only verify rows greater than or equal to this value.

    end: Union[datetime, int, None], default None
        If specified, only verify rows less than this value.

    chunk_interval: Union[timedelta, int, None], default None
        If provided, use this as the size of the chunk boundaries.
        Defaults to the value set in `pipe.parameters['chunk_minutes']` (1440).

    bounded: Optional[bool], default None
        If `True`, do not verify older than the oldest sync time or newer than the newest.
        If `False`, verify unbounded syncs outside of the new and old sync times.
        The default behavior (`None`) is to bound only if a bound interval is set
        (e.g. `pipe.parameters['verify']['bound_days']`).

    deduplicate: bool, default False
        If `True`, deduplicate the pipe's table after the verification syncs.

    workers: Optional[int], default None
        If provided, limit the verification to this many threads.
        Use a value of `1` to sync chunks in series.

    batchsize: Optional[int], default None
        If provided, sync this many chunks in parallel.
        Defaults to `Pipe.get_num_workers()`.

    skip_chunks_with_greater_rowcounts: bool, default False
        If `True`, compare the rowcounts for a chunk and skip syncing if the pipe's
        chunk rowcount equals or exceeds the remote's rowcount.

    check_rowcounts_only: bool, default False
        If `True`, only compare rowcounts and print chunks which are out-of-sync.

    debug: bool, default False
        Verbosity toggle.

    kwargs: Any
        All keyword arguments are passed to `pipe.sync()`.

    Returns
    -------
    A SuccessTuple indicating whether the pipe was successfully resynced.
    """
    from meerschaum.utils.pool import get_pool
    from meerschaum.utils.formatting import make_header
    from meerschaum.utils.misc import interval_str
    workers = self.get_num_workers(workers)
    check_rowcounts = skip_chunks_with_greater_rowcounts or check_rowcounts_only

    ### Skip configured bounding in parameters
    ### if `bounded` is explicitly `False`.
    bound_time = (
        self.get_bound_time(debug=debug)
        if bounded is not False
        else None
    )
    if bounded is None:
        bounded = bound_time is not None

    if bounded and begin is None:
        begin = (
            bound_time
            if bound_time is not None
            else self.get_sync_time(newest=False, debug=debug)
        )
        if begin is None:
            remote_oldest_sync_time = self.get_sync_time(newest=False, remote=True, debug=debug)
            begin = remote_oldest_sync_time
    if bounded and end is None:
        end = self.get_sync_time(newest=True, debug=debug)
        if end is None:
            remote_newest_sync_time = self.get_sync_time(newest=True, remote=True, debug=debug)
            end = remote_newest_sync_time
        if end is not None:
            end += (
                timedelta(minutes=1)
                if hasattr(end, 'tzinfo')
                else 1
            )

    begin, end = self.parse_date_bounds(begin, end)
    cannot_determine_bounds = bounded and begin is None and end is None

    if cannot_determine_bounds and not check_rowcounts_only:
        warn(f"Cannot determine sync bounds for {self}. Syncing instead...", stack=False)
        sync_success, sync_msg = self.sync(
            begin=begin,
            end=end,
            params=params,
            workers=workers,
            debug=debug,
            **kwargs
        )
        if not sync_success:
            return sync_success, sync_msg

        if deduplicate:
            return self.deduplicate(
                begin=begin,
                end=end,
                params=params,
                workers=workers,
                debug=debug,
                **kwargs
            )
        return sync_success, sync_msg

    chunk_interval = self.get_chunk_interval(chunk_interval, debug=debug)
    chunk_bounds = self.get_chunk_bounds(
        begin=begin,
        end=end,
        chunk_interval=chunk_interval,
        bounded=bounded,
        debug=debug,
    )

    ### Consider it a success if no chunks need to be verified.
    if not chunk_bounds:
        if deduplicate:
            return self.deduplicate(
                begin=begin,
                end=end,
                params=params,
                workers=workers,
                debug=debug,
                **kwargs
            )
        return True, f"Could not determine chunks between '{begin}' and '{end}'; nothing to do."

    begin_to_print = (
        begin
        if begin is not None
        else (
            chunk_bounds[0][0]
            if bounded
            else chunk_bounds[0][1]
        )
    )
    end_to_print = (
        end
        if end is not None
        else (
            chunk_bounds[-1][1]
            if bounded
            else chunk_bounds[-1][0]
        )
    )
    message_header = f"{begin_to_print} - {end_to_print}"
    max_chunks_syncs = mrsm.get_config('pipes', 'verify', 'max_chunks_syncs')

    info(
        f"Verifying {self}:\n "
        + ("Syncing" if not check_rowcounts_only else "Checking")
        + f" {len(chunk_bounds)} chunk"
        + ('s' if len(chunk_bounds) != 1 else '')
        + f" ({'un' if not bounded else ''}bounded)"
        + f" of size '{interval_str(chunk_interval)}'"
        + f" between '{begin_to_print}' and '{end_to_print}'.\n"
    )

    ### Dictionary of the form bounds -> success_tuple, e.g.:
    ### {
    ###    (2023-01-01, 2023-01-02): (True, "Success")
    ### }
    bounds_success_tuples = {}
    def process_chunk_bounds(
        chunk_begin_and_end: Tuple[
            Union[int, datetime],
            Union[int, datetime]
        ],
        _workers: Optional[int] = 1,
    ):
        if chunk_begin_and_end in bounds_success_tuples:
            return chunk_begin_and_end, bounds_success_tuples[chunk_begin_and_end]

        chunk_begin, chunk_end = chunk_begin_and_end
        do_sync = True
        chunk_success, chunk_msg = False, "Did not sync chunk."
        if check_rowcounts:
            existing_rowcount = self.get_rowcount(begin=chunk_begin, end=chunk_end, debug=debug)
            remote_rowcount = self.get_rowcount(
                begin=chunk_begin,
                end=chunk_end,
                remote=True,
                debug=debug,
            )
            checked_rows_str = (
                f"checked {existing_rowcount:,} row"
                + ("s" if existing_rowcount != 1 else '')
                + f" vs {remote_rowcount:,} remote"
            )
            if (
                existing_rowcount is not None
                and remote_rowcount is not None
                and existing_rowcount >= remote_rowcount
            ):
                do_sync = False
                chunk_success, chunk_msg = True, (
                    "Row-count is up-to-date "
                    f"({checked_rows_str})."
                )
            elif check_rowcounts_only:
                do_sync = False
                chunk_success, chunk_msg = True, (
                    f"Row-counts are out-of-sync ({checked_rows_str})."
                )

        ### Retry failed syncs with a quadratic backoff (1, 4, 9, ... seconds).
        num_syncs = 0
        while num_syncs < max_chunks_syncs:
            chunk_success, chunk_msg = self.sync(
                begin=chunk_begin,
                end=chunk_end,
                params=params,
                workers=_workers,
                debug=debug,
                **kwargs
            ) if do_sync else (chunk_success, chunk_msg)
            if chunk_success:
                break
            num_syncs += 1
            time.sleep(num_syncs**2)
        chunk_msg = chunk_msg.strip()
        if ' - ' not in chunk_msg:
            chunk_label = f"{chunk_begin} - {chunk_end}"
            chunk_msg = f'Verified chunk for {self}:\n{chunk_label}\n{chunk_msg}'
        mrsm.pprint((chunk_success, chunk_msg))

        return chunk_begin_and_end, (chunk_success, chunk_msg)

    ### If we have more than one chunk, attempt to sync the first one and return if it fails.
    if len(chunk_bounds) > 1:
        first_chunk_bounds = chunk_bounds[0]
        first_label = f"{first_chunk_bounds[0]} - {first_chunk_bounds[1]}"
        info(f"Verifying first chunk for {self}:\n {first_label}")
        (
            (first_begin, first_end),
            (first_success, first_msg)
        ) = process_chunk_bounds(first_chunk_bounds, _workers=workers)
        if not first_success:
            return (
                first_success,
                f"\n{first_label}\n"
                + f"Failed to sync first chunk:\n{first_msg}"
            )
        bounds_success_tuples[first_chunk_bounds] = (first_success, first_msg)
        info(f"Completed first chunk for {self}:\n {first_label}\n")
        chunk_bounds = chunk_bounds[1:]

    pool = get_pool(workers=workers)
    batches = self.get_chunk_bounds_batches(chunk_bounds, batchsize=batchsize, workers=workers)

    def process_batch(
        batch_chunk_bounds: Tuple[
            Tuple[Union[datetime, int, None], Union[datetime, int, None]],
            ...
        ]
    ):
        _batch_begin = batch_chunk_bounds[0][0]
        _batch_end = batch_chunk_bounds[-1][-1]
        batch_message_header = f"{_batch_begin} - {_batch_end}"

        if check_rowcounts_only:
            info(f"Checking row-counts for batch bounds:\n {batch_message_header}")
            _, (batch_init_success, batch_init_msg) = process_chunk_bounds(
                (_batch_begin, _batch_end)
            )
            mrsm.pprint((batch_init_success, batch_init_msg))
            if batch_init_success and 'up-to-date' in batch_init_msg:
                info("Entire batch is up-to-date.")
                return batch_init_success, batch_init_msg

        batch_bounds_success_tuples = dict(pool.map(process_chunk_bounds, batch_chunk_bounds))
        bounds_success_tuples.update(batch_bounds_success_tuples)
        batch_bounds_success_bools = {
            bounds: tup[0]
            for bounds, tup in batch_bounds_success_tuples.items()
        }

        if all(batch_bounds_success_bools.values()):
            msg = get_chunks_success_message(
                batch_bounds_success_tuples,
                header=batch_message_header,
                check_rowcounts_only=check_rowcounts_only,
            )
            if deduplicate:
                deduplicate_success, deduplicate_msg = self.deduplicate(
                    begin=_batch_begin,
                    end=_batch_end,
                    params=params,
                    workers=workers,
                    debug=debug,
                    **kwargs
                )
                return deduplicate_success, msg + '\n\n' + deduplicate_msg
            return True, msg

        ### Pair each chunk with its success flag via `.items()`;
        ### zipping against the dictionary itself would only yield its keys.
        batch_chunk_bounds_to_resync = [
            bounds
            for bounds, success in batch_bounds_success_bools.items()
            if not success
        ]
        batch_bounds_to_print = [
            f"{bounds[0]} - {bounds[1]}"
            for bounds in batch_chunk_bounds_to_resync
        ]
        if batch_bounds_to_print:
            warn(
                "Will resync the following failed chunks:\n "
                + '\n '.join(batch_bounds_to_print),
                stack=False,
            )

        retry_bounds_success_tuples = dict(pool.map(
            process_chunk_bounds,
            batch_chunk_bounds_to_resync
        ))
        batch_bounds_success_tuples.update(retry_bounds_success_tuples)
        bounds_success_tuples.update(retry_bounds_success_tuples)
        retry_bounds_success_bools = {
            bounds: tup[0]
            for bounds, tup in retry_bounds_success_tuples.items()
        }

        if all(retry_bounds_success_bools.values()):
            chunks_message = (
                get_chunks_success_message(
                    batch_bounds_success_tuples,
                    header=batch_message_header,
                    check_rowcounts_only=check_rowcounts_only,
                ) + f"\nRetried {len(batch_chunk_bounds_to_resync)} chunk" + (
                    's'
                    if len(batch_chunk_bounds_to_resync) != 1
                    else ''
                ) + "."
            )
            if deduplicate:
                deduplicate_success, deduplicate_msg = self.deduplicate(
                    begin=_batch_begin,
                    end=_batch_end,
                    params=params,
                    workers=workers,
                    debug=debug,
                    **kwargs
                )
                return deduplicate_success, chunks_message + '\n\n' + deduplicate_msg
            return True, chunks_message

        batch_chunks_message = get_chunks_success_message(
            batch_bounds_success_tuples,
            header=batch_message_header,
            check_rowcounts_only=check_rowcounts_only,
        )
        if deduplicate:
            deduplicate_success, deduplicate_msg = self.deduplicate(
                begin=begin,
                end=end,
                params=params,
                workers=workers,
                debug=debug,
                **kwargs
            )
            return deduplicate_success, batch_chunks_message + '\n\n' + deduplicate_msg
        return False, batch_chunks_message

    num_batches = len(batches)
    for batch_i, batch in enumerate(batches):
        batch_begin = batch[0][0]
        batch_end = batch[-1][-1]
        batch_counter_str = f"({(batch_i + 1):,}/{num_batches:,})"
        batch_label = f"batch {batch_counter_str}:\n{batch_begin} - {batch_end}"
        retry_failed_batch = True
        try:
            for_self = 'for ' + str(self)
            batch_label_str = batch_label.replace(':\n', ' ' + for_self + '...\n ')
            info(f"Verifying {batch_label_str}\n")
            batch_success, batch_msg = process_batch(batch)
        except (KeyboardInterrupt, Exception) as e:
            batch_success = False
            batch_msg = str(e)
            retry_failed_batch = False

        batch_msg_to_print = (
            f"{make_header('Completed batch ' + batch_counter_str + ':', left_pad=0)}\n{batch_msg}"
        )
        mrsm.pprint((batch_success, batch_msg_to_print))

        if not batch_success and retry_failed_batch:
            info(f"Retrying batch {batch_counter_str}...")
            retry_batch_success, retry_batch_msg = process_batch(batch)
            retry_batch_msg_to_print = (
                f"Retried {make_header('batch ' + batch_label, left_pad=0)}\n{retry_batch_msg}"
            )
            mrsm.pprint((retry_batch_success, retry_batch_msg_to_print))

            batch_success = retry_batch_success
            batch_msg = retry_batch_msg

        if not batch_success:
            return False, f"Failed to verify {batch_label}:\n\n{batch_msg}"

    chunks_message = get_chunks_success_message(
        bounds_success_tuples,
        header=message_header,
        check_rowcounts_only=check_rowcounts_only,
    )
    return True, chunks_message
Verify the contents of the pipe by resyncing its interval.
Parameters
- begin (Union[datetime, int, None], default None): If specified, only verify rows greater than or equal to this value.
- end (Union[datetime, int, None], default None): If specified, only verify rows less than this value.
- chunk_interval (Union[timedelta, int, None], default None): If provided, use this as the size of the chunk boundaries. Defaults to the value set in pipe.parameters['chunk_minutes'] (1440).
- bounded (Optional[bool], default None): If True, do not verify older than the oldest sync time or newer than the newest. If False, verify unbounded syncs outside of the new and old sync times. The default behavior (None) is to bound only if a bound interval is set (e.g. pipe.parameters['verify']['bound_days']).
- deduplicate (bool, default False): If True, deduplicate the pipe's table after the verification syncs.
- workers (Optional[int], default None): If provided, limit the verification to this many threads. Use a value of 1 to sync chunks in series.
- batchsize (Optional[int], default None): If provided, sync this many chunks in parallel. Defaults to Pipe.get_num_workers().
- skip_chunks_with_greater_rowcounts (bool, default False): If True, compare the rowcounts for a chunk and skip syncing if the pipe's chunk rowcount equals or exceeds the remote's rowcount.
- check_rowcounts_only (bool, default False): If True, only compare rowcounts and print chunks which are out-of-sync.
- debug (bool, default False): Verbosity toggle.
- kwargs (Any): All keyword arguments are passed to pipe.sync().
Returns
- A SuccessTuple indicating whether the pipe was successfully resynced.
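The following sketch illustrates three ideas behind verification: splitting an interval into chunks, skipping chunks whose local rowcount already matches or exceeds the remote's, and waiting longer between retries. The helper names (`make_chunk_bounds`, `needs_resync`, `backoff_seconds`) are hypothetical and the chunking is a simplification; the library's own chunk bounds may differ, for example when syncs are unbounded.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Bounds = Tuple[datetime, datetime]


def make_chunk_bounds(begin: datetime, end: datetime, interval: timedelta) -> List[Bounds]:
    """Split [begin, end) into consecutive half-open chunks (simplified)."""
    bounds = []
    chunk_begin = begin
    while chunk_begin < end:
        chunk_end = min(chunk_begin + interval, end)
        bounds.append((chunk_begin, chunk_end))
        chunk_begin = chunk_end
    return bounds


def needs_resync(local_rows: Optional[int], remote_rows: Optional[int]) -> bool:
    """Skip a chunk only when both counts are known and local >= remote."""
    if local_rows is None or remote_rows is None:
        return True
    return local_rows < remote_rows


def backoff_seconds(max_attempts: int) -> List[int]:
    """Quadratic sleep schedule: the n-th failed attempt waits n**2 seconds."""
    return [n ** 2 for n in range(1, max_attempts + 1)]


chunks = make_chunk_bounds(
    datetime(2024, 1, 1),
    datetime(2024, 1, 3, 12),
    timedelta(minutes=1440),  # one day, matching the default chunk_minutes
)
for chunk_begin, chunk_end in chunks:
    print(f"{chunk_begin} - {chunk_end}")
```

In this example the final chunk is shorter than the others because it is clipped at the end bound.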
def get_bound_interval(self, debug: bool = False) -> Union[timedelta, int, None]:
    """
    Return the interval used to determine the bound time (limit for verification syncs).
    If the datetime axis is an integer, just return its value.

    Below are the supported keys for the bound interval:

    - `pipe.parameters['verify']['bound_minutes']`
    - `pipe.parameters['verify']['bound_hours']`
    - `pipe.parameters['verify']['bound_days']`
    - `pipe.parameters['verify']['bound_weeks']`
    - `pipe.parameters['verify']['bound_years']`
    - `pipe.parameters['verify']['bound_seconds']`

    If multiple keys are present, the first on this priority list will be used.

    Returns
    -------
    A `timedelta` or `int` value to be used to determine the bound time.
    """
    verify_params = self.parameters.get('verify', {})
    prefix = 'bound_'
    suffixes_to_check = ('minutes', 'hours', 'days', 'weeks', 'years', 'seconds')
    keys_to_search = {
        key: val
        for key, val in verify_params.items()
        if key.startswith(prefix)
    }
    bound_time_key, bound_time_value = None, None
    for key, value in keys_to_search.items():
        for suffix in suffixes_to_check:
            if key == prefix + suffix:
                bound_time_key = key
                bound_time_value = value
                break
        if bound_time_key is not None:
            break

    if bound_time_value is None:
        return bound_time_value

    dt_col = self.columns.get('datetime', None)
    if not dt_col:
        return bound_time_value

    dt_typ = self.dtypes.get(dt_col, 'datetime')
    if 'int' in dt_typ.lower():
        return int(bound_time_value)

    interval_type = bound_time_key.replace(prefix, '')
    return timedelta(**{interval_type: bound_time_value})
Return the interval used to determine the bound time (limit for verification syncs). If the datetime axis is an integer, just return its value.
Below are the supported keys for the bound interval:
- `pipe.parameters['verify']['bound_minutes']`
- `pipe.parameters['verify']['bound_hours']`
- `pipe.parameters['verify']['bound_days']`
- `pipe.parameters['verify']['bound_weeks']`
- `pipe.parameters['verify']['bound_years']`
- `pipe.parameters['verify']['bound_seconds']`
If multiple keys are present, the first on this priority list will be used.
Returns
- A timedelta or int value to be used to determine the bound time.
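As a rough illustration of the documented priority order, the sketch below resolves a bound interval from a plain dictionary standing in for pipe.parameters['verify']. The function `resolve_bound_interval` and its `datetime_is_int` flag are hypothetical, and it follows the documented priority list instead of reproducing the library's loop. Because `datetime.timedelta` has no `years` keyword, the sketch approximates a year as 365 days, which is an assumption of this example.

```python
from datetime import timedelta
from typing import Any, Dict, Union

# Priority order as documented for the `bound_*` keys.
SUFFIXES = ('minutes', 'hours', 'days', 'weeks', 'years', 'seconds')


def resolve_bound_interval(
    verify_params: Dict[str, Any],
    datetime_is_int: bool = False,
) -> Union[timedelta, int, None]:
    """Return the first configured bound interval (hypothetical helper)."""
    for suffix in SUFFIXES:
        value = verify_params.get('bound_' + suffix)
        if value is None:
            continue
        # Integer datetime axes use the raw value as the interval.
        if datetime_is_int:
            return int(value)
        if suffix == 'years':
            # Assumption: approximate one year as 365 days.
            return timedelta(days=365 * value)
        return timedelta(**{suffix: value})
    return None


print(resolve_bound_interval({'bound_days': 366}))
print(resolve_bound_interval({'bound_hours': 12, 'bound_minutes': 30}))
```

When both `bound_hours` and `bound_minutes` are set, the sketch picks `bound_minutes` because it comes first in the priority list.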
def get_bound_time(self, debug: bool = False) -> Union[datetime, int, None]:
    """
    The bound time is the limit at which long-running verification syncs should stop.
    A value of `None` means verification syncs should be unbounded.

    Like deriving a backtrack time from `pipe.get_sync_time()`,
    the bound time is the sync time minus a large window (e.g. 366 days).

    Verification syncs are unbounded (i.e. `bound_time is None`)
    if the span between the oldest and newest sync times is shorter than the bound interval.

    Returns
    -------
    A `datetime` or `int` corresponding to the
    `begin` bound for verification and deduplication syncs.
    """
    bound_interval = self.get_bound_interval(debug=debug)
    if bound_interval is None:
        return None

    sync_time = self.get_sync_time(debug=debug)
    if sync_time is None:
        return None

    bound_time = sync_time - bound_interval
    oldest_sync_time = self.get_sync_time(newest=False, debug=debug)
    max_bound_time_days = STATIC_CONFIG['pipes']['max_bound_time_days']

    ### Always bound when the pipe's data span an extremely long window.
    extreme_sync_times_delta = (
        hasattr(oldest_sync_time, 'tzinfo')
        and (sync_time - oldest_sync_time) >= timedelta(days=max_bound_time_days)
    )

    return (
        bound_time
        if bound_time > oldest_sync_time or extreme_sync_times_delta
        else None
    )
The bound time is the limit at which long-running verification syncs should stop.
A value of None means verification syncs should be unbounded.
Like deriving a backtrack time from pipe.get_sync_time(),
the bound time is the sync time minus a large window (e.g. 366 days).
Verification syncs are unbounded (i.e. bound_time is None)
if the span between the oldest and newest sync times is shorter than the bound interval.
Returns
- A datetime or int corresponding to the begin bound for verification and deduplication syncs.
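The arithmetic can be sketched without a pipe. In this minimal illustration, `compute_bound_time` is a hypothetical helper that takes the newest and oldest sync times directly; `max_bound_time_days` is passed in explicitly because its configured value is not stated here, and the 36500 used below is only a placeholder.

```python
from datetime import datetime, timedelta
from typing import Union

Axis = Union[datetime, int]


def compute_bound_time(newest, oldest, interval, max_bound_time_days: int):
    """Return `newest - interval`, or None when syncs should be unbounded."""
    if newest is None or interval is None:
        return None

    bound_time = newest - interval

    # Only datetime axes can trigger the "extremely long span" override.
    extreme_span = (
        isinstance(oldest, datetime)
        and (newest - oldest) >= timedelta(days=max_bound_time_days)
    )
    return bound_time if bound_time > oldest or extreme_span else None


newest = datetime(2024, 1, 10)
oldest = datetime(2024, 1, 1)

# A 3-day window fits inside the 9 days of data, so the bound applies.
print(compute_bound_time(newest, oldest, timedelta(days=3), 36500))

# A 30-day window is wider than the data, so verification is unbounded.
print(compute_bound_time(newest, oldest, timedelta(days=30), 36500))
```

The same function works for integer axes, in which case the interval is a plain integer and the long-span override never applies.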
def delete(
    self,
    drop: bool = True,
    debug: bool = False,
    **kw
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `delete_pipe()` method.

    Parameters
    ----------
    drop: bool, default True
        If `True`, drop the pipe's target table.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success (`bool`), message (`str`).

    """
    from meerschaum.utils.warnings import warn
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    ### Temporary pipes are never registered, so there is nothing to delete.
    if self.temporary:
        if self.cache:
            invalidate_success, invalidate_msg = self._invalidate_cache(hard=True, debug=debug)
            if not invalidate_success:
                return invalidate_success, invalidate_msg

        return (
            False,
            "Cannot delete pipes created with `temporary=True` (read-only). "
            + "You may want to call `pipe.drop()` instead."
        )

    if drop:
        drop_success, drop_msg = self.drop(debug=debug)
        if not drop_success:
            warn(f"Failed to drop {self}:\n{drop_msg}")

    with Venv(get_connector_plugin(self.instance_connector)):
        result = self.instance_connector.delete_pipe(self, debug=debug, **kw)

    if not isinstance(result, tuple):
        return False, f"Received an unexpected result from '{self.instance_connector}': {result}"

    if result[0]:
        self._invalidate_cache(hard=True, debug=debug)
        self._clear_cache_key('_id', debug=debug)

    return result
Call the Pipe's instance connector's delete_pipe() method.
Parameters
- drop (bool, default True): If True, drop the pipe's target table.
- debug (bool, default False): Verbosity toggle.
Returns
- A SuccessTuple of success (bool), message (str).
def drop(
    self,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `drop_pipe()` method.

    Parameters
    ----------
    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.

    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    self._clear_cache_key('_exists', debug=debug)

    with Venv(get_connector_plugin(self.instance_connector)):
        if hasattr(self.instance_connector, 'drop_pipe'):
            result = self.instance_connector.drop_pipe(self, debug=debug, **kw)
        else:
            result = (
                False,
                (
                    "Cannot drop pipes for instance connectors of type "
                    f"'{self.instance_connector.type}'."
                )
            )

    self._clear_cache_key('_exists', debug=debug)
    self._clear_cache_key('_exists_timestamp', debug=debug)

    return result
Call the Pipe's instance connector's drop_pipe() method.
Parameters
- debug (bool, default False): Verbosity toggle.
Returns
- A SuccessTuple of success, message.
def drop_indices(
    self,
    columns: Optional[List[str]] = None,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `drop_pipe_indices()` method.

    Parameters
    ----------
    columns: Optional[List[str]], default None
        If provided, only drop indices in the given list.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.

    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    self._clear_cache_key('_columns_indices', debug=debug)
    self._clear_cache_key('_columns_indices_timestamp', debug=debug)
    self._clear_cache_key('_columns_types', debug=debug)
    self._clear_cache_key('_columns_types_timestamp', debug=debug)

    with Venv(get_connector_plugin(self.instance_connector)):
        if hasattr(self.instance_connector, 'drop_pipe_indices'):
            result = self.instance_connector.drop_pipe_indices(
                self,
                columns=columns,
                debug=debug,
                **kw
            )
        else:
            result = (
                False,
                (
                    "Cannot drop indices for instance connectors of type "
                    f"'{self.instance_connector.type}'."
                )
            )

    self._clear_cache_key('_columns_indices', debug=debug)
    self._clear_cache_key('_columns_indices_timestamp', debug=debug)
    self._clear_cache_key('_columns_types', debug=debug)
    self._clear_cache_key('_columns_types_timestamp', debug=debug)

    return result
Call the Pipe's instance connector's drop_pipe_indices() method.
Parameters
- columns (Optional[List[str]], default None): If provided, only drop indices in the given list.
- debug (bool, default False): Verbosity toggle.
Returns
- A SuccessTuple of success, message.
def create_indices(
    self,
    columns: Optional[List[str]] = None,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `create_pipe_indices()` method.

    Parameters
    ----------
    columns: Optional[List[str]], default None
        If provided, only create indices in the given list.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.

    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    self._clear_cache_key('_columns_indices', debug=debug)
    self._clear_cache_key('_columns_indices_timestamp', debug=debug)
    self._clear_cache_key('_columns_types', debug=debug)
    self._clear_cache_key('_columns_types_timestamp', debug=debug)

    with Venv(get_connector_plugin(self.instance_connector)):
        if hasattr(self.instance_connector, 'create_pipe_indices'):
            result = self.instance_connector.create_pipe_indices(
                self,
                columns=columns,
                debug=debug,
                **kw
            )
        else:
            result = (
                False,
                (
                    "Cannot create indices for instance connectors of type "
                    f"'{self.instance_connector.type}'."
                )
            )

    self._clear_cache_key('_columns_indices', debug=debug)
    self._clear_cache_key('_columns_indices_timestamp', debug=debug)
    self._clear_cache_key('_columns_types', debug=debug)
    self._clear_cache_key('_columns_types_timestamp', debug=debug)

    return result
Call the Pipe's instance connector's create_pipe_indices() method.
Parameters
- columns (Optional[List[str]], default None): If provided, only create indices in the given list.
- debug (bool, default False): Verbosity toggle.
Returns
- A SuccessTuple of success, message.
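The drop and index methods share one pattern: call the instance connector's method if it exists, otherwise return a failed SuccessTuple. The sketch below shows that optional-capability dispatch with two stand-in classes. `IndexingConnector` and `PlainConnector` are hypothetical and exist only for this example, and the free function `create_indices` is a simplified stand-in for the method.

```python
from typing import Any, List, Optional, Tuple

SuccessTuple = Tuple[bool, str]


class IndexingConnector:
    """Hypothetical connector that knows how to create indices."""
    type = 'sql'

    def create_pipe_indices(
        self,
        pipe: Any,
        columns: Optional[List[str]] = None,
        **kw: Any
    ) -> SuccessTuple:
        return True, f"Created indices on {columns or 'all columns'}."


class PlainConnector:
    """Hypothetical connector without index support."""
    type = 'plain'


def create_indices(
    connector: Any,
    pipe: Any,
    columns: Optional[List[str]] = None,
    **kw: Any
) -> SuccessTuple:
    # Dispatch only if the connector implements the optional method.
    if hasattr(connector, 'create_pipe_indices'):
        return connector.create_pipe_indices(pipe, columns=columns, **kw)
    return (
        False,
        f"Cannot create indices for instance connectors of type '{connector.type}'.",
    )


print(create_indices(IndexingConnector(), 'demo_pipe', columns=['id']))
print(create_indices(PlainConnector(), 'demo_pipe'))
```

Returning a failed tuple instead of raising lets callers treat unsupported connectors like any other unsuccessful operation.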
def clear(
    self,
    begin: Optional[datetime] = None,
    end: Optional[datetime] = None,
    params: Optional[Dict[str, Any]] = None,
    debug: bool = False,
    **kwargs: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `clear_pipe` method.

    Parameters
    ----------
    begin: Optional[datetime], default None
        If provided, only remove rows at or after this datetime value.

    end: Optional[datetime], default None
        If provided, only remove rows older than this datetime value (not including end).

    params: Optional[Dict[str, Any]], default None
        See `meerschaum.utils.sql.build_where`.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` corresponding to whether this procedure completed successfully.

    Examples
    --------
    >>> pipe = mrsm.Pipe('test', 'test', columns={'datetime': 'dt'}, instance='sql:local')
    >>> pipe.sync({'dt': [datetime(2020, 1, 1, 0, 0)]})
    >>> pipe.sync({'dt': [datetime(2021, 1, 1, 0, 0)]})
    >>> pipe.sync({'dt': [datetime(2022, 1, 1, 0, 0)]})
    >>>
    >>> pipe.clear(begin=datetime(2021, 1, 1, 0, 0))
    >>> pipe.get_data()
              dt
    0 2020-01-01

    """
    from meerschaum.utils.warnings import warn
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    begin, end = self.parse_date_bounds(begin, end)

    with Venv(get_connector_plugin(self.instance_connector)):
        return self.instance_connector.clear_pipe(
            self,
            begin=begin,
            end=end,
            params=params,
            debug=debug,
            **kwargs
        )
Call the Pipe's instance connector's clear_pipe method.
Parameters
- begin (Optional[datetime], default None): If provided, only remove rows at or after this datetime value.
- end (Optional[datetime], default None): If provided, only remove rows older than this datetime value (not including end).
- params (Optional[Dict[str, Any]], default None): See meerschaum.utils.sql.build_where.
- debug (bool, default False): Verbosity toggle.
Returns
- A SuccessTuple corresponding to whether this procedure completed successfully.
Examples
>>> pipe = mrsm.Pipe('test', 'test', columns={'datetime': 'dt'}, instance='sql:local')
>>> pipe.sync({'dt': [datetime(2020, 1, 1, 0, 0)]})
>>> pipe.sync({'dt': [datetime(2021, 1, 1, 0, 0)]})
>>> pipe.sync({'dt': [datetime(2022, 1, 1, 0, 0)]})
>>>
>>> pipe.clear(begin=datetime(2021, 1, 1, 0, 0))
>>> pipe.get_data()
dt
0 2020-01-01
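The example above suggests that begin is inclusive (the 2021 row is removed), and the parameter description states that end is exclusive. The sketch below mimics that half-open interval on a plain list of rows. `clear_rows` is a hypothetical helper written for this illustration; it does not call the instance connector.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def clear_rows(
    rows: List[Row],
    dt_col: str,
    begin: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> List[Row]:
    """Return the rows that survive clearing the half-open range [begin, end)."""
    def in_range(value: datetime) -> bool:
        return (begin is None or value >= begin) and (end is None or value < end)

    return [row for row in rows if not in_range(row[dt_col])]


rows = [
    {'dt': datetime(2020, 1, 1)},
    {'dt': datetime(2021, 1, 1)},
    {'dt': datetime(2022, 1, 1)},
]

# Mirrors the example above: only the 2020 row remains.
print(clear_rows(rows, 'dt', begin=datetime(2021, 1, 1)))

# With only `end`, the row exactly at `end` is kept.
print(clear_rows(rows, 'dt', end=datetime(2021, 1, 1)))
```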
15def deduplicate( 16 self, 17 begin: Union[datetime, int, None] = None, 18 end: Union[datetime, int, None] = None, 19 params: Optional[Dict[str, Any]] = None, 20 chunk_interval: Union[datetime, int, None] = None, 21 bounded: Optional[bool] = None, 22 workers: Optional[int] = None, 23 debug: bool = False, 24 _use_instance_method: bool = True, 25 **kwargs: Any 26) -> SuccessTuple: 27 """ 28 Call the Pipe's instance connector's `delete_duplicates` method to delete duplicate rows. 29 30 Parameters 31 ---------- 32 begin: Union[datetime, int, None], default None: 33 If provided, only deduplicate rows newer than this datetime value. 34 35 end: Union[datetime, int, None], default None: 36 If provided, only deduplicate rows older than this datetime column (not including end). 37 38 params: Optional[Dict[str, Any]], default None 39 Restrict deduplication to this filter (for multiplexed data streams). 40 See `meerschaum.utils.sql.build_where`. 41 42 chunk_interval: Union[timedelta, int, None], default None 43 If provided, use this for the chunk bounds. 44 Defaults to the value set in `pipe.parameters['chunk_minutes']` (1440). 45 46 bounded: Optional[bool], default None 47 Only check outside the oldest and newest sync times if bounded is explicitly `False`. 48 49 workers: Optional[int], default None 50 If the instance connector is thread-safe, limit concurrenct syncs to this many threads. 51 52 debug: bool, default False: 53 Verbositity toggle. 54 55 kwargs: Any 56 All other keyword arguments are passed to 57 `pipe.sync()`, `pipe.clear()`, and `pipe.get_data(). 58 59 Returns 60 ------- 61 A `SuccessTuple` corresponding to whether all of the chunks were successfully deduplicated. 
62 """ 63 from meerschaum.utils.warnings import warn, info 64 from meerschaum.utils.misc import interval_str, items_str 65 from meerschaum.utils.venv import Venv 66 from meerschaum.connectors import get_connector_plugin 67 from meerschaum.utils.pool import get_pool 68 69 begin, end = self.parse_date_bounds(begin, end) 70 71 workers = self.get_num_workers(workers=workers) 72 pool = get_pool(workers=workers) 73 74 if _use_instance_method: 75 with Venv(get_connector_plugin(self.instance_connector)): 76 if hasattr(self.instance_connector, 'deduplicate_pipe'): 77 return self.instance_connector.deduplicate_pipe( 78 self, 79 begin=begin, 80 end=end, 81 params=params, 82 bounded=bounded, 83 debug=debug, 84 **kwargs 85 ) 86 87 ### Only unbound if explicitly False. 88 if bounded is None: 89 bounded = True 90 chunk_interval = self.get_chunk_interval(chunk_interval, debug=debug) 91 92 bound_time = self.get_bound_time(debug=debug) 93 if bounded and begin is None: 94 begin = ( 95 bound_time 96 if bound_time is not None 97 else self.get_sync_time(newest=False, debug=debug) 98 ) 99 if bounded and end is None: 100 end = self.get_sync_time(newest=True, debug=debug) 101 if end is not None: 102 end += ( 103 timedelta(minutes=1) 104 if hasattr(end, 'tzinfo') 105 else 1 106 ) 107 108 chunk_bounds = self.get_chunk_bounds( 109 bounded=bounded, 110 begin=begin, 111 end=end, 112 chunk_interval=chunk_interval, 113 debug=debug, 114 ) 115 116 indices = [col for col in self.columns.values() if col] 117 if not indices: 118 return False, "Cannot deduplicate without index columns." 119 120 def process_chunk_bounds(bounds) -> Tuple[ 121 Tuple[ 122 Union[datetime, int, None], 123 Union[datetime, int, None] 124 ], 125 SuccessTuple 126 ]: 127 ### Only selecting the index values here to keep bandwidth down. 
128 chunk_begin, chunk_end = bounds 129 chunk_df = self.get_data( 130 select_columns=indices, 131 begin=chunk_begin, 132 end=chunk_end, 133 params=params, 134 debug=debug, 135 ) 136 if chunk_df is None: 137 return bounds, (True, "") 138 existing_chunk_len = len(chunk_df) 139 deduped_chunk_df = chunk_df.drop_duplicates(keep='last') 140 deduped_chunk_len = len(deduped_chunk_df) 141 142 if existing_chunk_len == deduped_chunk_len: 143 return bounds, (True, "") 144 145 chunk_msg_header = f"\n{chunk_begin} - {chunk_end}" 146 chunk_msg_body = "" 147 148 full_chunk = self.get_data( 149 begin=chunk_begin, 150 end=chunk_end, 151 params=params, 152 debug=debug, 153 ) 154 if full_chunk is None or len(full_chunk) == 0: 155 return bounds, (True, f"{chunk_msg_header}\nChunk is empty, skipping...") 156 157 chunk_indices = [ix for ix in indices if ix in full_chunk.columns] 158 if not chunk_indices: 159 return bounds, (False, f"None of {items_str(indices)} were present in chunk.") 160 try: 161 full_chunk = full_chunk.drop_duplicates( 162 subset=chunk_indices, 163 keep='last' 164 ).reset_index( 165 drop=True, 166 ) 167 except Exception as e: 168 return ( 169 bounds, 170 (False, f"Failed to deduplicate chunk on {items_str(chunk_indices)}:\n({e})") 171 ) 172 173 clear_success, clear_msg = self.clear( 174 begin=chunk_begin, 175 end=chunk_end, 176 params=params, 177 debug=debug, 178 ) 179 if not clear_success: 180 chunk_msg_body += f"Failed to clear chunk while deduplicating:\n{clear_msg}\n" 181 warn(chunk_msg_body) 182 183 sync_success, sync_msg = self.sync(full_chunk, debug=debug) 184 if not sync_success: 185 chunk_msg_body += f"Failed to sync chunk while deduplicating:\n{sync_msg}\n" 186 187 ### Finally check if the deduplication worked. 
188 chunk_rowcount = self.get_rowcount( 189 begin=chunk_begin, 190 end=chunk_end, 191 params=params, 192 debug=debug, 193 ) 194 if chunk_rowcount != deduped_chunk_len: 195 return bounds, ( 196 False, ( 197 chunk_msg_header + "\n" 198 + chunk_msg_body + ("\n" if chunk_msg_body else '') 199 + "Chunk rowcounts still differ (" 200 + f"{chunk_rowcount} rowcount vs {deduped_chunk_len} chunk length)." 201 ) 202 ) 203 204 return bounds, ( 205 True, ( 206 chunk_msg_header + "\n" 207 + chunk_msg_body + ("\n" if chunk_msg_body else '') 208 + f"Deduplicated chunk from {existing_chunk_len} to {chunk_rowcount} rows." 209 ) 210 ) 211 212 info( 213 f"Deduplicating {len(chunk_bounds)} chunk" 214 + ('s' if len(chunk_bounds) != 1 else '') 215 + f" ({'un' if not bounded else ''}bounded)" 216 + f" of size '{interval_str(chunk_interval)}'" 217 + f" on {self}." 218 ) 219 bounds_success_tuples = dict(pool.map(process_chunk_bounds, chunk_bounds)) 220 bounds_successes = { 221 bounds: success_tuple 222 for bounds, success_tuple in bounds_success_tuples.items() 223 if success_tuple[0] 224 } 225 bounds_failures = { 226 bounds: success_tuple 227 for bounds, success_tuple in bounds_success_tuples.items() 228 if not success_tuple[0] 229 } 230 231 ### No need to retry if everything failed. 
    if len(bounds_failures) > 0 and len(bounds_successes) == 0:
        return (
            False,
            (
                f"Failed to deduplicate {len(bounds_failures)} chunk"
                + ('s' if len(bounds_failures) != 1 else '')
                + ".\n"
                + "\n".join([msg for _, (_, msg) in bounds_failures.items() if msg])
            )
        )

    retry_bounds = [bounds for bounds in bounds_failures]
    if not retry_bounds:
        return (
            True,
            (
                f"Successfully deduplicated {len(bounds_successes)} chunk"
                + ('s' if len(bounds_successes) != 1 else '')
                + ".\n"
                + "\n".join([msg for _, (_, msg) in bounds_successes.items() if msg])
            ).rstrip('\n')
        )

    info(f"Retrying {len(retry_bounds)} chunks for {self}...")
    retry_bounds_success_tuples = dict(pool.map(process_chunk_bounds, retry_bounds))
    ### Partition the results of the retry pass (not the first pass).
    retry_bounds_successes = {
        bounds: success_tuple
        for bounds, success_tuple in retry_bounds_success_tuples.items()
        if success_tuple[0]
    }
    retry_bounds_failures = {
        bounds: success_tuple
        for bounds, success_tuple in retry_bounds_success_tuples.items()
        if not success_tuple[0]
    }

    bounds_successes.update(retry_bounds_successes)
    if not retry_bounds_failures:
        return (
            True,
            (
                f"Successfully deduplicated {len(bounds_successes)} chunk"
                + ('s' if len(bounds_successes) != 1 else '')
                + f" ({len(retry_bounds_successes)} retried):\n"
                + "\n".join([msg for _, (_, msg) in bounds_successes.items() if msg])
            ).rstrip('\n')
        )

    ### Report only the chunks which still failed after the retry.
    return (
        False,
        (
            f"Failed to deduplicate {len(retry_bounds_failures)} chunk"
            + ('s' if len(retry_bounds_failures) != 1 else '')
            + ".\n"
            + "\n".join([msg for _, (_, msg) in retry_bounds_failures.items() if msg])
        ).rstrip('\n')
    )
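Delete duplicate rows from the pipe. When the instance connector provides a deduplicate_pipe method, deduplication is delegated to it; otherwise each chunk that contains duplicate index values is cleared and re-synced.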
Parameters
- begin (Union[datetime, int, None], default None): If provided, only deduplicate rows newer than this datetime value.
- end (Union[datetime, int, None], default None): If provided, only deduplicate rows older than this datetime value (not including end).
- params (Optional[Dict[str, Any]], default None): Restrict deduplication to this filter (for multiplexed data streams). See meerschaum.utils.sql.build_where.
- chunk_interval (Union[timedelta, int, None], default None): If provided, use this for the chunk bounds. Defaults to the value set in pipe.parameters['chunk_minutes'] (1440).
- bounded (Optional[bool], default None): Only check outside the oldest and newest sync times if bounded is explicitly False.
- workers (Optional[int], default None): If the instance connector is thread-safe, limit concurrent syncs to this many threads.
- debug (bool, default False): Verbosity toggle.
- kwargs (Any): All other keyword arguments are passed to pipe.sync(), pipe.clear(), and pipe.get_data().
Returns
- A SuccessTuple corresponding to whether all of the chunks were successfully deduplicated.
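The per-chunk step described above keeps the last row for each combination of index values and reports the outcome as a (success, message) tuple. The following is a minimal sketch of that idea in plain Python, assuming rows are dictionaries; `dedupe_keep_last` and `deduplicate_chunk` are hypothetical names used for illustration and are not part of the Meerschaum API.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def dedupe_keep_last(rows: List[Row], indices: List[str]) -> List[Row]:
    """Keep only the last row seen for each combination of index values."""
    latest: Dict[Tuple[Any, ...], Row] = {}
    for row in rows:
        key = tuple(row.get(col) for col in indices)
        # Drop any earlier row with the same key so the survivor
        # sits at the position of the last duplicate.
        latest.pop(key, None)
        latest[key] = row
    return list(latest.values())


def deduplicate_chunk(rows: List[Row], indices: List[str]) -> Tuple[bool, str]:
    """Return a (success, message) tuple for one chunk of rows."""
    if not indices:
        return False, "Cannot deduplicate without index columns."
    deduped = dedupe_keep_last(rows, indices)
    if len(deduped) == len(rows):
        # Nothing to do for this chunk.
        return True, ""
    return True, f"Deduplicated chunk from {len(rows)} to {len(deduped)} rows."


rows = [
    {'ts': '2024-01-01', 'id': 1, 'vl': 10},
    {'ts': '2024-01-01', 'id': 2, 'vl': 20},
    {'ts': '2024-01-01', 'id': 1, 'vl': 30},
]
deduped_rows = dedupe_keep_last(rows, ['ts', 'id'])
result = deduplicate_chunk(rows, ['ts', 'id'])
```

Here the first and third rows share the same ts and id, so only the third survives. The real method additionally clears the chunk on the instance and re-syncs the deduplicated rows, which this sketch leaves out.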
16def bootstrap( 17 self, 18 debug: bool = False, 19 yes: bool = False, 20 force: bool = False, 21 noask: bool = False, 22 shell: bool = False, 23 **kw 24) -> SuccessTuple: 25 """ 26 Prompt the user to create a pipe's requirements all from one method. 27 This method shouldn't be used in any automated scripts because it interactively 28 prompts the user and therefore may hang. 29 30 Parameters 31 ---------- 32 debug: bool, default False: 33 Verbosity toggle. 34 35 yes: bool, default False: 36 Print the questions and automatically agree. 37 38 force: bool, default False: 39 Skip the questions and agree anyway. 40 41 noask: bool, default False: 42 Print the questions but go with the default answer. 43 44 shell: bool, default False: 45 Used to determine if we are in the interactive shell. 46 47 Returns 48 ------- 49 A `SuccessTuple` corresponding to the success of this procedure. 50 51 """ 52 53 from meerschaum.utils.warnings import info 54 from meerschaum.utils.prompt import prompt, yes_no 55 from meerschaum.utils.formatting import pprint 56 from meerschaum.config import get_config 57 from meerschaum.utils.formatting._shell import clear_screen 58 from meerschaum.utils.formatting import print_tuple 59 from meerschaum.actions import actions 60 from meerschaum.utils.venv import Venv 61 from meerschaum.connectors import get_connector_plugin 62 63 _clear = get_config('shell', 'clear_screen', patch=True) 64 65 if self.id is not None: 66 delete_tuple = self.delete(debug=debug) 67 if not delete_tuple[0]: 68 return delete_tuple 69 70 if _clear: 71 clear_screen(debug=debug) 72 73 _parameters = _get_parameters(self, debug=debug) 74 self.parameters = _parameters 75 pprint(self.parameters) 76 try: 77 prompt( 78 f"\n Press [Enter] to register {self} with the above configuration:", 79 icon = False 80 ) 81 except KeyboardInterrupt: 82 return False, f"Aborted bootstrapping {self}." 
83 84 with Venv(get_connector_plugin(self.instance_connector)): 85 register_tuple = self.instance_connector.register_pipe(self, debug=debug) 86 87 if not register_tuple[0]: 88 return register_tuple 89 90 if _clear: 91 clear_screen(debug=debug) 92 93 try: 94 if yes_no( 95 f"Would you like to edit the definition for {self}?", 96 yes=yes, 97 noask=noask, 98 default='n', 99 ): 100 edit_tuple = self.edit_definition(debug=debug) 101 if not edit_tuple[0]: 102 return edit_tuple 103 104 if yes_no( 105 f"Would you like to try syncing {self} now?", 106 yes=yes, 107 noask=noask, 108 default='n', 109 ): 110 sync_tuple = actions['sync']( 111 ['pipes'], 112 connector_keys=[self.connector_keys], 113 metric_keys=[self.metric_key], 114 location_keys=[self.location_key], 115 mrsm_instance=str(self.instance_connector), 116 debug=debug, 117 shell=shell, 118 ) 119 if not sync_tuple[0]: 120 return sync_tuple 121 except Exception as e: 122 return False, f"Failed to bootstrap {self}:\n" + str(e) 123 124 print_tuple((True, f"Finished bootstrapping {self}!")) 125 info( 126 "You can edit this pipe later with `edit pipes` " 127 + "or set the definition with `edit pipes definition`.\n" 128 + " To sync data into your pipe, run `sync pipes`." 129 ) 130 131 return True, "Success"
Interactively prompt the user to set up everything a pipe requires, all from one method. Do not call this method from automated scripts: it waits on user input and may therefore hang.
Parameters
- debug (bool, default False): Verbosity toggle.
- yes (bool, default False): Print the questions and automatically agree.
- force (bool, default False): Skip the questions and agree anyway.
- noask (bool, default False): Print the questions but go with the default answer.
- shell (bool, default False): Used to determine if we are in the interactive shell.
Returns
- A SuccessTuple corresponding to the success of this procedure.
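To illustrate how the yes and noask flags described above can keep a yes/no question from blocking, here is a small sketch. `resolve_yes_no` is a hypothetical helper written for this example, not Meerschaum's own prompt function, and the choice to let yes take precedence over noask is an assumption.

```python
from typing import Callable


def resolve_yes_no(
    question: str,
    default: str = 'n',
    yes: bool = False,
    noask: bool = False,
    ask: Callable[[str], str] = input,
) -> bool:
    """Hypothetical helper: answer a yes/no question, skipping input when a flag is set."""
    if yes:
        # Print the question and automatically agree.
        print(question)
        return True
    if noask:
        # Print the question but go with the default answer.
        print(question)
        return default.lower() == 'y'
    # Otherwise block on user input; an empty answer falls back to the default.
    answer = ask(f"{question} [y/n] ").strip().lower()
    if not answer:
        answer = default.lower()
    return answer.startswith('y')


agreed = resolve_yes_no("Would you like to try syncing now?", yes=True)
defaulted = resolve_yes_no("Would you like to try syncing now?", noask=True)
```

Only the final branch waits for input, which is why a script that sets neither flag can hang.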
def enforce_dtypes(
    self,
    df: 'pd.DataFrame',
    chunksize: Optional[int] = -1,
    enforce: bool = True,
    safe_copy: bool = True,
    dtypes: Optional[Dict[str, str]] = None,
    debug: bool = False,
) -> 'pd.DataFrame':
    """
    Cast the input dataframe to the pipe's registered data types.
    If the pipe does not exist and dtypes are not set, return the dataframe.
    """
    import traceback
    ### StringIO is needed below to read JSON text.
    from io import StringIO
    from meerschaum.utils.warnings import warn
    from meerschaum.utils.debug import dprint
    from meerschaum.utils.dataframe import (
        parse_df_datetimes,
        enforce_dtypes as _enforce_dtypes,
        parse_simple_lines,
    )
    from meerschaum.utils.dtypes import are_dtypes_equal
    from meerschaum.utils.packages import import_pandas
    pd = import_pandas(debug=debug)
    if df is None:
        if debug:
            dprint(
                "Received None instead of a DataFrame.\n"
                + "    Skipping dtype enforcement..."
            )
        return df

    if not self.enforce:
        enforce = False

    explicit_dtypes = self.get_dtypes(infer=False, debug=debug) if enforce else {}
    pipe_dtypes = self.get_dtypes(infer=True, debug=debug) if not dtypes else dtypes

    try:
        if isinstance(df, str):
            ### Text which does not begin with '{' or '[' is parsed as simple lines.
            if df.strip() and df.strip()[0] not in ('{', '['):
                df = parse_df_datetimes(
                    parse_simple_lines(df),
                    ignore_cols=[
                        col
                        for col, dtype in pipe_dtypes.items()
                        if (not enforce or not are_dtypes_equal(dtype, 'datetime'))
                    ],
                )
            else:
                df = parse_df_datetimes(
                    pd.read_json(StringIO(df)),
                    ignore_cols=[
                        col
                        for col, dtype in pipe_dtypes.items()
                        if (not enforce or not are_dtypes_equal(dtype, 'datetime'))
                    ],
                    ignore_all=(not enforce),
                    strip_timezone=(self.tzinfo is None),
                    chunksize=chunksize,
                    debug=debug,
                )
        elif isinstance(df, (dict, list, tuple)):
            df = parse_df_datetimes(
                df,
                ignore_cols=[
                    col
                    for col, dtype in pipe_dtypes.items()
                    if (not enforce or not are_dtypes_equal(str(dtype), 'datetime'))
                ],
                strip_timezone=(self.tzinfo is None),
                chunksize=chunksize,
                debug=debug,
            )
    except Exception as e:
        warn(f"Unable to cast incoming data as a DataFrame...:\n{e}\n\n{traceback.format_exc()}")
        return None

    if not pipe_dtypes:
        if debug:
            dprint(
                f"Could not find dtypes for {self}.\n"
                + "Skipping dtype enforcement..."
            )
        return df

    return _enforce_dtypes(
        df,
        pipe_dtypes,
        explicit_dtypes=explicit_dtypes,
        safe_copy=safe_copy,
        strip_timezone=(self.tzinfo is None),
        coerce_numeric=self.mixed_numerics,
        coerce_timezone=enforce,
        debug=debug,
    )
Cast the input dataframe to the pipe's registered data types. If the pipe does not exist and dtypes are not set, return the dataframe.
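Before casting, the method accepts several input shapes and chooses a parser based on the type of the payload and, for strings, on the first non-whitespace character. The sketch below mirrors only that dispatch; `classify_payload` is a hypothetical name, and the returned labels are for illustration rather than values the library produces.

```python
from typing import Any


def classify_payload(data: Any) -> str:
    """Report which parsing branch a payload would take (illustrative labels)."""
    if data is None:
        # None is returned untouched, with no dtype enforcement.
        return 'none'
    if isinstance(data, str):
        stripped = data.strip()
        # Text not starting with '{' or '[' is treated as simple lines;
        # everything else (including an empty string) goes to the JSON reader.
        if stripped and stripped[0] not in ('{', '['):
            return 'simple lines'
        return 'json'
    if isinstance(data, (dict, list, tuple)):
        return 'python objects'
    # Anything else is assumed to already be a dataframe.
    return 'dataframe'


labels = [
    classify_payload('[{"id": 1}]'),
    classify_payload('id,vl\n1,10'),
    classify_payload([{'id': 1}]),
    classify_payload(None),
]
```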
def infer_dtypes(
    self,
    persist: bool = False,
    refresh: bool = False,
    debug: bool = False,
) -> Dict[str, Any]:
    """
    If `dtypes` is not set in `meerschaum.Pipe.parameters`,
    infer the data types from the underlying table if it exists.

    Parameters
    ----------
    persist: bool, default False
        If `True`, persist the inferred data types to `meerschaum.Pipe.parameters`.
        NOTE: Use with caution! Generally `dtypes` is meant to be user-configurable only.

    refresh: bool, default False
        If `True`, retrieve the latest columns-types for the pipe.
        See `Pipe.get_columns_types()`.

    Returns
    -------
    A dictionary of strings containing the pandas data types for this Pipe.
    """
    if not self.exists(debug=debug):
        return {}

    from meerschaum.utils.dtypes.sql import get_pd_type_from_db_type
    from meerschaum.utils.dtypes import to_pandas_dtype

    ### NOTE: get_columns_types() may return either the types as
    ###       PostgreSQL- or Pandas-style.
    columns_types = self.get_columns_types(refresh=refresh, debug=debug)

    remote_pd_dtypes = {
        c: (
            get_pd_type_from_db_type(t, allow_custom_dtypes=True)
            if str(t).isupper()
            else to_pandas_dtype(t)
        )
        for c, t in columns_types.items()
    } if columns_types else {}
    if not persist:
        return remote_pd_dtypes

    ### Only fill in columns which the user has not already configured.
    parameters = self.get_parameters(refresh=refresh, debug=debug)
    dtypes = parameters.get('dtypes', {})
    dtypes.update({
        col: typ
        for col, typ in remote_pd_dtypes.items()
        if col not in dtypes
    })
    self.dtypes = dtypes
    self.edit(interactive=False, debug=debug)
    return remote_pd_dtypes
If dtypes is not set in meerschaum.Pipe.parameters, infer the data types from the underlying table if it exists.
Parameters
- persist (bool, default False): If True, persist the inferred data types to meerschaum.Pipe.parameters. NOTE: Use with caution! Generally dtypes is meant to be user-configurable only.
- refresh (bool, default False): If True, retrieve the latest columns-types for the pipe. See Pipe.get_columns_types().
Returns
- A dictionary of strings containing the pandas data types for this Pipe.
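When persist is True, the inferred types only fill in columns that have no configured type, so user-configured dtypes always win. The following sketch shows that merge rule, together with the upper-case check the source uses to tell database-style type names from pandas-style ones. The function names are hypothetical and exist only for this example.

```python
from typing import Dict


def merge_inferred_dtypes(
    configured: Dict[str, str],
    inferred: Dict[str, str],
) -> Dict[str, str]:
    """Add inferred dtypes only for columns missing from the configured dtypes."""
    merged = dict(configured)
    merged.update({
        col: typ
        for col, typ in inferred.items()
        if col not in merged
    })
    return merged


def looks_like_db_type(type_name: str) -> bool:
    """Database types are assumed to be spelled in upper case (e.g. 'TIMESTAMP')."""
    return str(type_name).isupper()


merged = merge_inferred_dtypes(
    {'id': 'int64'},
    {'id': 'float64', 'vl': 'float64'},
)
```

In this example, id keeps its configured int64 and only vl picks up the inferred float64.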
def copy_to(
    self,
    instance_keys: str,
    sync: bool = True,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    chunk_interval: Union[timedelta, int, None] = None,
    debug: bool = False,
    **kwargs: Any
) -> SuccessTuple:
    """
    Copy a pipe to another instance.

    Parameters
    ----------
    instance_keys: str
        The instance to which to copy this pipe.

    sync: bool, default True
        If `True`, sync the source pipe's documents

    begin: Union[datetime, int, None], default None
        Beginning datetime value to pass to `Pipe.get_data()`.

    end: Union[datetime, int, None], default None
        End datetime value to pass to `Pipe.get_data()`.

    params: Optional[Dict[str, Any]], default None
        Parameters filter to pass to `Pipe.get_data()`.

    chunk_interval: Union[timedelta, int, None], default None
        The size of chunks to retrieve from `Pipe.get_data()` for syncing.

    kwargs: Any
        Additional flags to pass to `Pipe.get_data()` and `Pipe.sync()`, e.g. `workers`.

    Returns
    -------
    A SuccessTuple indicating success.
    """
    if str(instance_keys) == self.instance_keys:
        return False, f"Cannot copy {self} to instance '{instance_keys}'."

    begin, end = self.parse_date_bounds(begin, end)

    new_pipe = mrsm.Pipe(
        self.connector_keys,
        self.metric_key,
        self.location_key,
        parameters=self.parameters.copy(),
        instance=instance_keys,
    )

    new_pipe_is_registered = new_pipe.id is not None

    metadata_method = new_pipe.edit if new_pipe_is_registered else new_pipe.register
    metadata_success, metadata_msg = metadata_method(debug=debug)
    if not metadata_success:
        return metadata_success, metadata_msg

    if not self.exists(debug=debug):
        return True, f"{self} does not exist; nothing to sync."

    original_as_iterator = kwargs.get('as_iterator', None)
    kwargs['as_iterator'] = True

    chunk_generator = self.get_data(
        begin=begin,
        end=end,
        params=params,
        chunk_interval=chunk_interval,
        debug=debug,
        **kwargs
    )

    if original_as_iterator is None:
        _ = kwargs.pop('as_iterator', None)
    else:
        kwargs['as_iterator'] = original_as_iterator

    sync_success, sync_msg = new_pipe.sync(
        chunk_generator,
        begin=begin,
        end=end,
        params=params,
        debug=debug,
        **kwargs
    )
    msg = (
        f"Successfully synced {new_pipe}:\n{sync_msg}"
        if sync_success
        else f"Failed to sync {new_pipe}:\n{sync_msg}"
    )
    return sync_success, msg
Copy a pipe to another instance.
Parameters
- instance_keys (str): The instance to which to copy this pipe.
- sync (bool, default True): If True, sync the source pipe's documents to the new pipe.
- begin (Union[datetime, int, None], default None): Beginning datetime value to pass to Pipe.get_data().
- end (Union[datetime, int, None], default None): End datetime value to pass to Pipe.get_data().
- params (Optional[Dict[str, Any]], default None): Parameters filter to pass to Pipe.get_data().
- chunk_interval (Union[timedelta, int, None], default None): The size of chunks to retrieve from Pipe.get_data() for syncing.
- kwargs (Any): Additional flags to pass to Pipe.get_data() and Pipe.sync(), e.g. workers.
Returns
- A SuccessTuple indicating success.
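Because the same kwargs are forwarded to both the read and the sync, the source temporarily forces as_iterator to True for the read and then restores the caller's original value. Below is a minimal sketch of that save-and-restore pattern using a stand-in reader; `read_chunks` and `copy_with_forced_iterator` are hypothetical names, not Meerschaum functions.

```python
from typing import Any, Dict, Iterator, List


def read_chunks(as_iterator: bool = False, **kwargs: Any) -> Iterator[List[int]]:
    """Stand-in for a chunked reader; always yields two small chunks."""
    yield [1, 2]
    yield [3]


def copy_with_forced_iterator(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Read with as_iterator forced on, then restore the caller's kwargs."""
    original_as_iterator = kwargs.get('as_iterator', None)
    kwargs['as_iterator'] = True
    chunks = read_chunks(**kwargs)

    # Restore the flag so the downstream call sees what the caller passed.
    if original_as_iterator is None:
        kwargs.pop('as_iterator', None)
    else:
        kwargs['as_iterator'] = original_as_iterator

    return {'chunks': list(chunks), 'kwargs_for_sync': kwargs}


outcome = copy_with_forced_iterator({'workers': 2})
```

Reading in chunks means the copy never needs to hold the whole source table in memory at once.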
30class Plugin: 31 """Handle packaging of Meerschaum plugins.""" 32 33 def __init__( 34 self, 35 name: str, 36 version: Optional[str] = None, 37 user_id: Optional[int] = None, 38 required: Optional[List[str]] = None, 39 attributes: Optional[Dict[str, Any]] = None, 40 archive_path: Optional[pathlib.Path] = None, 41 venv_path: Optional[pathlib.Path] = None, 42 repo_connector: Optional['mrsm.connectors.api.APIConnector'] = None, 43 repo: Union['mrsm.connectors.api.APIConnector', str, None] = None, 44 ): 45 import meerschaum.config.paths as paths 46 from meerschaum._internal.static import STATIC_CONFIG 47 sep = STATIC_CONFIG['plugins']['repo_separator'] 48 _repo = None 49 if sep in name: 50 try: 51 name, _repo = name.split(sep) 52 except Exception as e: 53 error(f"Invalid plugin name: '{name}'") 54 self._repo_in_name = _repo 55 56 if attributes is None: 57 attributes = {} 58 self.name = name 59 self.attributes = attributes 60 self.user_id = user_id 61 self._version = version 62 if required: 63 self._required = required 64 self.archive_path = ( 65 archive_path if archive_path is not None 66 else paths.PLUGINS_ARCHIVES_RESOURCES_PATH / f"{self.name}.tar.gz" 67 ) 68 self.venv_path = ( 69 venv_path if venv_path is not None 70 else paths.VIRTENV_RESOURCES_PATH / self.name 71 ) 72 self._repo_connector = repo_connector 73 self._repo_keys = repo 74 75 76 @property 77 def repo_connector(self): 78 """ 79 Return the repository connector for this plugin. 80 NOTE: This imports the `connectors` module, which imports certain plugin modules. 81 """ 82 if self._repo_connector is None: 83 from meerschaum.connectors.parse import parse_repo_keys 84 85 repo_keys = self._repo_keys or self._repo_in_name 86 if self._repo_in_name and self._repo_keys and self._repo_keys != self._repo_in_name: 87 error( 88 f"Received inconsistent repos: '{self._repo_in_name}' and '{self._repo_keys}'." 
89 ) 90 repo_connector = parse_repo_keys(repo_keys) 91 self._repo_connector = repo_connector 92 return self._repo_connector 93 94 95 @property 96 def version(self): 97 """ 98 Return the plugin's module version is defined (`__version__`) if it's defined. 99 """ 100 if self._version is None: 101 try: 102 self._version = self.module.__version__ 103 except Exception as e: 104 self._version = None 105 return self._version 106 107 108 @property 109 def module(self): 110 """ 111 Return the Python module of the underlying plugin. 112 """ 113 if '_module' not in self.__dict__ or self.__dict__.get('_module', None) is None: 114 if self.__file__ is None: 115 return None 116 117 from meerschaum.plugins import import_plugins 118 self._module = import_plugins(str(self), warn=False) 119 120 return self._module 121 122 123 @property 124 def __file__(self) -> Union[str, None]: 125 """ 126 Return the file path (str) of the plugin if it exists, otherwise `None`. 127 """ 128 if self.__dict__.get('_module', None) is not None: 129 return self.module.__file__ 130 131 import meerschaum.config.paths as paths 132 133 potential_dir = paths.PLUGINS_RESOURCES_PATH / self.name 134 if ( 135 potential_dir.exists() 136 and potential_dir.is_dir() 137 and (potential_dir / '__init__.py').exists() 138 ): 139 return str((potential_dir / '__init__.py').as_posix()) 140 141 potential_file = paths.PLUGINS_RESOURCES_PATH / (self.name + '.py') 142 if potential_file.exists() and not potential_file.is_dir(): 143 return str(potential_file.as_posix()) 144 145 return None 146 147 148 @property 149 def requirements_file_path(self) -> Union[pathlib.Path, None]: 150 """ 151 If a file named `requirements.txt` exists, return its path. 152 """ 153 if self.__file__ is None: 154 return None 155 path = pathlib.Path(self.__file__).parent / 'requirements.txt' 156 if not path.exists(): 157 return None 158 return path 159 160 161 def is_installed(self, **kw) -> bool: 162 """ 163 Check whether a plugin is correctly installed. 
164 165 Returns 166 ------- 167 A `bool` indicating whether a plugin exists and is successfully imported. 168 """ 169 return self.__file__ is not None 170 171 172 def make_tar(self, debug: bool = False) -> pathlib.Path: 173 """ 174 Compress the plugin's source files into a `.tar.gz` archive and return the archive's path. 175 176 Parameters 177 ---------- 178 debug: bool, default False 179 Verbosity toggle. 180 181 Returns 182 ------- 183 A `pathlib.Path` to the archive file's path. 184 185 """ 186 import tarfile, pathlib, subprocess, fnmatch 187 from meerschaum.utils.debug import dprint 188 from meerschaum.utils.packages import attempt_import 189 pathspec = attempt_import('pathspec', debug=debug) 190 191 if not self.__file__: 192 from meerschaum.utils.warnings import error 193 error(f"Could not find file for plugin '{self}'.") 194 if '__init__.py' in self.__file__ or os.path.isdir(self.__file__): 195 path = self.__file__.replace('__init__.py', '') 196 is_dir = True 197 else: 198 path = self.__file__ 199 is_dir = False 200 201 old_cwd = os.getcwd() 202 real_parent_path = pathlib.Path(os.path.realpath(path)).parent 203 os.chdir(real_parent_path) 204 205 default_patterns_to_ignore = [ 206 '.pyc', 207 '__pycache__/', 208 'eggs/', 209 '__pypackages__/', 210 '.git', 211 ] 212 213 def parse_gitignore() -> 'Set[str]': 214 gitignore_path = pathlib.Path(path) / '.gitignore' 215 if not gitignore_path.exists(): 216 return set(default_patterns_to_ignore) 217 with open(gitignore_path, 'r', encoding='utf-8') as f: 218 gitignore_text = f.read() 219 return set(pathspec.PathSpec.from_lines( 220 pathspec.patterns.GitWildMatchPattern, 221 default_patterns_to_ignore + gitignore_text.splitlines() 222 ).match_tree(path)) 223 224 patterns_to_ignore = parse_gitignore() if is_dir else set() 225 226 if debug: 227 dprint(f"Patterns to ignore:\n{patterns_to_ignore}") 228 229 with tarfile.open(self.archive_path, 'w:gz') as tarf: 230 if not is_dir: 231 tarf.add(f"{self.name}.py") 232 else: 233 
for root, dirs, files in os.walk(self.name): 234 for f in files: 235 good_file = True 236 fp = os.path.join(root, f) 237 for pattern in patterns_to_ignore: 238 if pattern in str(fp) or f.startswith('.'): 239 good_file = False 240 break 241 if good_file: 242 if debug: 243 dprint(f"Adding '{fp}'...") 244 tarf.add(fp) 245 246 ### clean up and change back to old directory 247 os.chdir(old_cwd) 248 249 ### change to 775 to avoid permissions issues with the API in a Docker container 250 self.archive_path.chmod(0o775) 251 252 if debug: 253 dprint(f"Created archive '{self.archive_path}'.") 254 return self.archive_path 255 256 257 def install( 258 self, 259 skip_deps: bool = False, 260 force: bool = False, 261 debug: bool = False, 262 ) -> SuccessTuple: 263 """ 264 Extract a plugin's tar archive to the plugins directory. 265 266 This function checks if the plugin is already installed and if the version is equal or 267 greater than the existing installation. 268 269 Parameters 270 ---------- 271 skip_deps: bool, default False 272 If `True`, do not install dependencies. 273 274 force: bool, default False 275 If `True`, continue with installation, even if required packages fail to install. 276 277 debug: bool, default False 278 Verbosity toggle. 279 280 Returns 281 ------- 282 A `SuccessTuple` of success (bool) and a message (str). 283 284 """ 285 if self.full_name in _ongoing_installations: 286 return True, f"Already installing plugin '{self}'." 
287 _ongoing_installations.add(self.full_name) 288 289 import meerschaum.config.paths as paths 290 from meerschaum.utils.warnings import warn, error 291 if debug: 292 from meerschaum.utils.debug import dprint 293 import tarfile 294 import re 295 import ast 296 from meerschaum.plugins import sync_plugins_symlinks 297 from meerschaum.utils.packages import attempt_import, reload_meerschaum 298 from meerschaum.utils.venv import init_venv 299 from meerschaum.utils.misc import safely_extract_tar 300 old_cwd = os.getcwd() 301 old_version = '' 302 new_version = '' 303 temp_dir = paths.PLUGINS_TEMP_RESOURCES_PATH / self.name 304 temp_dir.mkdir(exist_ok=True) 305 306 if not self.archive_path.exists(): 307 return False, f"Missing archive file for plugin '{self}'." 308 if self.version is not None: 309 old_version = self.version 310 if debug: 311 dprint(f"Found existing version '{old_version}' for plugin '{self}'.") 312 313 if debug: 314 dprint(f"Extracting '{self.archive_path}' to '{temp_dir}'...") 315 316 try: 317 with tarfile.open(self.archive_path, 'r:gz') as tarf: 318 safely_extract_tar(tarf, temp_dir) 319 except Exception as e: 320 warn(e) 321 return False, f"Failed to extract plugin '{self.name}'." 
322 323 ### search for version information 324 files = os.listdir(temp_dir) 325 326 if str(files[0]) == self.name: 327 is_dir = True 328 elif str(files[0]) == self.name + '.py': 329 is_dir = False 330 else: 331 error(f"Unknown format encountered for plugin '{self}'.") 332 333 fpath = temp_dir / files[0] 334 if is_dir: 335 fpath = fpath / '__init__.py' 336 337 init_venv(self.name, debug=debug) 338 with open(fpath, 'r', encoding='utf-8') as f: 339 init_lines = f.readlines() 340 new_version = None 341 for line in init_lines: 342 if '__version__' not in line: 343 continue 344 version_match = re.search(r'__version__(\s?)=', line.lstrip().rstrip()) 345 if not version_match: 346 continue 347 new_version = ast.literal_eval(line.split('=')[1].lstrip().rstrip()) 348 break 349 if not new_version: 350 warn( 351 f"No `__version__` defined for plugin '{self}'. " 352 + "Assuming new version...", 353 stack = False, 354 ) 355 356 packaging_version = attempt_import('packaging.version') 357 try: 358 is_new_version = (not new_version and not old_version) or ( 359 packaging_version.parse(old_version) < packaging_version.parse(new_version) 360 ) 361 is_same_version = new_version and old_version and ( 362 packaging_version.parse(old_version) == packaging_version.parse(new_version) 363 ) 364 except Exception: 365 is_new_version, is_same_version = True, False 366 367 ### Determine where to permanently store the new plugin. 368 plugin_installation_dir_path = paths.PLUGINS_DIR_PATHS[0] 369 for path in paths.PLUGINS_DIR_PATHS: 370 if not path.exists(): 371 warn(f"Plugins path does not exist: {path}", stack=False) 372 continue 373 374 files_in_plugins_dir = os.listdir(path) 375 if ( 376 self.name in files_in_plugins_dir 377 or 378 (self.name + '.py') in files_in_plugins_dir 379 ): 380 plugin_installation_dir_path = path 381 break 382 383 success_msg = ( 384 f"Successfully installed plugin '{self}'" 385 + ("\n (skipped dependencies)" if skip_deps else "") 386 + "." 
387 ) 388 success, abort = None, None 389 390 if is_same_version and not force: 391 success, msg = True, ( 392 f"Plugin '{self}' is up-to-date (version {old_version}).\n" + 393 " Install again with `-f` or `--force` to reinstall." 394 ) 395 abort = True 396 elif is_new_version or force: 397 for src_dir, dirs, files in os.walk(temp_dir): 398 if success is not None: 399 break 400 dst_dir = str(src_dir).replace(str(temp_dir), str(plugin_installation_dir_path)) 401 if not os.path.exists(dst_dir): 402 os.mkdir(dst_dir) 403 for f in files: 404 src_file = os.path.join(src_dir, f) 405 dst_file = os.path.join(dst_dir, f) 406 if os.path.exists(dst_file): 407 os.remove(dst_file) 408 409 if debug: 410 dprint(f"Moving '{src_file}' to '{dst_dir}'...") 411 try: 412 shutil.move(src_file, dst_dir) 413 except Exception: 414 success, msg = False, ( 415 f"Failed to install plugin '{self}': " + 416 f"Could not move file '{src_file}' to '{dst_dir}'" 417 ) 418 print(msg) 419 break 420 if success is None: 421 success, msg = True, success_msg 422 else: 423 success, msg = False, ( 424 f"Your installed version of plugin '{self}' ({old_version}) is higher than " 425 + f"attempted version {new_version}." 426 ) 427 428 shutil.rmtree(temp_dir) 429 os.chdir(old_cwd) 430 431 ### Reload the plugin's module. 432 sync_plugins_symlinks(debug=debug) 433 if '_module' in self.__dict__: 434 del self.__dict__['_module'] 435 init_venv(venv=self.name, force=True, debug=debug) 436 reload_meerschaum(debug=debug) 437 438 ### if we've already failed, return here 439 if not success or abort: 440 _ongoing_installations.remove(self.full_name) 441 return success, msg 442 443 ### attempt to install dependencies 444 dependencies_installed = skip_deps or self.install_dependencies(force=force, debug=debug) 445 if not dependencies_installed: 446 _ongoing_installations.remove(self.full_name) 447 return False, f"Failed to install dependencies for plugin '{self}'." 
448 449 ### handling success tuple, bool, or other (typically None) 450 setup_tuple = self.setup(debug=debug) 451 if isinstance(setup_tuple, tuple): 452 if not setup_tuple[0]: 453 success, msg = setup_tuple 454 elif isinstance(setup_tuple, bool): 455 if not setup_tuple: 456 success, msg = False, ( 457 f"Failed to run post-install setup for plugin '{self}'." + '\n' + 458 f"Check `setup()` in '{self.__file__}' for more information " + 459 "(no error message provided)." 460 ) 461 else: 462 success, msg = True, success_msg 463 elif setup_tuple is None: 464 success = True 465 msg = ( 466 f"Post-install for plugin '{self}' returned None. " + 467 "Assuming plugin successfully installed." 468 ) 469 warn(msg) 470 else: 471 success = False 472 msg = ( 473 f"Post-install for plugin '{self}' returned unexpected value " + 474 f"of type '{type(setup_tuple)}': {setup_tuple}" 475 ) 476 477 _ongoing_installations.remove(self.full_name) 478 _ = self.module 479 return success, msg 480 481 482 def remove_archive( 483 self, 484 debug: bool = False 485 ) -> SuccessTuple: 486 """Remove a plugin's archive file.""" 487 if not self.archive_path.exists(): 488 return True, f"Archive file for plugin '{self}' does not exist." 489 try: 490 self.archive_path.unlink() 491 except Exception as e: 492 return False, f"Failed to remove archive for plugin '{self}':\n{e}" 493 return True, "Success" 494 495 496 def remove_venv( 497 self, 498 debug: bool = False 499 ) -> SuccessTuple: 500 """Remove a plugin's virtual environment.""" 501 if not self.venv_path.exists(): 502 return True, f"Virtual environment for plugin '{self}' does not exist." 503 try: 504 shutil.rmtree(self.venv_path) 505 except Exception as e: 506 return False, f"Failed to remove virtual environment for plugin '{self}':\n{e}" 507 return True, "Success" 508 509 510 def uninstall(self, debug: bool = False) -> SuccessTuple: 511 """ 512 Remove a plugin, its virtual environment, and archive file. 
513 """ 514 from meerschaum.utils.packages import reload_meerschaum 515 from meerschaum.plugins import sync_plugins_symlinks 516 from meerschaum.utils.warnings import warn, info 517 warnings_thrown_count: int = 0 518 max_warnings: int = 3 519 520 if not self.is_installed(): 521 info( 522 f"Plugin '{self.name}' doesn't seem to be installed.\n " 523 + "Checking for artifacts...", 524 stack = False, 525 ) 526 else: 527 real_path = pathlib.Path(os.path.realpath(self.__file__)) 528 try: 529 if real_path.name == '__init__.py': 530 shutil.rmtree(real_path.parent) 531 else: 532 real_path.unlink() 533 except Exception as e: 534 warn(f"Could not remove source files for plugin '{self.name}':\n{e}", stack=False) 535 warnings_thrown_count += 1 536 else: 537 info(f"Removed source files for plugin '{self.name}'.") 538 539 if self.venv_path.exists(): 540 success, msg = self.remove_venv(debug=debug) 541 if not success: 542 warn(msg, stack=False) 543 warnings_thrown_count += 1 544 else: 545 info(f"Removed virtual environment from plugin '{self.name}'.") 546 547 success = warnings_thrown_count < max_warnings 548 sync_plugins_symlinks(debug=debug) 549 self.deactivate_venv(force=True, debug=debug) 550 reload_meerschaum(debug=debug) 551 return success, ( 552 f"Successfully uninstalled plugin '{self}'." if success 553 else f"Failed to uninstall plugin '{self}'." 554 ) 555 556 557 def setup(self, *args: str, debug: bool = False, **kw: Any) -> Union[SuccessTuple, bool]: 558 """ 559 If exists, run the plugin's `setup()` function. 560 561 Parameters 562 ---------- 563 *args: str 564 The positional arguments passed to the `setup()` function. 565 566 debug: bool, default False 567 Verbosity toggle. 568 569 **kw: Any 570 The keyword arguments passed to the `setup()` function. 571 572 Returns 573 ------- 574 A `SuccessTuple` or `bool` indicating success. 
575 576 """ 577 from meerschaum.utils.debug import dprint 578 import inspect 579 _setup = None 580 for name, fp in inspect.getmembers(self.module): 581 if name == 'setup' and inspect.isfunction(fp): 582 _setup = fp 583 break 584 585 ### assume success if no setup() is found (not necessary) 586 if _setup is None: 587 return True 588 589 sig = inspect.signature(_setup) 590 has_debug, has_kw = ('debug' in sig.parameters), False 591 for k, v in sig.parameters.items(): 592 if '**' in str(v): 593 has_kw = True 594 break 595 596 _kw = {} 597 if has_kw: 598 _kw.update(kw) 599 if has_debug: 600 _kw['debug'] = debug 601 602 if debug: 603 dprint(f"Running setup for plugin '{self}'...") 604 try: 605 self.activate_venv(debug=debug) 606 return_tuple = _setup(*args, **_kw) 607 self.deactivate_venv(debug=debug) 608 except Exception as e: 609 return False, str(e) 610 611 if isinstance(return_tuple, tuple): 612 return return_tuple 613 if isinstance(return_tuple, bool): 614 return return_tuple, f"Setup for Plugin '{self.name}' did not return a message." 615 if return_tuple is None: 616 return False, f"Setup for Plugin '{self.name}' returned None." 617 return False, f"Unknown return value from setup for Plugin '{self.name}': {return_tuple}" 618 619 620 def get_dependencies( 621 self, 622 debug: bool = False, 623 ) -> List[str]: 624 """ 625 If the Plugin has specified dependencies in a list called `required`, return the list. 626 627 **NOTE:** Dependecies which start with `'plugin:'` are Meerschaum plugins, not pip packages. 628 Meerschaum plugins may also specify connector keys for a repo after `'@'`. 629 630 Parameters 631 ---------- 632 debug: bool, default False 633 Verbosity toggle. 634 635 Returns 636 ------- 637 A list of required packages and plugins (str). 638 639 """ 640 if '_required' in self.__dict__: 641 return self._required 642 643 ### If the plugin has not yet been imported, 644 ### infer the dependencies from the source text. 
645 ### This is not super robust, and it doesn't feel right 646 ### having multiple versions of the logic. 647 ### This is necessary when determining the activation order 648 ### without having import the module. 649 ### For consistency's sake, the module-less method does not cache the requirements. 650 if self.__dict__.get('_module', None) is None: 651 file_path = self.__file__ 652 if file_path is None: 653 return [] 654 with open(file_path, 'r', encoding='utf-8') as f: 655 text = f.read() 656 657 if 'required' not in text: 658 return [] 659 660 ### This has some limitations: 661 ### It relies on `required` being manually declared. 662 ### We lose the ability to dynamically alter the `required` list, 663 ### which is why we've kept the module-reliant method below. 664 import ast, re 665 ### NOTE: This technically would break 666 ### if `required` was the very first line of the file. 667 req_start_match = re.search(r'\nrequired(:\s*)?.*=', text) 668 if not req_start_match: 669 return [] 670 req_start = req_start_match.start() 671 equals_sign = req_start + text[req_start:].find('=') 672 673 ### Dependencies may have brackets within the strings, so push back the index. 
674 first_opening_brace = equals_sign + 1 + text[equals_sign:].find('[') 675 if first_opening_brace == -1: 676 return [] 677 678 next_closing_brace = equals_sign + 1 + text[equals_sign:].find(']') 679 if next_closing_brace == -1: 680 return [] 681 682 start_ix = first_opening_brace + 1 683 end_ix = next_closing_brace 684 685 num_braces = 0 686 while True: 687 if '[' not in text[start_ix:end_ix]: 688 break 689 num_braces += 1 690 start_ix = end_ix 691 end_ix += text[end_ix + 1:].find(']') + 1 692 693 req_end = end_ix + 1 694 req_text = ( 695 text[(first_opening_brace-1):req_end] 696 .lstrip() 697 .replace('=', '', 1) 698 .lstrip() 699 .rstrip() 700 ) 701 try: 702 required = ast.literal_eval(req_text) 703 except Exception as e: 704 warn( 705 f"Unable to determine requirements for plugin '{self.name}' " 706 + "without importing the module.\n" 707 + " This may be due to dynamically setting the global `required` list.\n" 708 + f" {e}" 709 ) 710 return [] 711 return required 712 713 import inspect 714 self.activate_venv(dependencies=False, debug=debug) 715 required = [] 716 for name, val in inspect.getmembers(self.module): 717 if name == 'required': 718 required = val 719 break 720 self._required = required 721 self.deactivate_venv(dependencies=False, debug=debug) 722 return required 723 724 725 def get_required_plugins(self, debug: bool=False) -> List[mrsm.plugins.Plugin]: 726 """ 727 Return a list of required Plugin objects. 
728 """ 729 from meerschaum.utils.warnings import warn 730 from meerschaum.config import get_config 731 from meerschaum._internal.static import STATIC_CONFIG 732 from meerschaum.connectors.parse import is_valid_connector_keys 733 plugins = [] 734 _deps = self.get_dependencies(debug=debug) 735 sep = STATIC_CONFIG['plugins']['repo_separator'] 736 plugin_names = [ 737 _d[len('plugin:'):] for _d in _deps 738 if _d.startswith('plugin:') and len(_d) > len('plugin:') 739 ] 740 default_repo_keys = get_config('meerschaum', 'repository') 741 skipped_repo_keys = set() 742 743 for _plugin_name in plugin_names: 744 if sep in _plugin_name: 745 try: 746 _plugin_name, _repo_keys = _plugin_name.split(sep) 747 except Exception: 748 _repo_keys = default_repo_keys 749 warn( 750 f"Invalid repo keys for required plugin '{_plugin_name}'.\n " 751 + f"Will try to use '{_repo_keys}' instead.", 752 stack = False, 753 ) 754 else: 755 _repo_keys = default_repo_keys 756 757 if _repo_keys in skipped_repo_keys: 758 continue 759 760 if not is_valid_connector_keys(_repo_keys): 761 warn( 762 f"Invalid connector '{_repo_keys}'.\n" 763 f" Skipping required plugins from repository '{_repo_keys}'", 764 stack=False, 765 ) 766 continue 767 768 plugins.append(Plugin(_plugin_name, repo=_repo_keys)) 769 770 return plugins 771 772 773 def get_required_packages(self, debug: bool=False) -> List[str]: 774 """ 775 Return the required package names (excluding plugins). 776 """ 777 _deps = self.get_dependencies(debug=debug) 778 return [_d for _d in _deps if not _d.startswith('plugin:')] 779 780 781 def activate_venv( 782 self, 783 dependencies: bool = True, 784 init_if_not_exists: bool = True, 785 debug: bool = False, 786 **kw 787 ) -> bool: 788 """ 789 Activate the virtual environments for the plugin and its dependencies. 790 791 Parameters 792 ---------- 793 dependencies: bool, default True 794 If `True`, activate the virtual environments for required plugins. 
795 796 Returns 797 ------- 798 A bool indicating success. 799 """ 800 import meerschaum.config.paths as paths 801 from meerschaum.utils.venv import venv_target_path 802 from meerschaum.utils.packages import activate_venv 803 from meerschaum.utils.misc import make_symlink, is_symlink 804 805 if dependencies: 806 for plugin in self.get_required_plugins(debug=debug): 807 plugin.activate_venv(debug=debug, init_if_not_exists=init_if_not_exists, **kw) 808 809 vtp = venv_target_path(self.name, debug=debug, allow_nonexistent=True) 810 venv_meerschaum_path = vtp / 'meerschaum' 811 812 try: 813 success, msg = True, "Success" 814 if is_symlink(venv_meerschaum_path): 815 if pathlib.Path(os.path.realpath(venv_meerschaum_path)) != paths.PACKAGE_ROOT_PATH: 816 venv_meerschaum_path.unlink() 817 success, msg = make_symlink(venv_meerschaum_path, paths.PACKAGE_ROOT_PATH) 818 except Exception as e: 819 success, msg = False, str(e) 820 if not success: 821 warn( 822 f"Unable to create symlink {venv_meerschaum_path} to {paths.PACKAGE_ROOT_PATH}:\n" 823 f"{msg}" 824 ) 825 826 return activate_venv(self.name, init_if_not_exists=init_if_not_exists, debug=debug, **kw) 827 828 829 def deactivate_venv(self, dependencies: bool=True, debug: bool = False, **kw) -> bool: 830 """ 831 Deactivate the virtual environments for the plugin and its dependencies. 832 833 Parameters 834 ---------- 835 dependencies: bool, default True 836 If `True`, deactivate the virtual environments for required plugins. 837 838 Returns 839 ------- 840 A bool indicating success. 841 """ 842 from meerschaum.utils.packages import deactivate_venv 843 success = deactivate_venv(self.name, debug=debug, **kw) 844 if dependencies: 845 for plugin in self.get_required_plugins(debug=debug): 846 plugin.deactivate_venv(debug=debug, **kw) 847 return success 848 849 850 def install_dependencies( 851 self, 852 force: bool = False, 853 debug: bool = False, 854 ) -> bool: 855 """ 856 If specified, install dependencies. 
857 858 **NOTE:** Dependencies that start with `'plugin:'` will be installed as 859 Meerschaum plugins from the same repository as this Plugin. 860 To install from a different repository, add the repo keys after `'@'` 861 (e.g. `'plugin:foo@api:bar'`). 862 863 Parameters 864 ---------- 865 force: bool, default False 866 If `True`, continue with the installation, even if some 867 required packages fail to install. 868 869 debug: bool, default False 870 Verbosity toggle. 871 872 Returns 873 ------- 874 A bool indicating success. 875 """ 876 from meerschaum.utils.packages import pip_install, venv_contains_package 877 from meerschaum.utils.warnings import warn, info 878 _deps = self.get_dependencies(debug=debug) 879 if not _deps and self.requirements_file_path is None: 880 return True 881 882 plugins = self.get_required_plugins(debug=debug) 883 for _plugin in plugins: 884 if _plugin.name == self.name: 885 warn(f"Plugin '{self.name}' cannot depend on itself! Skipping...", stack=False) 886 continue 887 _success, _msg = _plugin.repo_connector.install_plugin( 888 _plugin.name, debug=debug, force=force 889 ) 890 if not _success: 891 warn( 892 f"Failed to install required plugin '{_plugin}' from '{_plugin.repo_connector}'" 893 + f" for plugin '{self.name}':\n" + _msg, 894 stack = False, 895 ) 896 if not force: 897 warn( 898 "Try installing with the `--force` flag to continue anyway.", 899 stack = False, 900 ) 901 return False 902 info( 903 "Continuing with installation despite the failure " 904 + "(careful, things might be broken!)...", 905 icon = False 906 ) 907 908 909 ### First step: parse `requirements.txt` if it exists. 
910 if self.requirements_file_path is not None: 911 if not pip_install( 912 requirements_file_path=self.requirements_file_path, 913 venv=self.name, debug=debug 914 ): 915 warn( 916 f"Failed to resolve 'requirements.txt' for plugin '{self.name}'.", 917 stack = False, 918 ) 919 if not force: 920 warn( 921 "Try installing with `--force` to continue anyway.", 922 stack = False, 923 ) 924 return False 925 info( 926 "Continuing with installation despite the failure " 927 + "(careful, things might be broken!)...", 928 icon = False 929 ) 930 931 932 ### Don't reinstall packages that are already included in required plugins. 933 packages = [] 934 _packages = self.get_required_packages(debug=debug) 935 accounted_for_packages = set() 936 for package_name in _packages: 937 for plugin in plugins: 938 if venv_contains_package(package_name, plugin.name): 939 accounted_for_packages.add(package_name) 940 break 941 packages = [pkg for pkg in _packages if pkg not in accounted_for_packages] 942 943 ### Attempt pip packages installation. 944 if packages: 945 for package in packages: 946 if not pip_install(package, venv=self.name, debug=debug): 947 warn( 948 f"Failed to install required package '{package}'" 949 + f" for plugin '{self.name}'.", 950 stack = False, 951 ) 952 if not force: 953 warn( 954 "Try installing with `--force` to continue anyway.", 955 stack = False, 956 ) 957 return False 958 info( 959 "Continuing with installation despite the failure " 960 + "(careful, things might be broken!)...", 961 icon = False 962 ) 963 return True 964 965 966 @property 967 def full_name(self) -> str: 968 """ 969 Include the repo keys with the plugin's name. 
970 """ 971 from meerschaum._internal.static import STATIC_CONFIG 972 sep = STATIC_CONFIG['plugins']['repo_separator'] 973 return self.name + sep + str(self.repo_connector) 974 975 976 def __str__(self): 977 return self.name 978 979 980 def __repr__(self): 981 return f"Plugin('{self.name}', repo='{self.repo_connector}')" 982 983 984 def __del__(self): 985 pass
Handle packaging of Meerschaum plugins.
    def __init__(
        self,
        name: str,
        version: Optional[str] = None,
        user_id: Optional[int] = None,
        required: Optional[List[str]] = None,
        attributes: Optional[Dict[str, Any]] = None,
        archive_path: Optional[pathlib.Path] = None,
        venv_path: Optional[pathlib.Path] = None,
        repo_connector: Optional['mrsm.connectors.api.APIConnector'] = None,
        repo: Union['mrsm.connectors.api.APIConnector', str, None] = None,
    ):
        import meerschaum.config.paths as paths
        from meerschaum._internal.static import STATIC_CONFIG
        sep = STATIC_CONFIG['plugins']['repo_separator']
        _repo = None
        if sep in name:
            try:
                name, _repo = name.split(sep)
            except Exception as e:
                error(f"Invalid plugin name: '{name}'")
        self._repo_in_name = _repo

        if attributes is None:
            attributes = {}
        self.name = name
        self.attributes = attributes
        self.user_id = user_id
        self._version = version
        if required:
            self._required = required
        self.archive_path = (
            archive_path if archive_path is not None
            else paths.PLUGINS_ARCHIVES_RESOURCES_PATH / f"{self.name}.tar.gz"
        )
        self.venv_path = (
            venv_path if venv_path is not None
            else paths.VIRTENV_RESOURCES_PATH / self.name
        )
        self._repo_connector = repo_connector
        self._repo_keys = repo
    @property
    def repo_connector(self):
        """
        Return the repository connector for this plugin.
        NOTE: This imports the `connectors` module, which imports certain plugin modules.
        """
        if self._repo_connector is None:
            from meerschaum.connectors.parse import parse_repo_keys

            repo_keys = self._repo_keys or self._repo_in_name
            if self._repo_in_name and self._repo_keys and self._repo_keys != self._repo_in_name:
                error(
                    f"Received inconsistent repos: '{self._repo_in_name}' and '{self._repo_keys}'."
                )
            repo_connector = parse_repo_keys(repo_keys)
            self._repo_connector = repo_connector
        return self._repo_connector
Return the repository connector for this plugin.
NOTE: This imports the connectors module, which imports certain plugin modules.
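The repository may be given either inside the plugin name (after the separator) or through the repo argument, and the property raises an error when the two disagree. The following is a minimal sketch of that resolution step only. It assumes the separator is '@' (the source reads it from STATIC_CONFIG['plugins']['repo_separator']), and resolve_repo_keys is a hypothetical helper, not part of the meerschaum API:

```python
from typing import Optional, Tuple

# Assumption: the real value comes from STATIC_CONFIG['plugins']['repo_separator'].
SEP = '@'


def resolve_repo_keys(name: str, repo: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: split `name` on the separator and reconcile it with `repo`."""
    repo_in_name = None
    if SEP in name:
        name, repo_in_name = name.split(SEP, 1)
    # Mirror the consistency check performed by the `repo_connector` property.
    if repo_in_name and repo and repo != repo_in_name:
        raise ValueError(f"Received inconsistent repos: '{repo_in_name}' and '{repo}'.")
    # The explicit `repo` argument takes precedence over the keys embedded in the name.
    return name, (repo or repo_in_name)
```

In the source, the resolved keys are then handed to parse_repo_keys() to build the connector; this sketch stops before that step.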
    @property
    def version(self):
        """
        Return the plugin's module version (`__version__`) if it's defined.
        """
        if self._version is None:
            try:
                self._version = self.module.__version__
            except Exception as e:
                self._version = None
        return self._version
Return the plugin's module version (__version__) if it's defined.
    @property
    def module(self):
        """
        Return the Python module of the underlying plugin.
        """
        if '_module' not in self.__dict__ or self.__dict__.get('_module', None) is None:
            if self.__file__ is None:
                return None

            from meerschaum.plugins import import_plugins
            self._module = import_plugins(str(self), warn=False)

        return self._module
Return the Python module of the underlying plugin.
    @property
    def requirements_file_path(self) -> Union[pathlib.Path, None]:
        """
        If a file named `requirements.txt` exists, return its path.
        """
        if self.__file__ is None:
            return None
        path = pathlib.Path(self.__file__).parent / 'requirements.txt'
        if not path.exists():
            return None
        return path
If a file named requirements.txt exists, return its path.
    def is_installed(self, **kw) -> bool:
        """
        Check whether a plugin is correctly installed.

        Returns
        -------
        A `bool` indicating whether a plugin exists and is successfully imported.
        """
        return self.__file__ is not None
Check whether a plugin is correctly installed.
Returns
- A bool indicating whether a plugin exists and is successfully imported.
172 def make_tar(self, debug: bool = False) -> pathlib.Path: 173 """ 174 Compress the plugin's source files into a `.tar.gz` archive and return the archive's path. 175 176 Parameters 177 ---------- 178 debug: bool, default False 179 Verbosity toggle. 180 181 Returns 182 ------- 183 A `pathlib.Path` to the archive file's path. 184 185 """ 186 import tarfile, pathlib, subprocess, fnmatch 187 from meerschaum.utils.debug import dprint 188 from meerschaum.utils.packages import attempt_import 189 pathspec = attempt_import('pathspec', debug=debug) 190 191 if not self.__file__: 192 from meerschaum.utils.warnings import error 193 error(f"Could not find file for plugin '{self}'.") 194 if '__init__.py' in self.__file__ or os.path.isdir(self.__file__): 195 path = self.__file__.replace('__init__.py', '') 196 is_dir = True 197 else: 198 path = self.__file__ 199 is_dir = False 200 201 old_cwd = os.getcwd() 202 real_parent_path = pathlib.Path(os.path.realpath(path)).parent 203 os.chdir(real_parent_path) 204 205 default_patterns_to_ignore = [ 206 '.pyc', 207 '__pycache__/', 208 'eggs/', 209 '__pypackages__/', 210 '.git', 211 ] 212 213 def parse_gitignore() -> 'Set[str]': 214 gitignore_path = pathlib.Path(path) / '.gitignore' 215 if not gitignore_path.exists(): 216 return set(default_patterns_to_ignore) 217 with open(gitignore_path, 'r', encoding='utf-8') as f: 218 gitignore_text = f.read() 219 return set(pathspec.PathSpec.from_lines( 220 pathspec.patterns.GitWildMatchPattern, 221 default_patterns_to_ignore + gitignore_text.splitlines() 222 ).match_tree(path)) 223 224 patterns_to_ignore = parse_gitignore() if is_dir else set() 225 226 if debug: 227 dprint(f"Patterns to ignore:\n{patterns_to_ignore}") 228 229 with tarfile.open(self.archive_path, 'w:gz') as tarf: 230 if not is_dir: 231 tarf.add(f"{self.name}.py") 232 else: 233 for root, dirs, files in os.walk(self.name): 234 for f in files: 235 good_file = True 236 fp = os.path.join(root, f) 237 for pattern in patterns_to_ignore: 238 
if pattern in str(fp) or f.startswith('.'): 239 good_file = False 240 break 241 if good_file: 242 if debug: 243 dprint(f"Adding '{fp}'...") 244 tarf.add(fp) 245 246 ### clean up and change back to old directory 247 os.chdir(old_cwd) 248 249 ### change to 775 to avoid permissions issues with the API in a Docker container 250 self.archive_path.chmod(0o775) 251 252 if debug: 253 dprint(f"Created archive '{self.archive_path}'.") 254 return self.archive_path
Compress the plugin's source files into a .tar.gz archive and return the archive's path.
Parameters
- debug (bool, default False): Verbosity toggle.
Returns
- A pathlib.Path to the archive file.
257 def install( 258 self, 259 skip_deps: bool = False, 260 force: bool = False, 261 debug: bool = False, 262 ) -> SuccessTuple: 263 """ 264 Extract a plugin's tar archive to the plugins directory. 265 266 This function checks if the plugin is already installed and if the version is equal or 267 greater than the existing installation. 268 269 Parameters 270 ---------- 271 skip_deps: bool, default False 272 If `True`, do not install dependencies. 273 274 force: bool, default False 275 If `True`, continue with installation, even if required packages fail to install. 276 277 debug: bool, default False 278 Verbosity toggle. 279 280 Returns 281 ------- 282 A `SuccessTuple` of success (bool) and a message (str). 283 284 """ 285 if self.full_name in _ongoing_installations: 286 return True, f"Already installing plugin '{self}'." 287 _ongoing_installations.add(self.full_name) 288 289 import meerschaum.config.paths as paths 290 from meerschaum.utils.warnings import warn, error 291 if debug: 292 from meerschaum.utils.debug import dprint 293 import tarfile 294 import re 295 import ast 296 from meerschaum.plugins import sync_plugins_symlinks 297 from meerschaum.utils.packages import attempt_import, reload_meerschaum 298 from meerschaum.utils.venv import init_venv 299 from meerschaum.utils.misc import safely_extract_tar 300 old_cwd = os.getcwd() 301 old_version = '' 302 new_version = '' 303 temp_dir = paths.PLUGINS_TEMP_RESOURCES_PATH / self.name 304 temp_dir.mkdir(exist_ok=True) 305 306 if not self.archive_path.exists(): 307 return False, f"Missing archive file for plugin '{self}'." 
308 if self.version is not None: 309 old_version = self.version 310 if debug: 311 dprint(f"Found existing version '{old_version}' for plugin '{self}'.") 312 313 if debug: 314 dprint(f"Extracting '{self.archive_path}' to '{temp_dir}'...") 315 316 try: 317 with tarfile.open(self.archive_path, 'r:gz') as tarf: 318 safely_extract_tar(tarf, temp_dir) 319 except Exception as e: 320 warn(e) 321 return False, f"Failed to extract plugin '{self.name}'." 322 323 ### search for version information 324 files = os.listdir(temp_dir) 325 326 if str(files[0]) == self.name: 327 is_dir = True 328 elif str(files[0]) == self.name + '.py': 329 is_dir = False 330 else: 331 error(f"Unknown format encountered for plugin '{self}'.") 332 333 fpath = temp_dir / files[0] 334 if is_dir: 335 fpath = fpath / '__init__.py' 336 337 init_venv(self.name, debug=debug) 338 with open(fpath, 'r', encoding='utf-8') as f: 339 init_lines = f.readlines() 340 new_version = None 341 for line in init_lines: 342 if '__version__' not in line: 343 continue 344 version_match = re.search(r'__version__(\s?)=', line.lstrip().rstrip()) 345 if not version_match: 346 continue 347 new_version = ast.literal_eval(line.split('=')[1].lstrip().rstrip()) 348 break 349 if not new_version: 350 warn( 351 f"No `__version__` defined for plugin '{self}'. " 352 + "Assuming new version...", 353 stack = False, 354 ) 355 356 packaging_version = attempt_import('packaging.version') 357 try: 358 is_new_version = (not new_version and not old_version) or ( 359 packaging_version.parse(old_version) < packaging_version.parse(new_version) 360 ) 361 is_same_version = new_version and old_version and ( 362 packaging_version.parse(old_version) == packaging_version.parse(new_version) 363 ) 364 except Exception: 365 is_new_version, is_same_version = True, False 366 367 ### Determine where to permanently store the new plugin. 
368 plugin_installation_dir_path = paths.PLUGINS_DIR_PATHS[0] 369 for path in paths.PLUGINS_DIR_PATHS: 370 if not path.exists(): 371 warn(f"Plugins path does not exist: {path}", stack=False) 372 continue 373 374 files_in_plugins_dir = os.listdir(path) 375 if ( 376 self.name in files_in_plugins_dir 377 or 378 (self.name + '.py') in files_in_plugins_dir 379 ): 380 plugin_installation_dir_path = path 381 break 382 383 success_msg = ( 384 f"Successfully installed plugin '{self}'" 385 + ("\n (skipped dependencies)" if skip_deps else "") 386 + "." 387 ) 388 success, abort = None, None 389 390 if is_same_version and not force: 391 success, msg = True, ( 392 f"Plugin '{self}' is up-to-date (version {old_version}).\n" + 393 " Install again with `-f` or `--force` to reinstall." 394 ) 395 abort = True 396 elif is_new_version or force: 397 for src_dir, dirs, files in os.walk(temp_dir): 398 if success is not None: 399 break 400 dst_dir = str(src_dir).replace(str(temp_dir), str(plugin_installation_dir_path)) 401 if not os.path.exists(dst_dir): 402 os.mkdir(dst_dir) 403 for f in files: 404 src_file = os.path.join(src_dir, f) 405 dst_file = os.path.join(dst_dir, f) 406 if os.path.exists(dst_file): 407 os.remove(dst_file) 408 409 if debug: 410 dprint(f"Moving '{src_file}' to '{dst_dir}'...") 411 try: 412 shutil.move(src_file, dst_dir) 413 except Exception: 414 success, msg = False, ( 415 f"Failed to install plugin '{self}': " + 416 f"Could not move file '{src_file}' to '{dst_dir}'" 417 ) 418 print(msg) 419 break 420 if success is None: 421 success, msg = True, success_msg 422 else: 423 success, msg = False, ( 424 f"Your installed version of plugin '{self}' ({old_version}) is higher than " 425 + f"attempted version {new_version}." 426 ) 427 428 shutil.rmtree(temp_dir) 429 os.chdir(old_cwd) 430 431 ### Reload the plugin's module. 
432 sync_plugins_symlinks(debug=debug) 433 if '_module' in self.__dict__: 434 del self.__dict__['_module'] 435 init_venv(venv=self.name, force=True, debug=debug) 436 reload_meerschaum(debug=debug) 437 438 ### if we've already failed, return here 439 if not success or abort: 440 _ongoing_installations.remove(self.full_name) 441 return success, msg 442 443 ### attempt to install dependencies 444 dependencies_installed = skip_deps or self.install_dependencies(force=force, debug=debug) 445 if not dependencies_installed: 446 _ongoing_installations.remove(self.full_name) 447 return False, f"Failed to install dependencies for plugin '{self}'." 448 449 ### handling success tuple, bool, or other (typically None) 450 setup_tuple = self.setup(debug=debug) 451 if isinstance(setup_tuple, tuple): 452 if not setup_tuple[0]: 453 success, msg = setup_tuple 454 elif isinstance(setup_tuple, bool): 455 if not setup_tuple: 456 success, msg = False, ( 457 f"Failed to run post-install setup for plugin '{self}'." + '\n' + 458 f"Check `setup()` in '{self.__file__}' for more information " + 459 "(no error message provided)." 460 ) 461 else: 462 success, msg = True, success_msg 463 elif setup_tuple is None: 464 success = True 465 msg = ( 466 f"Post-install for plugin '{self}' returned None. " + 467 "Assuming plugin successfully installed." 468 ) 469 warn(msg) 470 else: 471 success = False 472 msg = ( 473 f"Post-install for plugin '{self}' returned unexpected value " + 474 f"of type '{type(setup_tuple)}': {setup_tuple}" 475 ) 476 477 _ongoing_installations.remove(self.full_name) 478 _ = self.module 479 return success, msg
Extract a plugin's tar archive to the plugins directory.
This function checks whether the plugin is already installed and whether the archived version is equal to or greater than the installed version.
Parameters
- skip_deps (bool, default False): If True, do not install dependencies.
- force (bool, default False): If True, continue with installation, even if required packages fail to install.
- debug (bool, default False): Verbosity toggle.
Returns
- A SuccessTuple of success (bool) and a message (str).
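The version check in the source has two parts: reading a literal __version__ from the extracted file, then choosing between reinstalling, reporting "up-to-date", and refusing a downgrade. The sketch below mirrors that flow under stated assumptions: find_version, install_action, and _as_tuple are hypothetical names, and versions are compared as plain dotted integers, whereas the source uses packaging.version.parse:

```python
import ast
import re
from typing import Iterable, Optional, Tuple


def find_version(lines: Iterable[str]) -> Optional[str]:
    """Hypothetical helper: return the first literal `__version__` value, if any."""
    for line in lines:
        if not re.search(r'__version__\s?=', line.strip()):
            continue
        return ast.literal_eval(line.split('=', 1)[1].strip())
    return None


def _as_tuple(version: str) -> Tuple[int, ...]:
    # Assumption: plain dotted integers; the source uses packaging.version.parse instead.
    return tuple(int(part) for part in version.split('.'))


def install_action(
    old_version: Optional[str],
    new_version: Optional[str],
    force: bool = False,
) -> str:
    """Hypothetical helper: decide what `install()` would do for a pair of versions."""
    try:
        is_new = (not new_version and not old_version) or (
            _as_tuple(old_version) < _as_tuple(new_version)
        )
        is_same = bool(new_version and old_version) and (
            _as_tuple(old_version) == _as_tuple(new_version)
        )
    except Exception:
        # As in the source, unparseable or missing versions are treated as a new version.
        is_new, is_same = True, False

    if is_same and not force:
        return 'up-to-date'
    if is_new or force:
        return 'install'
    return 'refuse-downgrade'
```

For example, install_action('1.0', '1.0') reports 'up-to-date', while passing force=True for the same pair yields 'install'.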
    def remove_archive(
        self,
        debug: bool = False
    ) -> SuccessTuple:
        """Remove a plugin's archive file."""
        if not self.archive_path.exists():
            return True, f"Archive file for plugin '{self}' does not exist."
        try:
            self.archive_path.unlink()
        except Exception as e:
            return False, f"Failed to remove archive for plugin '{self}':\n{e}"
        return True, "Success"
Remove a plugin's archive file.
    def remove_venv(
        self,
        debug: bool = False
    ) -> SuccessTuple:
        """Remove a plugin's virtual environment."""
        if not self.venv_path.exists():
            return True, f"Virtual environment for plugin '{self}' does not exist."
        try:
            shutil.rmtree(self.venv_path)
        except Exception as e:
            return False, f"Failed to remove virtual environment for plugin '{self}':\n{e}"
        return True, "Success"
Remove a plugin's virtual environment.
510 def uninstall(self, debug: bool = False) -> SuccessTuple: 511 """ 512 Remove a plugin, its virtual environment, and archive file. 513 """ 514 from meerschaum.utils.packages import reload_meerschaum 515 from meerschaum.plugins import sync_plugins_symlinks 516 from meerschaum.utils.warnings import warn, info 517 warnings_thrown_count: int = 0 518 max_warnings: int = 3 519 520 if not self.is_installed(): 521 info( 522 f"Plugin '{self.name}' doesn't seem to be installed.\n " 523 + "Checking for artifacts...", 524 stack = False, 525 ) 526 else: 527 real_path = pathlib.Path(os.path.realpath(self.__file__)) 528 try: 529 if real_path.name == '__init__.py': 530 shutil.rmtree(real_path.parent) 531 else: 532 real_path.unlink() 533 except Exception as e: 534 warn(f"Could not remove source files for plugin '{self.name}':\n{e}", stack=False) 535 warnings_thrown_count += 1 536 else: 537 info(f"Removed source files for plugin '{self.name}'.") 538 539 if self.venv_path.exists(): 540 success, msg = self.remove_venv(debug=debug) 541 if not success: 542 warn(msg, stack=False) 543 warnings_thrown_count += 1 544 else: 545 info(f"Removed virtual environment from plugin '{self.name}'.") 546 547 success = warnings_thrown_count < max_warnings 548 sync_plugins_symlinks(debug=debug) 549 self.deactivate_venv(force=True, debug=debug) 550 reload_meerschaum(debug=debug) 551 return success, ( 552 f"Successfully uninstalled plugin '{self}'." if success 553 else f"Failed to uninstall plugin '{self}'." 554 )
Remove a plugin, its virtual environment, and archive file.
557 def setup(self, *args: str, debug: bool = False, **kw: Any) -> Union[SuccessTuple, bool]: 558 """ 559 If exists, run the plugin's `setup()` function. 560 561 Parameters 562 ---------- 563 *args: str 564 The positional arguments passed to the `setup()` function. 565 566 debug: bool, default False 567 Verbosity toggle. 568 569 **kw: Any 570 The keyword arguments passed to the `setup()` function. 571 572 Returns 573 ------- 574 A `SuccessTuple` or `bool` indicating success. 575 576 """ 577 from meerschaum.utils.debug import dprint 578 import inspect 579 _setup = None 580 for name, fp in inspect.getmembers(self.module): 581 if name == 'setup' and inspect.isfunction(fp): 582 _setup = fp 583 break 584 585 ### assume success if no setup() is found (not necessary) 586 if _setup is None: 587 return True 588 589 sig = inspect.signature(_setup) 590 has_debug, has_kw = ('debug' in sig.parameters), False 591 for k, v in sig.parameters.items(): 592 if '**' in str(v): 593 has_kw = True 594 break 595 596 _kw = {} 597 if has_kw: 598 _kw.update(kw) 599 if has_debug: 600 _kw['debug'] = debug 601 602 if debug: 603 dprint(f"Running setup for plugin '{self}'...") 604 try: 605 self.activate_venv(debug=debug) 606 return_tuple = _setup(*args, **_kw) 607 self.deactivate_venv(debug=debug) 608 except Exception as e: 609 return False, str(e) 610 611 if isinstance(return_tuple, tuple): 612 return return_tuple 613 if isinstance(return_tuple, bool): 614 return return_tuple, f"Setup for Plugin '{self.name}' did not return a message." 615 if return_tuple is None: 616 return False, f"Setup for Plugin '{self.name}' returned None." 617 return False, f"Unknown return value from setup for Plugin '{self.name}': {return_tuple}"
If it exists, run the plugin's setup() function.
Parameters
- *args (str): The positional arguments passed to the setup() function.
- debug (bool, default False): Verbosity toggle.
- **kw (Any): The keyword arguments passed to the setup() function.
Returns
- A SuccessTuple or bool indicating success.
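Two details of the source are easy to miss: debug and **kw are forwarded only when the plugin's setup() signature accepts them, and the return value is normalized into a (bool, str) pair. A minimal sketch of both steps follows. build_setup_kwargs and normalize_result are hypothetical helpers, and the keyword check uses inspect.Parameter.VAR_KEYWORD, whereas the source tests for '**' in the parameter's string form:

```python
import inspect
from typing import Any, Callable, Dict, Tuple


def build_setup_kwargs(func: Callable, debug: bool, **kw: Any) -> Dict[str, Any]:
    """Hypothetical helper: keep only the keyword arguments that `func` can accept."""
    params = inspect.signature(func).parameters
    accepts_kw = any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())
    kwargs = dict(kw) if accepts_kw else {}
    if 'debug' in params:
        kwargs['debug'] = debug
    return kwargs


def normalize_result(result: Any, name: str) -> Tuple[bool, str]:
    """Hypothetical helper: coerce a `setup()` return value into a (success, message) pair."""
    if isinstance(result, tuple):
        return result
    if isinstance(result, bool):
        return result, f"Setup for Plugin '{name}' did not return a message."
    if result is None:
        return False, f"Setup for Plugin '{name}' returned None."
    return False, f"Unknown return value from setup for Plugin '{name}': {result}"
```

Note that, as in the source, a setup() function that returns None is reported as a failure, so plugin authors should return True or a (bool, str) tuple explicitly.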
def get_dependencies(
    self,
    debug: bool = False,
) -> List[str]:
    """
    If the Plugin has specified dependencies in a list called `required`, return the list.

    **NOTE:** Dependencies which start with `'plugin:'` are Meerschaum plugins, not pip packages.
    Meerschaum plugins may also specify connector keys for a repo after `'@'`.

    Parameters
    ----------
    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A list of required packages and plugins (str).

    """
    if '_required' in self.__dict__:
        return self._required

    ### If the plugin has not yet been imported,
    ### infer the dependencies from the source text.
    ### This is not super robust, and it doesn't feel right
    ### having multiple versions of the logic.
    ### This is necessary when determining the activation order
    ### without having to import the module.
    ### For consistency's sake, the module-less method does not cache the requirements.
    if self.__dict__.get('_module', None) is None:
        file_path = self.__file__
        if file_path is None:
            return []
        with open(file_path, 'r', encoding='utf-8') as f:
            text = f.read()

        if 'required' not in text:
            return []

        ### This has some limitations:
        ### It relies on `required` being manually declared.
        ### We lose the ability to dynamically alter the `required` list,
        ### which is why we've kept the module-reliant method below.
        import ast, re
        ### NOTE: This technically would break
        ### if `required` was the very first line of the file.
        req_start_match = re.search(r'\nrequired(:\s*)?.*=', text)
        if not req_start_match:
            return []
        req_start = req_start_match.start()
        equals_sign = req_start + text[req_start:].find('=')

        ### Dependencies may have brackets within the strings, so push back the index.
        first_opening_brace = equals_sign + 1 + text[equals_sign:].find('[')
        if first_opening_brace == -1:
            return []

        next_closing_brace = equals_sign + 1 + text[equals_sign:].find(']')
        if next_closing_brace == -1:
            return []

        start_ix = first_opening_brace + 1
        end_ix = next_closing_brace

        num_braces = 0
        while True:
            if '[' not in text[start_ix:end_ix]:
                break
            num_braces += 1
            start_ix = end_ix
            end_ix += text[end_ix + 1:].find(']') + 1

        req_end = end_ix + 1
        req_text = (
            text[(first_opening_brace-1):req_end]
            .lstrip()
            .replace('=', '', 1)
            .lstrip()
            .rstrip()
        )
        try:
            required = ast.literal_eval(req_text)
        except Exception as e:
            warn(
                f"Unable to determine requirements for plugin '{self.name}' "
                + "without importing the module.\n"
                + " This may be due to dynamically setting the global `required` list.\n"
                + f" {e}"
            )
            return []
        return required

    import inspect
    self.activate_venv(dependencies=False, debug=debug)
    required = []
    for name, val in inspect.getmembers(self.module):
        if name == 'required':
            required = val
            break
    self._required = required
    self.deactivate_venv(dependencies=False, debug=debug)
    return required
If the Plugin has specified dependencies in a list called required, return the list.
NOTE: Dependencies which start with 'plugin:' are Meerschaum plugins, not pip packages.
Meerschaum plugins may also specify connector keys for a repo after '@'.
Parameters
- debug (bool, default False): Verbosity toggle.
Returns
- A list of required packages and plugins (str).
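The module-less lookup described above reads the `required` list straight from the plugin's source text. The sketch below illustrates the same idea with a different technique: it walks the parsed syntax tree with the standard `ast` module instead of searching for brackets. The helper name `read_required` and the sample plugin source are hypothetical and are not part of Meerschaum.

```python
import ast


def read_required(source: str) -> list:
    """Return the literal `required` list assigned at module level, or []."""
    tree = ast.parse(source)
    for node in tree.body:
        # Handle both `required = [...]` and `required: list[str] = [...]`.
        if isinstance(node, ast.Assign):
            targets, value = node.targets, node.value
        elif isinstance(node, ast.AnnAssign) and node.value is not None:
            targets, value = [node.target], node.value
        else:
            continue

        if any(isinstance(t, ast.Name) and t.id == 'required' for t in targets):
            try:
                # `literal_eval` only accepts literals, so nothing is executed.
                return list(ast.literal_eval(value))
            except (ValueError, TypeError):
                # The list is built dynamically and cannot be read statically.
                return []
    return []


# Hypothetical plugin source used only for illustration.
plugin_source = '''
__version__ = '0.1.0'
required = ['pandas>=2.0', 'plugin:noaa', 'plugin:foo@api:bar']
'''
print(read_required(plugin_source))
```

As with the source-text method above, a `required` list that is assembled at import time cannot be recovered this way, which is why an import-based fallback is still useful.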
def get_required_plugins(self, debug: bool=False) -> List[mrsm.plugins.Plugin]:
    """
    Return a list of required Plugin objects.
    """
    from meerschaum.utils.warnings import warn
    from meerschaum.config import get_config
    from meerschaum._internal.static import STATIC_CONFIG
    from meerschaum.connectors.parse import is_valid_connector_keys
    plugins = []
    _deps = self.get_dependencies(debug=debug)
    sep = STATIC_CONFIG['plugins']['repo_separator']
    plugin_names = [
        _d[len('plugin:'):] for _d in _deps
        if _d.startswith('plugin:') and len(_d) > len('plugin:')
    ]
    default_repo_keys = get_config('meerschaum', 'repository')
    skipped_repo_keys = set()

    for _plugin_name in plugin_names:
        if sep in _plugin_name:
            try:
                _plugin_name, _repo_keys = _plugin_name.split(sep)
            except Exception:
                _repo_keys = default_repo_keys
                warn(
                    f"Invalid repo keys for required plugin '{_plugin_name}'.\n "
                    + f"Will try to use '{_repo_keys}' instead.",
                    stack = False,
                )
        else:
            _repo_keys = default_repo_keys

        if _repo_keys in skipped_repo_keys:
            continue

        if not is_valid_connector_keys(_repo_keys):
            warn(
                f"Invalid connector '{_repo_keys}'.\n"
                f" Skipping required plugins from repository '{_repo_keys}'",
                stack=False,
            )
            continue

        plugins.append(Plugin(_plugin_name, repo=_repo_keys))

    return plugins
Return a list of required Plugin objects.
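To make the dependency syntax concrete, here is a minimal sketch of how entries such as 'plugin:foo@api:bar' can be split into a plugin name and repository keys. The function `parse_plugin_dependencies`, the default keys 'api:example', and the use of `str.partition` are assumptions for illustration; the method above reads the separator and the default repository from configuration and handles malformed entries differently.

```python
PREFIX = 'plugin:'


def parse_plugin_dependencies(deps, default_repo='api:example', sep='@'):
    """Return (plugin_name, repo_keys) pairs for the plugin entries in `deps`."""
    parsed = []
    for dep in deps:
        # Skip pip packages and the degenerate entry 'plugin:' with no name.
        if not dep.startswith(PREFIX) or len(dep) == len(PREFIX):
            continue
        name = dep[len(PREFIX):]
        repo = default_repo
        if sep in name:
            # Everything after the first separator is treated as the repo keys.
            name, _, repo = name.partition(sep)
        parsed.append((name, repo))
    return parsed


deps = ['pandas', 'plugin:noaa', 'plugin:foo@api:bar', 'plugin:']
print(parse_plugin_dependencies(deps))
```

Entries without the 'plugin:' prefix are left for the package installer, which matches the split between required plugins and required packages.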
def get_required_packages(self, debug: bool=False) -> List[str]:
    """
    Return the required package names (excluding plugins).
    """
    _deps = self.get_dependencies(debug=debug)
    return [_d for _d in _deps if not _d.startswith('plugin:')]
Return the required package names (excluding plugins).
def activate_venv(
    self,
    dependencies: bool = True,
    init_if_not_exists: bool = True,
    debug: bool = False,
    **kw
) -> bool:
    """
    Activate the virtual environments for the plugin and its dependencies.

    Parameters
    ----------
    dependencies: bool, default True
        If `True`, activate the virtual environments for required plugins.

    Returns
    -------
    A bool indicating success.
    """
    import meerschaum.config.paths as paths
    from meerschaum.utils.venv import venv_target_path
    from meerschaum.utils.packages import activate_venv
    from meerschaum.utils.misc import make_symlink, is_symlink

    if dependencies:
        for plugin in self.get_required_plugins(debug=debug):
            plugin.activate_venv(debug=debug, init_if_not_exists=init_if_not_exists, **kw)

    vtp = venv_target_path(self.name, debug=debug, allow_nonexistent=True)
    venv_meerschaum_path = vtp / 'meerschaum'

    try:
        success, msg = True, "Success"
        if is_symlink(venv_meerschaum_path):
            if pathlib.Path(os.path.realpath(venv_meerschaum_path)) != paths.PACKAGE_ROOT_PATH:
                venv_meerschaum_path.unlink()
                success, msg = make_symlink(venv_meerschaum_path, paths.PACKAGE_ROOT_PATH)
    except Exception as e:
        success, msg = False, str(e)
    if not success:
        warn(
            f"Unable to create symlink {venv_meerschaum_path} to {paths.PACKAGE_ROOT_PATH}:\n"
            f"{msg}"
        )

    return activate_venv(self.name, init_if_not_exists=init_if_not_exists, debug=debug, **kw)
Activate the virtual environments for the plugin and its dependencies.
Parameters
- dependencies (bool, default True): If True, activate the virtual environments for required plugins.
Returns
- A bool indicating success.
def deactivate_venv(self, dependencies: bool=True, debug: bool = False, **kw) -> bool:
    """
    Deactivate the virtual environments for the plugin and its dependencies.

    Parameters
    ----------
    dependencies: bool, default True
        If `True`, deactivate the virtual environments for required plugins.

    Returns
    -------
    A bool indicating success.
    """
    from meerschaum.utils.packages import deactivate_venv
    success = deactivate_venv(self.name, debug=debug, **kw)
    if dependencies:
        for plugin in self.get_required_plugins(debug=debug):
            plugin.deactivate_venv(debug=debug, **kw)
    return success
Deactivate the virtual environments for the plugin and its dependencies.
Parameters
- dependencies (bool, default True): If True, deactivate the virtual environments for required plugins.
Returns
- A bool indicating success.
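Activation and deactivation walk the required plugins in opposite orders: required plugins are activated before the plugin itself, while the plugin is deactivated before its required plugins. The sketch below records that ordering with a stand-in class. `SketchPlugin` and its `log` argument are hypothetical; no real virtual environments are touched.

```python
class SketchPlugin:
    """A stand-in for a plugin that only records activation order."""

    def __init__(self, name, required=()):
        self.name = name
        self.required = list(required)

    def activate_venv(self, log, dependencies=True):
        # Required plugins are activated first, then this plugin.
        if dependencies:
            for plugin in self.required:
                plugin.activate_venv(log)
        log.append(('activate', self.name))
        return True

    def deactivate_venv(self, log, dependencies=True):
        # This plugin is deactivated first, then its required plugins.
        log.append(('deactivate', self.name))
        if dependencies:
            for plugin in self.required:
                plugin.deactivate_venv(log)
        return True


log = []
base = SketchPlugin('base')
app = SketchPlugin('app', required=[base])
app.activate_venv(log)
app.deactivate_venv(log)
print(log)
```

Passing `dependencies=False` limits either call to the plugin's own environment.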
def install_dependencies(
    self,
    force: bool = False,
    debug: bool = False,
) -> bool:
    """
    If specified, install dependencies.

    **NOTE:** Dependencies that start with `'plugin:'` will be installed as
    Meerschaum plugins from the same repository as this Plugin.
    To install from a different repository, add the repo keys after `'@'`
    (e.g. `'plugin:foo@api:bar'`).

    Parameters
    ----------
    force: bool, default False
        If `True`, continue with the installation, even if some
        required packages fail to install.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A bool indicating success.
    """
    from meerschaum.utils.packages import pip_install, venv_contains_package
    from meerschaum.utils.warnings import warn, info
    _deps = self.get_dependencies(debug=debug)
    if not _deps and self.requirements_file_path is None:
        return True

    plugins = self.get_required_plugins(debug=debug)
    for _plugin in plugins:
        if _plugin.name == self.name:
            warn(f"Plugin '{self.name}' cannot depend on itself! Skipping...", stack=False)
            continue
        _success, _msg = _plugin.repo_connector.install_plugin(
            _plugin.name, debug=debug, force=force
        )
        if not _success:
            warn(
                f"Failed to install required plugin '{_plugin}' from '{_plugin.repo_connector}'"
                + f" for plugin '{self.name}':\n" + _msg,
                stack = False,
            )
            if not force:
                warn(
                    "Try installing with the `--force` flag to continue anyway.",
                    stack = False,
                )
                return False
            info(
                "Continuing with installation despite the failure "
                + "(careful, things might be broken!)...",
                icon = False
            )


    ### First step: parse `requirements.txt` if it exists.
    if self.requirements_file_path is not None:
        if not pip_install(
            requirements_file_path=self.requirements_file_path,
            venv=self.name, debug=debug
        ):
            warn(
                f"Failed to resolve 'requirements.txt' for plugin '{self.name}'.",
                stack = False,
            )
            if not force:
                warn(
                    "Try installing with `--force` to continue anyway.",
                    stack = False,
                )
                return False
            info(
                "Continuing with installation despite the failure "
                + "(careful, things might be broken!)...",
                icon = False
            )


    ### Don't reinstall packages that are already included in required plugins.
    packages = []
    _packages = self.get_required_packages(debug=debug)
    accounted_for_packages = set()
    for package_name in _packages:
        for plugin in plugins:
            if venv_contains_package(package_name, plugin.name):
                accounted_for_packages.add(package_name)
                break
    packages = [pkg for pkg in _packages if pkg not in accounted_for_packages]

    ### Attempt pip packages installation.
    if packages:
        for package in packages:
            if not pip_install(package, venv=self.name, debug=debug):
                warn(
                    f"Failed to install required package '{package}'"
                    + f" for plugin '{self.name}'.",
                    stack = False,
                )
                if not force:
                    warn(
                        "Try installing with `--force` to continue anyway.",
                        stack = False,
                    )
                    return False
                info(
                    "Continuing with installation despite the failure "
                    + "(careful, things might be broken!)...",
                    icon = False
                )
    return True
If specified, install dependencies.
NOTE: Dependencies that start with 'plugin:' will be installed as
Meerschaum plugins from the same repository as this Plugin.
To install from a different repository, add the repo keys after '@'
(e.g. 'plugin:foo@api:bar').
Parameters
- force (bool, default False): If True, continue with the installation, even if some required packages fail to install.
- debug (bool, default False): Verbosity toggle.
Returns
- A bool indicating success.
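One step of the installation skips packages that a required plugin's virtual environment already provides. The sketch below isolates that filtering step. The function `packages_to_install` and the `plugin_venvs` dictionary are hypothetical stand-ins for querying real virtual environments.

```python
def packages_to_install(required_packages, plugin_venvs):
    """
    Return the packages not already provided by a required plugin.

    `plugin_venvs` maps a plugin name to the set of package names found
    in that plugin's virtual environment (a stand-in for a real lookup).
    """
    accounted_for = set()
    for package in required_packages:
        for installed in plugin_venvs.values():
            if package in installed:
                accounted_for.add(package)
                break
    # Preserve the original ordering of the remaining packages.
    return [pkg for pkg in required_packages if pkg not in accounted_for]


required_packages = ['pandas', 'requests', 'pytz']
plugin_venvs = {'noaa': {'requests'}, 'foo': {'pytz', 'numpy'}}
print(packages_to_install(required_packages, plugin_venvs))
```

Only the remaining packages would then be passed to the installer, one at a time, so that a single failure can be reported by name.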
@property
def full_name(self) -> str:
    """
    Include the repo keys with the plugin's name.
    """
    from meerschaum._internal.static import STATIC_CONFIG
    sep = STATIC_CONFIG['plugins']['repo_separator']
    return self.name + sep + str(self.repo_connector)
Include the repo keys with the plugin's name.
class Venv:
    """
    Manage a virtual environment's activation status.

    Examples
    --------
    >>> from meerschaum.plugins import Plugin
    >>> with Venv('mrsm') as venv:
    ...     import pandas
    >>> with Venv(Plugin('noaa')) as venv:
    ...     import requests
    >>> venv = Venv('mrsm')
    >>> venv.activate()
    True
    >>> venv.deactivate()
    True
    >>>
    """

    def __init__(
        self,
        venv: Union[str, 'mrsm.core.Plugin', None] = 'mrsm',
        init_if_not_exists: bool = True,
        debug: bool = False,
    ) -> None:
        from meerschaum.utils.venv import activate_venv, deactivate_venv, active_venvs
        ### For some weird threading issue,
        ### we can't use `isinstance` here.
        if '_Plugin' in str(type(venv)):
            self._venv = venv.name
            self._activate = venv.activate_venv
            self._deactivate = venv.deactivate_venv
            self._kwargs = {}
        else:
            self._venv = venv
            self._activate = activate_venv
            self._deactivate = deactivate_venv
            self._kwargs = {'venv': venv}
        self._debug = debug
        self._init_if_not_exists = init_if_not_exists
        ### In case someone calls `deactivate()` before `activate()`.
        self._kwargs['previously_active_venvs'] = copy.deepcopy(active_venvs)


    def activate(self, debug: bool = False) -> bool:
        """
        Activate this virtual environment.
        If a `meerschaum.plugins.Plugin` was provided, its dependent virtual environments
        will also be activated.
        """
        from meerschaum.utils.venv import active_venvs, init_venv
        self._kwargs['previously_active_venvs'] = copy.deepcopy(active_venvs)
        try:
            return self._activate(
                debug=(debug or self._debug),
                init_if_not_exists=self._init_if_not_exists,
                **self._kwargs
            )
        except OSError as e:
            if self._init_if_not_exists:
                if not init_venv(self._venv, force=True):
                    raise e
        return self._activate(
            debug=(debug or self._debug),
            init_if_not_exists=self._init_if_not_exists,
            **self._kwargs
        )


    def deactivate(self, debug: bool = False) -> bool:
        """
        Deactivate this virtual environment.
        If a `meerschaum.plugins.Plugin` was provided, its dependent virtual environments
        will also be deactivated.
        """
        return self._deactivate(debug=(debug or self._debug), **self._kwargs)


    @property
    def target_path(self) -> pathlib.Path:
        """
        Return the target site-packages path for this virtual environment.
        A `meerschaum.utils.venv.Venv` may have one virtual environment per minor Python version
        (e.g. Python 3.10 and Python 3.7).
        """
        from meerschaum.utils.venv import venv_target_path
        return venv_target_path(venv=self._venv, allow_nonexistent=True, debug=self._debug)


    @property
    def root_path(self) -> pathlib.Path:
        """
        Return the top-level path for this virtual environment.
        """
        import meerschaum.config.paths as paths
        if self._venv is None:
            return self.target_path.parent
        return paths.VIRTENV_RESOURCES_PATH / self._venv


    def __enter__(self) -> None:
        self.activate(debug=self._debug)


    def __exit__(self, exc_type, exc_value, exc_traceback) -> None:
        self.deactivate(debug=self._debug)


    def __str__(self) -> str:
        quote = "'" if self._venv is not None else ""
        return "Venv(" + quote + str(self._venv) + quote + ")"


    def __repr__(self) -> str:
        return self.__str__()
Manage a virtual environment's activation status.
Examples
>>> from meerschaum.plugins import Plugin
>>> with Venv('mrsm') as venv:
... import pandas
>>> with Venv(Plugin('noaa')) as venv:
... import requests
>>> venv = Venv('mrsm')
>>> venv.activate()
True
>>> venv.deactivate()
True
>>>
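The examples above rely on the context-manager protocol: entering the block activates the environment and leaving it deactivates the environment again. The sketch below shows that protocol with a stand-in class that tracks names in a list. `SketchVenv` and the module-level `active` list are hypothetical and do not touch real virtual environments. As in the class source above, `__enter__` returns nothing, so the name bound by `as` is `None` in this sketch.

```python
active = []  # Names of the "environments" that are currently active.


class SketchVenv:
    """A stand-in that mimics activate/deactivate bookkeeping."""

    def __init__(self, name='mrsm'):
        self.name = name

    def activate(self):
        if self.name not in active:
            active.append(self.name)
        return True

    def deactivate(self):
        if self.name in active:
            active.remove(self.name)
        return True

    def __enter__(self):
        # Mirrors the listing above: activate and return nothing.
        self.activate()

    def __exit__(self, exc_type, exc_value, exc_traceback):
        # Runs even if the block raises, so the environment is always released.
        self.deactivate()


with SketchVenv('mrsm') as venv:
    inside = list(active)

print(inside, active, venv)
```

Keeping a reference to the object before the `with` statement (for example `v = SketchVenv('mrsm')` followed by `with v:`) is one way to retain access to it inside the block.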
def __init__(
    self,
    venv: Union[str, 'mrsm.core.Plugin', None] = 'mrsm',
    init_if_not_exists: bool = True,
    debug: bool = False,
) -> None:
    from meerschaum.utils.venv import activate_venv, deactivate_venv, active_venvs
    ### For some weird threading issue,
    ### we can't use `isinstance` here.
    if '_Plugin' in str(type(venv)):
        self._venv = venv.name
        self._activate = venv.activate_venv
        self._deactivate = venv.deactivate_venv
        self._kwargs = {}
    else:
        self._venv = venv
        self._activate = activate_venv
        self._deactivate = deactivate_venv
        self._kwargs = {'venv': venv}
    self._debug = debug
    self._init_if_not_exists = init_if_not_exists
    ### In case someone calls `deactivate()` before `activate()`.
    self._kwargs['previously_active_venvs'] = copy.deepcopy(active_venvs)
def activate(self, debug: bool = False) -> bool:
    """
    Activate this virtual environment.
    If a `meerschaum.plugins.Plugin` was provided, its dependent virtual environments
    will also be activated.
    """
    from meerschaum.utils.venv import active_venvs, init_venv
    self._kwargs['previously_active_venvs'] = copy.deepcopy(active_venvs)
    try:
        return self._activate(
            debug=(debug or self._debug),
            init_if_not_exists=self._init_if_not_exists,
            **self._kwargs
        )
    except OSError as e:
        if self._init_if_not_exists:
            if not init_venv(self._venv, force=True):
                raise e
    return self._activate(
        debug=(debug or self._debug),
        init_if_not_exists=self._init_if_not_exists,
        **self._kwargs
    )
Activate this virtual environment.
If a meerschaum.plugins.Plugin was provided, its dependent virtual environments
will also be activated.
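Activation can fail with an `OSError` when the environment on disk is missing or broken, in which case the environment is re-initialized and activation is attempted once more. The sketch below shows that retry-once pattern in isolation. The names `activate_with_retry`, `flaky_activate`, and `reinit` are hypothetical stand-ins and make no claim about the library's exact control flow.

```python
def activate_with_retry(activate, reinit):
    """Call `activate`; on OSError, call `reinit` and try exactly once more."""
    try:
        return activate()
    except OSError as e:
        # If the environment cannot be rebuilt, surface the original error.
        if not reinit():
            raise e
    return activate()


calls = {'activate': 0, 'reinit': 0}


def flaky_activate():
    """Fail on the first call to simulate a broken environment."""
    calls['activate'] += 1
    if calls['activate'] == 1:
        raise OSError("environment is missing")
    return True


def reinit():
    """Pretend to rebuild the environment successfully."""
    calls['reinit'] += 1
    return True


print(activate_with_retry(flaky_activate, reinit), calls)
```

A second failure is not caught, so a persistently broken environment still raises to the caller.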
def deactivate(self, debug: bool = False) -> bool:
    """
    Deactivate this virtual environment.
    If a `meerschaum.plugins.Plugin` was provided, its dependent virtual environments
    will also be deactivated.
    """
    return self._deactivate(debug=(debug or self._debug), **self._kwargs)
Deactivate this virtual environment.
If a meerschaum.plugins.Plugin was provided, its dependent virtual environments
will also be deactivated.
@property
def target_path(self) -> pathlib.Path:
    """
    Return the target site-packages path for this virtual environment.
    A `meerschaum.utils.venv.Venv` may have one virtual environment per minor Python version
    (e.g. Python 3.10 and Python 3.7).
    """
    from meerschaum.utils.venv import venv_target_path
    return venv_target_path(venv=self._venv, allow_nonexistent=True, debug=self._debug)
Return the target site-packages path for this virtual environment.
A meerschaum.utils.venv.Venv may have one virtual environment per minor Python version
(e.g. Python 3.10 and Python 3.7).
@property
def root_path(self) -> pathlib.Path:
    """
    Return the top-level path for this virtual environment.
    """
    import meerschaum.config.paths as paths
    if self._venv is None:
        return self.target_path.parent
    return paths.VIRTENV_RESOURCES_PATH / self._venv
Return the top-level path for this virtual environment.
70class Job: 71 """ 72 Manage a `meerschaum.utils.daemon.Daemon`, locally or remotely via the API. 73 """ 74 75 def __init__( 76 self, 77 name: str, 78 sysargs: Union[List[str], str, None] = None, 79 env: Optional[Dict[str, str]] = None, 80 executor_keys: Optional[str] = None, 81 delete_after_completion: bool = False, 82 refresh_seconds: Union[int, float, None] = None, 83 _properties: Optional[Dict[str, Any]] = None, 84 _rotating_log=None, 85 _stdin_file=None, 86 _status_hook: Optional[Callable[[], str]] = None, 87 _result_hook: Optional[Callable[[], SuccessTuple]] = None, 88 _externally_managed: bool = False, 89 ): 90 """ 91 Create a new job to manage a `meerschaum.utils.daemon.Daemon`. 92 93 Parameters 94 ---------- 95 name: str 96 The name of the job to be created. 97 This will also be used as the Daemon ID. 98 99 sysargs: Union[List[str], str, None], default None 100 The sysargs of the command to be executed, e.g. 'start api'. 101 102 env: Optional[Dict[str, str]], default None 103 If provided, set these environment variables in the job's process. 104 105 executor_keys: Optional[str], default None 106 If provided, execute the job remotely on an API instance, e.g. 'api:main'. 107 108 delete_after_completion: bool, default False 109 If `True`, delete this job when it has finished executing. 110 111 refresh_seconds: Union[int, float, None], default None 112 The number of seconds to sleep between refreshes. 113 Defaults to the configured value `system.cli.refresh_seconds`. 114 115 _properties: Optional[Dict[str, Any]], default None 116 If provided, use this to patch the daemon's properties. 
117 """ 118 from meerschaum.utils.daemon import Daemon 119 for char in BANNED_CHARS: 120 if char in name: 121 raise ValueError(f"Invalid name: ({char}) is not allowed.") 122 123 if isinstance(sysargs, str): 124 sysargs = shlex.split(sysargs) 125 126 and_key = STATIC_CONFIG['system']['arguments']['and_key'] 127 escaped_and_key = STATIC_CONFIG['system']['arguments']['escaped_and_key'] 128 if sysargs: 129 sysargs = [ 130 (arg if arg != escaped_and_key else and_key) 131 for arg in sysargs 132 ] 133 134 ### NOTE: 'local' and 'systemd' executors are being coalesced. 135 if executor_keys is None: 136 from meerschaum.jobs import get_executor_keys_from_context 137 executor_keys = get_executor_keys_from_context() 138 139 self.executor_keys = executor_keys 140 self.name = name 141 self.refresh_seconds = ( 142 refresh_seconds 143 if refresh_seconds is not None 144 else mrsm.get_config('system', 'cli', 'refresh_seconds') 145 ) 146 try: 147 self._daemon = ( 148 Daemon(daemon_id=name) 149 if executor_keys == 'local' 150 else None 151 ) 152 except Exception: 153 self._daemon = None 154 155 ### Handle any injected dependencies. 
156 if _rotating_log is not None: 157 self._rotating_log = _rotating_log 158 if self._daemon is not None: 159 self._daemon._rotating_log = _rotating_log 160 161 if _stdin_file is not None: 162 self._stdin_file = _stdin_file 163 if self._daemon is not None: 164 self._daemon._stdin_file = _stdin_file 165 self._daemon._blocking_stdin_file_path = _stdin_file.blocking_file_path 166 167 if _status_hook is not None: 168 self._status_hook = _status_hook 169 170 if _result_hook is not None: 171 self._result_hook = _result_hook 172 173 self._externally_managed = _externally_managed 174 self._properties_patch = _properties or {} 175 if _externally_managed: 176 self._properties_patch.update({'externally_managed': _externally_managed}) 177 178 if env: 179 self._properties_patch.update({'env': env}) 180 181 if delete_after_completion: 182 self._properties_patch.update({'delete_after_completion': delete_after_completion}) 183 184 daemon_sysargs = ( 185 self._daemon.properties.get('target', {}).get('args', [None])[0] 186 if self._daemon is not None 187 else None 188 ) 189 190 if daemon_sysargs and sysargs and daemon_sysargs != sysargs: 191 warn("Given sysargs differ from existing sysargs.") 192 193 self._sysargs = [ 194 arg 195 for arg in (daemon_sysargs or sysargs or []) 196 if arg not in ('-d', '--daemon') 197 ] 198 for restart_flag in RESTART_FLAGS: 199 if restart_flag in self._sysargs: 200 self._properties_patch.update({'restart': True}) 201 break 202 203 @staticmethod 204 def from_pid(pid: int, executor_keys: Optional[str] = None) -> Job: 205 """ 206 Build a `Job` from the PID of a running Meerschaum process. 207 208 Parameters 209 ---------- 210 pid: int 211 The PID of the process. 212 213 executor_keys: Optional[str], default None 214 The executor keys to assign to the job. 
215 """ 216 psutil = mrsm.attempt_import('psutil') 217 try: 218 process = psutil.Process(pid) 219 except psutil.NoSuchProcess as e: 220 warn(f"Process with PID {pid} does not exist.", stack=False) 221 raise e 222 223 command_args = process.cmdline() 224 is_daemon = command_args[1] == '-c' 225 226 if is_daemon: 227 daemon_id = command_args[-1].split('daemon_id=')[-1].split(')')[0].replace("'", '') 228 root_dir = process.environ().get(STATIC_CONFIG['environment']['root'], None) 229 if root_dir is None: 230 root_dir = paths.ROOT_DIR_PATH 231 else: 232 root_dir = pathlib.Path(root_dir) 233 jobs_dir = root_dir / paths.DAEMON_RESOURCES_PATH.name 234 daemon_dir = jobs_dir / daemon_id 235 pid_file = daemon_dir / 'process.pid' 236 237 if pid_file.exists(): 238 with open(pid_file, 'r', encoding='utf-8') as f: 239 daemon_pid = int(f.read()) 240 241 if pid != daemon_pid: 242 raise EnvironmentError(f"Differing PIDs: {pid=}, {daemon_pid=}") 243 else: 244 raise EnvironmentError(f"Is job '{daemon_id}' running?") 245 246 return Job(daemon_id, executor_keys=executor_keys) 247 248 from meerschaum._internal.arguments._parse_arguments import parse_arguments 249 from meerschaum.utils.daemon import get_new_daemon_name 250 251 mrsm_ix = 0 252 for i, arg in enumerate(command_args): 253 if 'mrsm' in arg or 'meerschaum' in arg.lower(): 254 mrsm_ix = i 255 break 256 257 sysargs = command_args[mrsm_ix+1:] 258 kwargs = parse_arguments(sysargs) 259 name = kwargs.get('name', get_new_daemon_name()) 260 return Job(name, sysargs, executor_keys=executor_keys) 261 262 def start(self, debug: bool = False) -> SuccessTuple: 263 """ 264 Start the job's daemon. 265 """ 266 if self.executor is not None: 267 if not self.exists(debug=debug): 268 return self.executor.create_job( 269 self.name, 270 self.sysargs, 271 properties=self.daemon.properties, 272 debug=debug, 273 ) 274 return self.executor.start_job(self.name, debug=debug) 275 276 if self.is_running(): 277 return True, f"{self} is already running." 
278 279 success, msg = self.daemon.run( 280 keep_daemon_output=(not self.delete_after_completion), 281 allow_dirty_run=True, 282 ) 283 if not success: 284 return success, msg 285 286 return success, f"Started {self}." 287 288 def stop( 289 self, 290 timeout_seconds: Union[int, float, None] = None, 291 debug: bool = False, 292 ) -> SuccessTuple: 293 """ 294 Stop the job's daemon. 295 """ 296 if self.executor is not None: 297 return self.executor.stop_job(self.name, debug=debug) 298 299 if self.daemon.status == 'stopped': 300 if not self.restart: 301 return True, f"{self} is not running." 302 elif self.stop_time is not None: 303 return True, f"{self} will not restart until manually started." 304 305 quit_success, quit_msg = self.daemon.quit(timeout=timeout_seconds) 306 if quit_success: 307 return quit_success, f"Stopped {self}." 308 309 warn( 310 f"Failed to gracefully quit {self}.", 311 stack=False, 312 ) 313 kill_success, kill_msg = self.daemon.kill(timeout=timeout_seconds) 314 if not kill_success: 315 return kill_success, kill_msg 316 317 return kill_success, f"Killed {self}." 318 319 def pause( 320 self, 321 timeout_seconds: Union[int, float, None] = None, 322 debug: bool = False, 323 ) -> SuccessTuple: 324 """ 325 Pause the job's daemon. 326 """ 327 if self.executor is not None: 328 return self.executor.pause_job(self.name, debug=debug) 329 330 pause_success, pause_msg = self.daemon.pause(timeout=timeout_seconds) 331 if not pause_success: 332 return pause_success, pause_msg 333 334 return pause_success, f"Paused {self}." 335 336 def delete(self, debug: bool = False) -> SuccessTuple: 337 """ 338 Delete the job and its daemon. 
339 """ 340 if self.executor is not None: 341 return self.executor.delete_job(self.name, debug=debug) 342 343 if self.is_running(): 344 stop_success, stop_msg = self.stop() 345 if not stop_success: 346 return stop_success, stop_msg 347 348 cleanup_success, cleanup_msg = self.daemon.cleanup() 349 if not cleanup_success: 350 return cleanup_success, cleanup_msg 351 352 _ = self.daemon._properties.pop('result', None) 353 return cleanup_success, f"Deleted {self}." 354 355 def is_running(self) -> bool: 356 """ 357 Determine whether the job's daemon is running. 358 """ 359 return self.status == 'running' 360 361 def exists(self, debug: bool = False) -> bool: 362 """ 363 Determine whether the job exists. 364 """ 365 if self.executor is not None: 366 return self.executor.get_job_exists(self.name, debug=debug) 367 368 return self.daemon.path.exists() 369 370 def get_logs(self) -> Union[str, None]: 371 """ 372 Return the output text of the job's daemon. 373 """ 374 if self.executor is not None: 375 return self.executor.get_logs(self.name) 376 377 return self.daemon.log_text 378 379 def monitor_logs( 380 self, 381 callback_function: Callable[[str], None] = _default_stdout_callback, 382 input_callback_function: Optional[Callable[[], str]] = None, 383 stop_callback_function: Optional[Callable[[SuccessTuple], None]] = None, 384 stop_event: Optional[asyncio.Event] = None, 385 stop_on_exit: bool = False, 386 strip_timestamps: bool = False, 387 accept_input: bool = True, 388 debug: bool = False, 389 _logs_path: Optional[pathlib.Path] = None, 390 _log=None, 391 _stdin_file=None, 392 _wait_if_stopped: bool = True, 393 ): 394 """ 395 Monitor the job's log files and execute a callback on new lines. 396 397 Parameters 398 ---------- 399 callback_function: Callable[[str], None], default partial(print, end='') 400 The callback to execute as new data comes in. 401 Defaults to printing the output directly to `stdout`. 
402 403 input_callback_function: Optional[Callable[[], str]], default None 404 If provided, execute this callback when the daemon is blocking on stdin. 405 Defaults to `sys.stdin.readline()`. 406 407 stop_callback_function: Optional[Callable[[SuccessTuple]], str], default None 408 If provided, execute this callback when the daemon stops. 409 The job's SuccessTuple will be passed to the callback. 410 411 stop_event: Optional[asyncio.Event], default None 412 If provided, stop monitoring when this event is set. 413 You may instead raise `meerschaum.jobs.StopMonitoringLogs` 414 from within `callback_function` to stop monitoring. 415 416 stop_on_exit: bool, default False 417 If `True`, stop monitoring when the job stops. 418 419 strip_timestamps: bool, default False 420 If `True`, remove leading timestamps from lines. 421 422 accept_input: bool, default True 423 If `True`, accept input when the daemon blocks on stdin. 424 """ 425 if self.executor is not None: 426 self.executor.monitor_logs( 427 self.name, 428 callback_function, 429 input_callback_function=input_callback_function, 430 stop_callback_function=stop_callback_function, 431 stop_on_exit=stop_on_exit, 432 accept_input=accept_input, 433 strip_timestamps=strip_timestamps, 434 debug=debug, 435 ) 436 return 437 438 monitor_logs_coroutine = self.monitor_logs_async( 439 callback_function=callback_function, 440 input_callback_function=input_callback_function, 441 stop_callback_function=stop_callback_function, 442 stop_event=stop_event, 443 stop_on_exit=stop_on_exit, 444 strip_timestamps=strip_timestamps, 445 accept_input=accept_input, 446 debug=debug, 447 _logs_path=_logs_path, 448 _log=_log, 449 _stdin_file=_stdin_file, 450 _wait_if_stopped=_wait_if_stopped, 451 ) 452 return asyncio.run(monitor_logs_coroutine) 453 454 async def monitor_logs_async( 455 self, 456 callback_function: Callable[[str], None] = _default_stdout_callback, 457 input_callback_function: Optional[Callable[[], str]] = None, 458 
stop_callback_function: Optional[Callable[[SuccessTuple], None]] = None, 459 stop_event: Optional[asyncio.Event] = None, 460 stop_on_exit: bool = False, 461 strip_timestamps: bool = False, 462 accept_input: bool = True, 463 debug: bool = False, 464 _logs_path: Optional[pathlib.Path] = None, 465 _log=None, 466 _stdin_file=None, 467 _wait_if_stopped: bool = True, 468 ): 469 """ 470 Monitor the job's log files and await a callback on new lines. 471 472 Parameters 473 ---------- 474 callback_function: Callable[[str], None], default _default_stdout_callback 475 The callback to execute as new data comes in. 476 Defaults to printing the output directly to `stdout`. 477 478 input_callback_function: Optional[Callable[[], str]], default None 479 If provided, execute this callback when the daemon is blocking on stdin. 480 Defaults to `sys.stdin.readline()`. 481 482 stop_callback_function: Optional[Callable[[SuccessTuple]], str], default None 483 If provided, execute this callback when the daemon stops. 484 The job's SuccessTuple will be passed to the callback. 485 486 stop_event: Optional[asyncio.Event], default None 487 If provided, stop monitoring when this event is set. 488 You may instead raise `meerschaum.jobs.StopMonitoringLogs` 489 from within `callback_function` to stop monitoring. 490 491 stop_on_exit: bool, default False 492 If `True`, stop monitoring when the job stops. 493 494 strip_timestamps: bool, default False 495 If `True`, remove leading timestamps from lines. 496 497 accept_input: bool, default True 498 If `True`, accept input when the daemon blocks on stdin. 
499 """ 500 from meerschaum.utils.prompt import prompt 501 502 def default_input_callback_function(): 503 prompt_kwargs = self.get_prompt_kwargs(debug=debug) 504 if prompt_kwargs: 505 answer = prompt(**prompt_kwargs) 506 return answer + '\n' 507 return sys.stdin.readline() 508 509 if input_callback_function is None: 510 input_callback_function = default_input_callback_function 511 512 if self.executor is not None: 513 await self.executor.monitor_logs_async( 514 self.name, 515 callback_function, 516 input_callback_function=input_callback_function, 517 stop_callback_function=stop_callback_function, 518 stop_on_exit=stop_on_exit, 519 strip_timestamps=strip_timestamps, 520 accept_input=accept_input, 521 debug=debug, 522 ) 523 return 524 525 from meerschaum.utils.formatting._jobs import strip_timestamp_from_line 526 527 events = { 528 'user': stop_event, 529 'stopped': asyncio.Event(), 530 'stop_token': asyncio.Event(), 531 'stop_exception': asyncio.Event(), 532 'stopped_timeout': asyncio.Event(), 533 } 534 combined_event = asyncio.Event() 535 emitted_text = False 536 stdin_file = _stdin_file if _stdin_file is not None else self.daemon.stdin_file 537 538 async def check_job_status(): 539 if not stop_on_exit: 540 return 541 542 nonlocal emitted_text 543 544 sleep_time = 0.1 545 while sleep_time < 0.2: 546 if self.status == 'stopped': 547 if not emitted_text and _wait_if_stopped: 548 await asyncio.sleep(sleep_time) 549 sleep_time = round(sleep_time * 1.1, 3) 550 continue 551 552 if stop_callback_function is not None: 553 try: 554 if asyncio.iscoroutinefunction(stop_callback_function): 555 await stop_callback_function(self.result) 556 else: 557 stop_callback_function(self.result) 558 except asyncio.exceptions.CancelledError: 559 break 560 except Exception: 561 warn(traceback.format_exc()) 562 563 if stop_on_exit: 564 events['stopped'].set() 565 566 break 567 await asyncio.sleep(0.1) 568 569 events['stopped_timeout'].set() 570 571 async def check_blocking_on_input(): 572 
while True: 573 if not emitted_text or not self.is_blocking_on_stdin(): 574 try: 575 await asyncio.sleep(self.refresh_seconds) 576 except asyncio.exceptions.CancelledError: 577 break 578 continue 579 580 if not self.is_running(): 581 break 582 583 await emit_latest_lines() 584 585 try: 586 print('', end='', flush=True) 587 if asyncio.iscoroutinefunction(input_callback_function): 588 data = await input_callback_function() 589 else: 590 loop = asyncio.get_running_loop() 591 data = await loop.run_in_executor(None, input_callback_function) 592 except KeyboardInterrupt: 593 break 594 # if not data.endswith('\n'): 595 # data += '\n' 596 597 stdin_file.write(data) 598 await asyncio.sleep(self.refresh_seconds) 599 600 async def combine_events(): 601 event_tasks = [ 602 asyncio.create_task(event.wait()) 603 for event in events.values() 604 if event is not None 605 ] 606 if not event_tasks: 607 return 608 609 try: 610 done, pending = await asyncio.wait( 611 event_tasks, 612 return_when=asyncio.FIRST_COMPLETED, 613 ) 614 for task in pending: 615 task.cancel() 616 except asyncio.exceptions.CancelledError: 617 pass 618 finally: 619 combined_event.set() 620 621 check_job_status_task = asyncio.create_task(check_job_status()) 622 check_blocking_on_input_task = asyncio.create_task(check_blocking_on_input()) 623 combine_events_task = asyncio.create_task(combine_events()) 624 625 log = _log if _log is not None else self.daemon.rotating_log 626 lines_to_show = ( 627 self.daemon.properties.get( 628 'logs', {} 629 ).get( 630 'lines_to_show', get_config('jobs', 'logs', 'lines_to_show') 631 ) 632 ) 633 634 async def emit_latest_lines(): 635 nonlocal emitted_text 636 nonlocal stop_event 637 lines = log.readlines() 638 for line in lines[(-1 * lines_to_show):]: 639 if stop_event is not None and stop_event.is_set(): 640 return 641 642 line_stripped_extra = strip_timestamp_from_line(line.strip()) 643 line_stripped = strip_timestamp_from_line(line) 644 645 if line_stripped_extra == STOP_TOKEN: 
646 events['stop_token'].set() 647 return 648 649 if line_stripped_extra == CLEAR_TOKEN: 650 clear_screen(debug=debug) 651 continue 652 653 if line_stripped_extra == FLUSH_TOKEN.strip(): 654 line_stripped = '' 655 line = '' 656 657 if strip_timestamps: 658 line = line_stripped 659 660 try: 661 if asyncio.iscoroutinefunction(callback_function): 662 await callback_function(line) 663 else: 664 callback_function(line) 665 emitted_text = True 666 except StopMonitoringLogs: 667 events['stop_exception'].set() 668 return 669 except Exception: 670 warn(f"Error in logs callback:\n{traceback.format_exc()}") 671 672 await emit_latest_lines() 673 674 tasks = ( 675 [check_job_status_task] 676 + ([check_blocking_on_input_task] if accept_input else []) 677 + [combine_events_task] 678 ) 679 try: 680 _ = asyncio.gather(*tasks, return_exceptions=True) 681 except asyncio.exceptions.CancelledError: 682 raise 683 except Exception: 684 warn(f"Failed to run async checks:\n{traceback.format_exc()}") 685 686 watchfiles = mrsm.attempt_import('watchfiles', lazy=False) 687 dir_path_to_monitor = ( 688 _logs_path 689 or (log.file_path.parent if log else None) 690 or paths.LOGS_RESOURCES_PATH 691 ) 692 async for changes in watchfiles.awatch( 693 dir_path_to_monitor, 694 stop_event=combined_event, 695 ): 696 for change in changes: 697 file_path_str = change[1] 698 file_path = pathlib.Path(file_path_str) 699 latest_subfile_path = log.get_latest_subfile_path() 700 if latest_subfile_path != file_path: 701 continue 702 703 await emit_latest_lines() 704 705 await emit_latest_lines() 706 707 def is_blocking_on_stdin(self, debug: bool = False) -> bool: 708 """ 709 Return whether a job's daemon is blocking on stdin. 
        """
        if self.executor is not None:
            return self.executor.get_job_is_blocking_on_stdin(self.name, debug=debug)

        return self.is_running() and self.daemon.blocking_stdin_file_path.exists()

    def get_prompt_kwargs(self, debug: bool = False) -> Dict[str, Any]:
        """
        Return the kwargs to the blocking `prompt()`, if available.
        """
        if self.executor is not None:
            return self.executor.get_job_prompt_kwargs(self.name, debug=debug)

        if not self.daemon.prompt_kwargs_file_path.exists():
            return {}

        try:
            with open(self.daemon.prompt_kwargs_file_path, 'r', encoding='utf-8') as f:
                prompt_kwargs = json.load(f)

            return prompt_kwargs

        except Exception:
            import traceback
            traceback.print_exc()
            return {}

    def write_stdin(self, data):
        """
        Write to a job's daemon's `stdin`.
        """
        self.daemon.stdin_file.write(data)

    @property
    def executor(self) -> Union[Executor, None]:
        """
        If the job is remote, return the connector to the remote API instance.
        """
        return (
            mrsm.get_connector(self.executor_keys)
            if self.executor_keys != 'local'
            else None
        )

    @property
    def status(self) -> str:
        """
        Return the running status of the job's daemon.
        """
        if '_status_hook' in self.__dict__:
            return self._status_hook()

        if self.executor is not None:
            return self.executor.get_job_status(self.name)

        return self.daemon.status

    @property
    def pid(self) -> Union[int, None]:
        """
        Return the PID of the job's daemon.
        """
        if self.executor is not None:
            return self.executor.get_job_metadata(self.name).get('daemon', {}).get('pid', None)

        return self.daemon.pid

    @property
    def restart(self) -> bool:
        """
        Return whether to restart a stopped job.
781 """ 782 if self.executor is not None: 783 return self.executor.get_job_metadata(self.name).get('restart', False) 784 785 return self.daemon.properties.get('restart', False) 786 787 @property 788 def result(self) -> SuccessTuple: 789 """ 790 Return the `SuccessTuple` when the job has terminated. 791 """ 792 if self.is_running(): 793 return True, f"{self} is running." 794 795 if '_result_hook' in self.__dict__: 796 return self._result_hook() 797 798 if self.executor is not None: 799 return ( 800 self.executor.get_job_metadata(self.name) 801 .get('result', (False, "No result available.")) 802 ) 803 804 _result = self.daemon.properties.get('result', None) 805 if _result is None: 806 from meerschaum.utils.daemon.Daemon import _results 807 return _results.get(self.daemon.daemon_id, (False, "No result available.")) 808 809 return tuple(_result) 810 811 @property 812 def sysargs(self) -> List[str]: 813 """ 814 Return the sysargs to use for the Daemon. 815 """ 816 if self._sysargs: 817 return self._sysargs 818 819 if self.executor is not None: 820 return self.executor.get_job_metadata(self.name).get('sysargs', []) 821 822 target_args = self.daemon.target_args 823 if target_args is None: 824 return [] 825 self._sysargs = target_args[0] if len(target_args) > 0 else [] 826 return self._sysargs 827 828 def get_daemon_properties(self) -> Dict[str, Any]: 829 """ 830 Return the `properties` dictionary for the job's daemon. 831 """ 832 remote_properties = ( 833 {} 834 if self.executor is None 835 else self.executor.get_job_properties(self.name) 836 ) 837 return { 838 **remote_properties, 839 **self._properties_patch 840 } 841 842 @property 843 def daemon(self) -> 'Daemon': 844 """ 845 Return the daemon which this job manages. 
846 """ 847 from meerschaum.utils.daemon import Daemon 848 if self._daemon is not None and self.executor is None and self._sysargs: 849 return self._daemon 850 851 self._daemon = Daemon( 852 target=entry, 853 target_args=[self._sysargs], 854 target_kw={}, 855 daemon_id=self.name, 856 label=shlex.join(self._sysargs), 857 properties=self.get_daemon_properties(), 858 ) 859 if '_rotating_log' in self.__dict__: 860 self._daemon._rotating_log = self._rotating_log 861 862 if '_stdin_file' in self.__dict__: 863 self._daemon._stdin_file = self._stdin_file 864 self._daemon._blocking_stdin_file_path = self._stdin_file.blocking_file_path 865 866 return self._daemon 867 868 @property 869 def began(self) -> Union[datetime, None]: 870 """ 871 The datetime when the job began running. 872 """ 873 if self.executor is not None: 874 began_str = self.executor.get_job_began(self.name) 875 if began_str is None: 876 return None 877 return ( 878 datetime.fromisoformat(began_str) 879 .astimezone(timezone.utc) 880 .replace(tzinfo=None) 881 ) 882 883 began_str = self.daemon.properties.get('process', {}).get('began', None) 884 if began_str is None: 885 return None 886 887 return datetime.fromisoformat(began_str) 888 889 @property 890 def ended(self) -> Union[datetime, None]: 891 """ 892 The datetime when the job stopped running. 893 """ 894 if self.executor is not None: 895 ended_str = self.executor.get_job_ended(self.name) 896 if ended_str is None: 897 return None 898 return ( 899 datetime.fromisoformat(ended_str) 900 .astimezone(timezone.utc) 901 .replace(tzinfo=None) 902 ) 903 904 ended_str = self.daemon.properties.get('process', {}).get('ended', None) 905 if ended_str is None: 906 return None 907 908 return datetime.fromisoformat(ended_str) 909 910 @property 911 def paused(self) -> Union[datetime, None]: 912 """ 913 The datetime when the job was suspended while running. 
914 """ 915 if self.executor is not None: 916 paused_str = self.executor.get_job_paused(self.name) 917 if paused_str is None: 918 return None 919 return ( 920 datetime.fromisoformat(paused_str) 921 .astimezone(timezone.utc) 922 .replace(tzinfo=None) 923 ) 924 925 paused_str = self.daemon.properties.get('process', {}).get('paused', None) 926 if paused_str is None: 927 return None 928 929 return datetime.fromisoformat(paused_str) 930 931 @property 932 def stop_time(self) -> Union[datetime, None]: 933 """ 934 Return the timestamp when the job was manually stopped. 935 """ 936 if self.executor is not None: 937 return self.executor.get_job_stop_time(self.name) 938 939 if not self.daemon.stop_path.exists(): 940 return None 941 942 stop_data = self.daemon._read_stop_file() 943 if not stop_data: 944 return None 945 946 stop_time_str = stop_data.get('stop_time', None) 947 if not stop_time_str: 948 warn(f"Could not read stop time for {self}.") 949 return None 950 951 return datetime.fromisoformat(stop_time_str) 952 953 @property 954 def hidden(self) -> bool: 955 """ 956 Return a bool indicating whether this job should be displayed. 957 """ 958 return ( 959 self.name.startswith('_') 960 or self.name.startswith('.') 961 or self._is_externally_managed 962 ) 963 964 def check_restart(self) -> SuccessTuple: 965 """ 966 If `restart` is `True` and the daemon is not running, 967 restart the job. 968 Do not restart if the job was manually stopped. 969 """ 970 if self.is_running(): 971 return True, f"{self} is running." 972 973 if not self.restart: 974 return True, f"{self} does not need to be restarted." 975 976 if self.stop_time is not None: 977 return True, f"{self} was manually stopped." 978 979 return self.start() 980 981 @property 982 def label(self) -> str: 983 """ 984 Return the job's Daemon label (joined sysargs). 
985 """ 986 from meerschaum._internal.arguments import compress_pipeline_sysargs 987 sysargs = compress_pipeline_sysargs(self.sysargs) 988 return shlex.join(sysargs).replace(' + ', '\n+ ').replace(' : ', '\n: ').lstrip().rstrip() 989 990 @property 991 def _externally_managed_file(self) -> pathlib.Path: 992 """ 993 Return the path to the externally managed file. 994 """ 995 return self.daemon.path / '.externally-managed' 996 997 def _set_externally_managed(self): 998 """ 999 Set this job as externally managed. 1000 """ 1001 self._externally_managed = True 1002 try: 1003 self._externally_managed_file.parent.mkdir(exist_ok=True, parents=True) 1004 self._externally_managed_file.touch() 1005 except Exception as e: 1006 warn(e) 1007 1008 @property 1009 def _is_externally_managed(self) -> bool: 1010 """ 1011 Return whether this job is externally managed. 1012 """ 1013 return self.executor_keys in (None, 'local') and ( 1014 self._externally_managed or self._externally_managed_file.exists() 1015 ) 1016 1017 @property 1018 def env(self) -> Dict[str, str]: 1019 """ 1020 Return the environment variables to set for the job's process. 1021 """ 1022 if '_env' in self.__dict__: 1023 return self.__dict__['_env'] 1024 1025 _env = self.daemon.properties.get('env', {}) 1026 default_env = { 1027 'PYTHONUNBUFFERED': '1', 1028 'LINES': str(get_config('jobs', 'terminal', 'lines')), 1029 'COLUMNS': str(get_config('jobs', 'terminal', 'columns')), 1030 STATIC_CONFIG['environment']['noninteractive']: 'true', 1031 } 1032 self._env = {**default_env, **_env} 1033 return self._env 1034 1035 @property 1036 def delete_after_completion(self) -> bool: 1037 """ 1038 Return whether this job is configured to delete itself after completion. 
1039 """ 1040 if '_delete_after_completion' in self.__dict__: 1041 return self.__dict__.get('_delete_after_completion', False) 1042 1043 self._delete_after_completion = self.daemon.properties.get('delete_after_completion', False) 1044 return self._delete_after_completion 1045 1046 def __str__(self) -> str: 1047 sysargs = self.sysargs 1048 sysargs_str = shlex.join(sysargs) if sysargs else '' 1049 job_str = f'Job("{self.name}"' 1050 if sysargs_str: 1051 job_str += f', "{sysargs_str}"' 1052 1053 job_str += ')' 1054 return job_str 1055 1056 def __repr__(self) -> str: 1057 return str(self) 1058 1059 def __hash__(self) -> int: 1060 return hash(self.name)
Manage a meerschaum.utils.daemon.Daemon, locally or remotely via the API.
def __init__(
    self,
    name: str,
    sysargs: Union[List[str], str, None] = None,
    env: Optional[Dict[str, str]] = None,
    executor_keys: Optional[str] = None,
    delete_after_completion: bool = False,
    refresh_seconds: Union[int, float, None] = None,
    _properties: Optional[Dict[str, Any]] = None,
    _rotating_log=None,
    _stdin_file=None,
    _status_hook: Optional[Callable[[], str]] = None,
    _result_hook: Optional[Callable[[], SuccessTuple]] = None,
    _externally_managed: bool = False,
):
    """
    Create a new job to manage a `meerschaum.utils.daemon.Daemon`.

    Parameters
    ----------
    name: str
        The name of the job to be created.
        This will also be used as the Daemon ID.

    sysargs: Union[List[str], str, None], default None
        The sysargs of the command to be executed, e.g. 'start api'.

    env: Optional[Dict[str, str]], default None
        If provided, set these environment variables in the job's process.

    executor_keys: Optional[str], default None
        If provided, execute the job remotely on an API instance, e.g. 'api:main'.

    delete_after_completion: bool, default False
        If `True`, delete this job when it has finished executing.

    refresh_seconds: Union[int, float, None], default None
        The number of seconds to sleep between refreshes.
        Defaults to the configured value `system.cli.refresh_seconds`.

    _properties: Optional[Dict[str, Any]], default None
        If provided, use this to patch the daemon's properties.
    """
    from meerschaum.utils.daemon import Daemon
    for char in BANNED_CHARS:
        if char in name:
            raise ValueError(f"Invalid name: ({char}) is not allowed.")

    if isinstance(sysargs, str):
        sysargs = shlex.split(sysargs)

    and_key = STATIC_CONFIG['system']['arguments']['and_key']
    escaped_and_key = STATIC_CONFIG['system']['arguments']['escaped_and_key']
    if sysargs:
        sysargs = [
            (arg if arg != escaped_and_key else and_key)
            for arg in sysargs
        ]

    ### NOTE: 'local' and 'systemd' executors are being coalesced.
    if executor_keys is None:
        from meerschaum.jobs import get_executor_keys_from_context
        executor_keys = get_executor_keys_from_context()

    self.executor_keys = executor_keys
    self.name = name
    self.refresh_seconds = (
        refresh_seconds
        if refresh_seconds is not None
        else mrsm.get_config('system', 'cli', 'refresh_seconds')
    )
    try:
        self._daemon = (
            Daemon(daemon_id=name)
            if executor_keys == 'local'
            else None
        )
    except Exception:
        self._daemon = None

    ### Handle any injected dependencies.
    if _rotating_log is not None:
        self._rotating_log = _rotating_log
        if self._daemon is not None:
            self._daemon._rotating_log = _rotating_log

    if _stdin_file is not None:
        self._stdin_file = _stdin_file
        if self._daemon is not None:
            self._daemon._stdin_file = _stdin_file
            self._daemon._blocking_stdin_file_path = _stdin_file.blocking_file_path

    if _status_hook is not None:
        self._status_hook = _status_hook

    if _result_hook is not None:
        self._result_hook = _result_hook

    self._externally_managed = _externally_managed
    self._properties_patch = _properties or {}
    if _externally_managed:
        self._properties_patch.update({'externally_managed': _externally_managed})

    if env:
        self._properties_patch.update({'env': env})

    if delete_after_completion:
        self._properties_patch.update({'delete_after_completion': delete_after_completion})

    daemon_sysargs = (
        self._daemon.properties.get('target', {}).get('args', [None])[0]
        if self._daemon is not None
        else None
    )

    if daemon_sysargs and sysargs and daemon_sysargs != sysargs:
        warn("Given sysargs differ from existing sysargs.")

    self._sysargs = [
        arg
        for arg in (daemon_sysargs or sysargs or [])
        if arg not in ('-d', '--daemon')
    ]
    for restart_flag in RESTART_FLAGS:
        if restart_flag in self._sysargs:
            self._properties_patch.update({'restart': True})
            break
Create a new job to manage a meerschaum.utils.daemon.Daemon.
Parameters
- name (str): The name of the job to be created. This will also be used as the Daemon ID.
- sysargs (Union[List[str], str, None], default None): The sysargs of the command to be executed, e.g. 'start api'.
- env (Optional[Dict[str, str]], default None): If provided, set these environment variables in the job's process.
- executor_keys (Optional[str], default None): If provided, execute the job remotely on an API instance, e.g. 'api:main'.
- delete_after_completion (bool, default False): If True, delete this job when it has finished executing.
- refresh_seconds (Union[int, float, None], default None): The number of seconds to sleep between refreshes. Defaults to the configured value system.cli.refresh_seconds.
- _properties (Optional[Dict[str, Any]], default None): If provided, use this to patch the daemon's properties.
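The constructor accepts sysargs either as a list or as a single command string. The following standalone sketch mirrors that normalization as it appears in the constructor source; it is not the library's code. The helper normalize_sysargs is hypothetical, and the contents of RESTART_FLAGS are assumed here for illustration only (the real values are defined inside the meerschaum package).

```python
import shlex
from typing import List, Tuple, Union

# Assumed values for illustration only; meerschaum defines the real RESTART_FLAGS.
RESTART_FLAGS = ('-s', '--restart', '--loop', '--schedule', '--cron')
DAEMON_FLAGS = ('-d', '--daemon')


def normalize_sysargs(sysargs: Union[List[str], str, None]) -> Tuple[List[str], bool]:
    """Return (cleaned sysargs, whether a restart flag is present)."""
    if isinstance(sysargs, str):
        # A command string such as 'start api' is split the way a shell would split it.
        sysargs = shlex.split(sysargs)
    # Drop the daemon flags, as the constructor does.
    cleaned = [arg for arg in (sysargs or []) if arg not in DAEMON_FLAGS]
    # The constructor patches `restart: True` into the properties when a restart flag is present.
    restart = any(flag in cleaned for flag in RESTART_FLAGS)
    return cleaned, restart


print(normalize_sysargs('sync pipes --loop -d'))
# (['sync', 'pipes', '--loop'], True)
```

Passing a list skips the shlex step, so arguments containing spaces are safest supplied as a list.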
@staticmethod
def from_pid(pid: int, executor_keys: Optional[str] = None) -> Job:
    """
    Build a `Job` from the PID of a running Meerschaum process.

    Parameters
    ----------
    pid: int
        The PID of the process.

    executor_keys: Optional[str], default None
        The executor keys to assign to the job.
    """
    psutil = mrsm.attempt_import('psutil')
    try:
        process = psutil.Process(pid)
    except psutil.NoSuchProcess as e:
        warn(f"Process with PID {pid} does not exist.", stack=False)
        raise e

    command_args = process.cmdline()
    is_daemon = command_args[1] == '-c'

    if is_daemon:
        daemon_id = command_args[-1].split('daemon_id=')[-1].split(')')[0].replace("'", '')
        root_dir = process.environ().get(STATIC_CONFIG['environment']['root'], None)
        if root_dir is None:
            root_dir = paths.ROOT_DIR_PATH
        else:
            root_dir = pathlib.Path(root_dir)
        jobs_dir = root_dir / paths.DAEMON_RESOURCES_PATH.name
        daemon_dir = jobs_dir / daemon_id
        pid_file = daemon_dir / 'process.pid'

        if pid_file.exists():
            with open(pid_file, 'r', encoding='utf-8') as f:
                daemon_pid = int(f.read())

            if pid != daemon_pid:
                raise EnvironmentError(f"Differing PIDs: {pid=}, {daemon_pid=}")
        else:
            raise EnvironmentError(f"Is job '{daemon_id}' running?")

        return Job(daemon_id, executor_keys=executor_keys)

    from meerschaum._internal.arguments._parse_arguments import parse_arguments
    from meerschaum.utils.daemon import get_new_daemon_name

    mrsm_ix = 0
    for i, arg in enumerate(command_args):
        if 'mrsm' in arg or 'meerschaum' in arg.lower():
            mrsm_ix = i
            break

    sysargs = command_args[mrsm_ix+1:]
    kwargs = parse_arguments(sysargs)
    name = kwargs.get('name', get_new_daemon_name())
    return Job(name, sysargs, executor_keys=executor_keys)
Build a Job from the PID of a running Meerschaum process.
Parameters
- pid (int): The PID of the process.
- executor_keys (Optional[str], default None): The executor keys to assign to the job.
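For a daemonized process, from_pid recovers the daemon ID from the process's command line by string splitting. The standalone sketch below isolates that step so it can be tried without a running job. The helper parse_daemon_id is hypothetical, and the sample command line is an assumption (the exact code string passed to -c by a real daemon is not shown in this document).

```python
from typing import List, Optional


def parse_daemon_id(command_args: List[str]) -> Optional[str]:
    """Extract the daemon ID from a daemonized process's command line."""
    # from_pid treats a process as a daemon when its second argument is '-c'.
    if len(command_args) < 2 or command_args[1] != '-c':
        return None
    code = command_args[-1]
    # Keep the text after 'daemon_id=', up to the closing parenthesis, minus quotes.
    return code.split('daemon_id=')[-1].split(')')[0].replace("'", '')


# Hypothetical command line, for illustration only.
args = ['python', '-c', "Daemon(daemon_id='my-job').run()"]
print(parse_daemon_id(args))
# my-job
```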
def start(self, debug: bool = False) -> SuccessTuple:
    """
    Start the job's daemon.
    """
    if self.executor is not None:
        if not self.exists(debug=debug):
            return self.executor.create_job(
                self.name,
                self.sysargs,
                properties=self.daemon.properties,
                debug=debug,
            )
        return self.executor.start_job(self.name, debug=debug)

    if self.is_running():
        return True, f"{self} is already running."

    success, msg = self.daemon.run(
        keep_daemon_output=(not self.delete_after_completion),
        allow_dirty_run=True,
    )
    if not success:
        return success, msg

    return success, f"Started {self}."
Start the job's daemon.
def stop(
    self,
    timeout_seconds: Union[int, float, None] = None,
    debug: bool = False,
) -> SuccessTuple:
    """
    Stop the job's daemon.
    """
    if self.executor is not None:
        return self.executor.stop_job(self.name, debug=debug)

    if self.daemon.status == 'stopped':
        if not self.restart:
            return True, f"{self} is not running."
        elif self.stop_time is not None:
            return True, f"{self} will not restart until manually started."

    quit_success, quit_msg = self.daemon.quit(timeout=timeout_seconds)
    if quit_success:
        return quit_success, f"Stopped {self}."

    warn(
        f"Failed to gracefully quit {self}.",
        stack=False,
    )
    kill_success, kill_msg = self.daemon.kill(timeout=timeout_seconds)
    if not kill_success:
        return kill_success, kill_msg

    return kill_success, f"Killed {self}."
Stop the job's daemon.
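For a local job, stop() first asks the daemon to quit gracefully and only kills it if that fails, returning a SuccessTuple of (success, message) either way. A minimal sketch of that escalation follows, using stand-in callables instead of a real daemon. The name stop_with_fallback and the label argument are hypothetical and not part of the meerschaum API.

```python
from typing import Callable, Tuple

SuccessTuple = Tuple[bool, str]


def stop_with_fallback(
    quit_func: Callable[[], SuccessTuple],
    kill_func: Callable[[], SuccessTuple],
    label: str = 'Job("demo")',
) -> SuccessTuple:
    """Try a graceful quit first; fall back to a kill if that fails."""
    quit_success, _quit_msg = quit_func()
    if quit_success:
        return True, f"Stopped {label}."

    kill_success, kill_msg = kill_func()
    if not kill_success:
        # Surface the kill error message unchanged.
        return kill_success, kill_msg
    return True, f"Killed {label}."


# Stand-in callables simulating a daemon that ignores the graceful quit.
result = stop_with_fallback(
    quit_func=lambda: (False, "Timed out."),
    kill_func=lambda: (True, "Success"),
)
print(result)
# (True, 'Killed Job("demo").')
```

Callers can tell the two outcomes apart only by the message text, since both report success.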
def pause(
    self,
    timeout_seconds: Union[int, float, None] = None,
    debug: bool = False,
) -> SuccessTuple:
    """
    Pause the job's daemon.
    """
    if self.executor is not None:
        return self.executor.pause_job(self.name, debug=debug)

    pause_success, pause_msg = self.daemon.pause(timeout=timeout_seconds)
    if not pause_success:
        return pause_success, pause_msg

    return pause_success, f"Paused {self}."
Pause the job's daemon.
def delete(self, debug: bool = False) -> SuccessTuple:
    """
    Delete the job and its daemon.
    """
    if self.executor is not None:
        return self.executor.delete_job(self.name, debug=debug)

    if self.is_running():
        stop_success, stop_msg = self.stop()
        if not stop_success:
            return stop_success, stop_msg

    cleanup_success, cleanup_msg = self.daemon.cleanup()
    if not cleanup_success:
        return cleanup_success, cleanup_msg

    _ = self.daemon._properties.pop('result', None)
    return cleanup_success, f"Deleted {self}."
Delete the job and its daemon.
def is_running(self) -> bool:
    """
    Determine whether the job's daemon is running.
    """
    return self.status == 'running'
Determine whether the job's daemon is running.
def exists(self, debug: bool = False) -> bool:
    """
    Determine whether the job exists.
    """
    if self.executor is not None:
        return self.executor.get_job_exists(self.name, debug=debug)

    return self.daemon.path.exists()
Determine whether the job exists.
def get_logs(self) -> Union[str, None]:
    """
    Return the output text of the job's daemon.
    """
    if self.executor is not None:
        return self.executor.get_logs(self.name)

    return self.daemon.log_text
Return the output text of the job's daemon.
def monitor_logs(
    self,
    callback_function: Callable[[str], None] = _default_stdout_callback,
    input_callback_function: Optional[Callable[[], str]] = None,
    stop_callback_function: Optional[Callable[[SuccessTuple], None]] = None,
    stop_event: Optional[asyncio.Event] = None,
    stop_on_exit: bool = False,
    strip_timestamps: bool = False,
    accept_input: bool = True,
    debug: bool = False,
    _logs_path: Optional[pathlib.Path] = None,
    _log=None,
    _stdin_file=None,
    _wait_if_stopped: bool = True,
):
    """
    Monitor the job's log files and execute a callback on new lines.

    Parameters
    ----------
    callback_function: Callable[[str], None], default _default_stdout_callback
        The callback to execute as new data comes in.
        Defaults to printing the output directly to `stdout`.

    input_callback_function: Optional[Callable[[], str]], default None
        If provided, execute this callback when the daemon is blocking on stdin.
        Defaults to `sys.stdin.readline()`.

    stop_callback_function: Optional[Callable[[SuccessTuple], None]], default None
        If provided, execute this callback when the daemon stops.
        The job's SuccessTuple will be passed to the callback.

    stop_event: Optional[asyncio.Event], default None
        If provided, stop monitoring when this event is set.
        You may instead raise `meerschaum.jobs.StopMonitoringLogs`
        from within `callback_function` to stop monitoring.

    stop_on_exit: bool, default False
        If `True`, stop monitoring when the job stops.

    strip_timestamps: bool, default False
        If `True`, remove leading timestamps from lines.

    accept_input: bool, default True
        If `True`, accept input when the daemon blocks on stdin.
    """
    if self.executor is not None:
        self.executor.monitor_logs(
            self.name,
            callback_function,
            input_callback_function=input_callback_function,
            stop_callback_function=stop_callback_function,
            stop_on_exit=stop_on_exit,
            accept_input=accept_input,
            strip_timestamps=strip_timestamps,
            debug=debug,
        )
        return

    monitor_logs_coroutine = self.monitor_logs_async(
        callback_function=callback_function,
        input_callback_function=input_callback_function,
        stop_callback_function=stop_callback_function,
        stop_event=stop_event,
        stop_on_exit=stop_on_exit,
        strip_timestamps=strip_timestamps,
        accept_input=accept_input,
        debug=debug,
        _logs_path=_logs_path,
        _log=_log,
        _stdin_file=_stdin_file,
        _wait_if_stopped=_wait_if_stopped,
    )
    return asyncio.run(monitor_logs_coroutine)
Monitor the job's log files and execute a callback on new lines.
Parameters
- callback_function (Callable[[str], None], default _default_stdout_callback): The callback to execute as new data comes in. Defaults to printing the output directly to stdout.
- input_callback_function (Optional[Callable[[], str]], default None): If provided, execute this callback when the daemon is blocking on stdin. Defaults to sys.stdin.readline().
- stop_callback_function (Optional[Callable[[SuccessTuple], None]], default None): If provided, execute this callback when the daemon stops. The job's SuccessTuple will be passed to the callback.
- stop_event (Optional[asyncio.Event], default None): If provided, stop monitoring when this event is set. You may instead raise meerschaum.jobs.StopMonitoringLogs from within callback_function to stop monitoring.
- stop_on_exit (bool, default False): If True, stop monitoring when the job stops.
- strip_timestamps (bool, default False): If True, remove leading timestamps from lines.
- accept_input (bool, default True): If True, accept input when the daemon blocks on stdin.
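The callback contract described above (a sync or async callable that receives each line and may raise an exception to stop monitoring) can be sketched without a running job. In this standalone example, StopMonitoring is a stand-in for meerschaum.jobs.StopMonitoringLogs, and emit_lines is a hypothetical simplification of the line-emitting loop in the source; neither name is part of the meerschaum API.

```python
import asyncio
import inspect
from typing import Callable, Iterable, List


class StopMonitoring(Exception):
    """Stand-in for meerschaum.jobs.StopMonitoringLogs."""


async def emit_lines(lines: Iterable[str], callback: Callable) -> bool:
    """Pass each line to the callback; return True if the callback asked to stop."""
    for line in lines:
        try:
            # Coroutine callbacks are awaited; plain functions are called directly.
            if inspect.iscoroutinefunction(callback):
                await callback(line)
            else:
                callback(line)
        except StopMonitoring:
            return True
    return False


seen: List[str] = []


def collect_until_done(line: str) -> None:
    """Record lines and stop monitoring once 'done' appears."""
    seen.append(line)
    if line.strip() == 'done':
        raise StopMonitoring


stopped = asyncio.run(emit_lines(['starting\n', 'done\n', 'ignored\n'], collect_until_done))
print(stopped, seen)
# True ['starting\n', 'done\n']
```

Raising from inside the callback suits cases where the stopping condition depends on the log content itself; a stop_event is the better fit when the decision is made elsewhere in the program.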
454 async def monitor_logs_async( 455 self, 456 callback_function: Callable[[str], None] = _default_stdout_callback, 457 input_callback_function: Optional[Callable[[], str]] = None, 458 stop_callback_function: Optional[Callable[[SuccessTuple], None]] = None, 459 stop_event: Optional[asyncio.Event] = None, 460 stop_on_exit: bool = False, 461 strip_timestamps: bool = False, 462 accept_input: bool = True, 463 debug: bool = False, 464 _logs_path: Optional[pathlib.Path] = None, 465 _log=None, 466 _stdin_file=None, 467 _wait_if_stopped: bool = True, 468 ): 469 """ 470 Monitor the job's log files and await a callback on new lines. 471 472 Parameters 473 ---------- 474 callback_function: Callable[[str], None], default _default_stdout_callback 475 The callback to execute as new data comes in. 476 Defaults to printing the output directly to `stdout`. 477 478 input_callback_function: Optional[Callable[[], str]], default None 479 If provided, execute this callback when the daemon is blocking on stdin. 480 Defaults to `sys.stdin.readline()`. 481 482 stop_callback_function: Optional[Callable[[SuccessTuple]], str], default None 483 If provided, execute this callback when the daemon stops. 484 The job's SuccessTuple will be passed to the callback. 485 486 stop_event: Optional[asyncio.Event], default None 487 If provided, stop monitoring when this event is set. 488 You may instead raise `meerschaum.jobs.StopMonitoringLogs` 489 from within `callback_function` to stop monitoring. 490 491 stop_on_exit: bool, default False 492 If `True`, stop monitoring when the job stops. 493 494 strip_timestamps: bool, default False 495 If `True`, remove leading timestamps from lines. 496 497 accept_input: bool, default True 498 If `True`, accept input when the daemon blocks on stdin. 
499 """ 500 from meerschaum.utils.prompt import prompt 501 502 def default_input_callback_function(): 503 prompt_kwargs = self.get_prompt_kwargs(debug=debug) 504 if prompt_kwargs: 505 answer = prompt(**prompt_kwargs) 506 return answer + '\n' 507 return sys.stdin.readline() 508 509 if input_callback_function is None: 510 input_callback_function = default_input_callback_function 511 512 if self.executor is not None: 513 await self.executor.monitor_logs_async( 514 self.name, 515 callback_function, 516 input_callback_function=input_callback_function, 517 stop_callback_function=stop_callback_function, 518 stop_on_exit=stop_on_exit, 519 strip_timestamps=strip_timestamps, 520 accept_input=accept_input, 521 debug=debug, 522 ) 523 return 524 525 from meerschaum.utils.formatting._jobs import strip_timestamp_from_line 526 527 events = { 528 'user': stop_event, 529 'stopped': asyncio.Event(), 530 'stop_token': asyncio.Event(), 531 'stop_exception': asyncio.Event(), 532 'stopped_timeout': asyncio.Event(), 533 } 534 combined_event = asyncio.Event() 535 emitted_text = False 536 stdin_file = _stdin_file if _stdin_file is not None else self.daemon.stdin_file 537 538 async def check_job_status(): 539 if not stop_on_exit: 540 return 541 542 nonlocal emitted_text 543 544 sleep_time = 0.1 545 while sleep_time < 0.2: 546 if self.status == 'stopped': 547 if not emitted_text and _wait_if_stopped: 548 await asyncio.sleep(sleep_time) 549 sleep_time = round(sleep_time * 1.1, 3) 550 continue 551 552 if stop_callback_function is not None: 553 try: 554 if asyncio.iscoroutinefunction(stop_callback_function): 555 await stop_callback_function(self.result) 556 else: 557 stop_callback_function(self.result) 558 except asyncio.exceptions.CancelledError: 559 break 560 except Exception: 561 warn(traceback.format_exc()) 562 563 if stop_on_exit: 564 events['stopped'].set() 565 566 break 567 await asyncio.sleep(0.1) 568 569 events['stopped_timeout'].set() 570 571 async def check_blocking_on_input(): 572 
            while True:
                if not emitted_text or not self.is_blocking_on_stdin():
                    try:
                        await asyncio.sleep(self.refresh_seconds)
                    except asyncio.exceptions.CancelledError:
                        break
                    continue

                if not self.is_running():
                    break

                await emit_latest_lines()

                try:
                    print('', end='', flush=True)
                    if asyncio.iscoroutinefunction(input_callback_function):
                        data = await input_callback_function()
                    else:
                        loop = asyncio.get_running_loop()
                        data = await loop.run_in_executor(None, input_callback_function)
                except KeyboardInterrupt:
                    break
                # if not data.endswith('\n'):
                #     data += '\n'

                stdin_file.write(data)
                await asyncio.sleep(self.refresh_seconds)

        async def combine_events():
            event_tasks = [
                asyncio.create_task(event.wait())
                for event in events.values()
                if event is not None
            ]
            if not event_tasks:
                return

            try:
                done, pending = await asyncio.wait(
                    event_tasks,
                    return_when=asyncio.FIRST_COMPLETED,
                )
                for task in pending:
                    task.cancel()
            except asyncio.exceptions.CancelledError:
                pass
            finally:
                combined_event.set()

        check_job_status_task = asyncio.create_task(check_job_status())
        check_blocking_on_input_task = asyncio.create_task(check_blocking_on_input())
        combine_events_task = asyncio.create_task(combine_events())

        log = _log if _log is not None else self.daemon.rotating_log
        lines_to_show = (
            self.daemon.properties.get(
                'logs', {}
            ).get(
                'lines_to_show', get_config('jobs', 'logs', 'lines_to_show')
            )
        )

        async def emit_latest_lines():
            nonlocal emitted_text
            nonlocal stop_event
            lines = log.readlines()
            for line in lines[(-1 * lines_to_show):]:
                if stop_event is not None and stop_event.is_set():
                    return

                line_stripped_extra = strip_timestamp_from_line(line.strip())
                line_stripped = strip_timestamp_from_line(line)

                if line_stripped_extra == STOP_TOKEN:
                    events['stop_token'].set()
                    return

                if line_stripped_extra == CLEAR_TOKEN:
                    clear_screen(debug=debug)
                    continue

                if line_stripped_extra == FLUSH_TOKEN.strip():
                    line_stripped = ''
                    line = ''

                if strip_timestamps:
                    line = line_stripped

                try:
                    if asyncio.iscoroutinefunction(callback_function):
                        await callback_function(line)
                    else:
                        callback_function(line)
                    emitted_text = True
                except StopMonitoringLogs:
                    events['stop_exception'].set()
                    return
                except Exception:
                    warn(f"Error in logs callback:\n{traceback.format_exc()}")

        await emit_latest_lines()

        tasks = (
            [check_job_status_task]
            + ([check_blocking_on_input_task] if accept_input else [])
            + [combine_events_task]
        )
        try:
            _ = asyncio.gather(*tasks, return_exceptions=True)
        except asyncio.exceptions.CancelledError:
            raise
        except Exception:
            warn(f"Failed to run async checks:\n{traceback.format_exc()}")

        watchfiles = mrsm.attempt_import('watchfiles', lazy=False)
        dir_path_to_monitor = (
            _logs_path
            or (log.file_path.parent if log else None)
            or paths.LOGS_RESOURCES_PATH
        )
        async for changes in watchfiles.awatch(
            dir_path_to_monitor,
            stop_event=combined_event,
        ):
            for change in changes:
                file_path_str = change[1]
                file_path = pathlib.Path(file_path_str)
                latest_subfile_path = log.get_latest_subfile_path()
                if latest_subfile_path != file_path:
                    continue

                await emit_latest_lines()

        await emit_latest_lines()
Monitor the job's log files and await a callback on new lines.
Parameters
- callback_function (Callable[[str], None], default _default_stdout_callback): The callback to execute as new data comes in. Defaults to printing the output directly to stdout.
- input_callback_function (Optional[Callable[[], str]], default None): If provided, execute this callback when the daemon is blocking on stdin. Defaults to sys.stdin.readline().
- stop_callback_function (Optional[Callable[[SuccessTuple]], str], default None): If provided, execute this callback when the daemon stops. The job's SuccessTuple will be passed to the callback.
- stop_event (Optional[asyncio.Event], default None): If provided, stop monitoring when this event is set. You may instead raise meerschaum.jobs.StopMonitoringLogs from within callback_function to stop monitoring.
- stop_on_exit (bool, default False): If True, stop monitoring when the job stops.
- strip_timestamps (bool, default False): If True, remove leading timestamps from lines.
- accept_input (bool, default True): If True, accept input when the daemon blocks on stdin.
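The callback dispatch described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: `StopMonitoring` and `emit_lines` are hypothetical stand-ins for `meerschaum.jobs.StopMonitoringLogs` and the internal `emit_latest_lines()` helper, and the log lines are hard-coded rather than read from a job's log file.

```python
import asyncio


class StopMonitoring(Exception):
    """Hypothetical stand-in for meerschaum.jobs.StopMonitoringLogs."""


async def emit_lines(lines, callback):
    """Pass each line to `callback`, awaiting it when it is a coroutine function."""
    emitted = []
    for line in lines:
        try:
            if asyncio.iscoroutinefunction(callback):
                await callback(line)
            else:
                callback(line)
            emitted.append(line)
        except StopMonitoring:
            # Raising from inside the callback ends monitoring early.
            break
    return emitted


seen = []

def stop_on_done(line):
    """Synchronous callback: collect lines until a 'done' marker appears."""
    if line.strip() == 'done':
        raise StopMonitoring
    seen.append(line)


async_seen = []

async def collect_async(line):
    """Coroutine callback: awaited by emit_lines instead of called directly."""
    async_seen.append(line)


emitted = asyncio.run(emit_lines(['a\n', 'b\n', 'done\n', 'c\n'], stop_on_done))
asyncio.run(emit_lines(['x\n', 'y\n'], collect_async))
```

Either callback style is accepted because the dispatcher checks `asyncio.iscoroutinefunction()` before each call, which mirrors the check visible in the source listing.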
    def is_blocking_on_stdin(self, debug: bool = False) -> bool:
        """
        Return whether a job's daemon is blocking on stdin.
        """
        if self.executor is not None:
            return self.executor.get_job_is_blocking_on_stdin(self.name, debug=debug)

        return self.is_running() and self.daemon.blocking_stdin_file_path.exists()
Return whether a job's daemon is blocking on stdin.
    def get_prompt_kwargs(self, debug: bool = False) -> Dict[str, Any]:
        """
        Return the kwargs to the blocking `prompt()`, if available.
        """
        if self.executor is not None:
            return self.executor.get_job_prompt_kwargs(self.name, debug=debug)

        if not self.daemon.prompt_kwargs_file_path.exists():
            return {}

        try:
            with open(self.daemon.prompt_kwargs_file_path, 'r', encoding='utf-8') as f:
                prompt_kwargs = json.load(f)

            return prompt_kwargs

        except Exception:
            import traceback
            traceback.print_exc()
            return {}
Return the kwargs to the blocking prompt(), if available.
    def write_stdin(self, data):
        """
        Write to a job's daemon's `stdin`.
        """
        self.daemon.stdin_file.write(data)
Write to a job's daemon's stdin.
    @property
    def executor(self) -> Union[Executor, None]:
        """
        If the job is remote, return the connector to the remote API instance.
        """
        return (
            mrsm.get_connector(self.executor_keys)
            if self.executor_keys != 'local'
            else None
        )
If the job is remote, return the connector to the remote API instance.
    @property
    def status(self) -> str:
        """
        Return the running status of the job's daemon.
        """
        if '_status_hook' in self.__dict__:
            return self._status_hook()

        if self.executor is not None:
            return self.executor.get_job_status(self.name)

        return self.daemon.status
Return the running status of the job's daemon.
    @property
    def pid(self) -> Union[int, None]:
        """
        Return the PID of the job's daemon.
        """
        if self.executor is not None:
            return self.executor.get_job_metadata(self.name).get('daemon', {}).get('pid', None)

        return self.daemon.pid
Return the PID of the job's daemon.
    @property
    def restart(self) -> bool:
        """
        Return whether to restart a stopped job.
        """
        if self.executor is not None:
            return self.executor.get_job_metadata(self.name).get('restart', False)

        return self.daemon.properties.get('restart', False)
Return whether to restart a stopped job.
    @property
    def result(self) -> SuccessTuple:
        """
        Return the `SuccessTuple` when the job has terminated.
        """
        if self.is_running():
            return True, f"{self} is running."

        if '_result_hook' in self.__dict__:
            return self._result_hook()

        if self.executor is not None:
            return (
                self.executor.get_job_metadata(self.name)
                .get('result', (False, "No result available."))
            )

        _result = self.daemon.properties.get('result', None)
        if _result is None:
            from meerschaum.utils.daemon.Daemon import _results
            return _results.get(self.daemon.daemon_id, (False, "No result available."))

        return tuple(_result)
Return the SuccessTuple when the job has terminated.
    @property
    def sysargs(self) -> List[str]:
        """
        Return the sysargs to use for the Daemon.
        """
        if self._sysargs:
            return self._sysargs

        if self.executor is not None:
            return self.executor.get_job_metadata(self.name).get('sysargs', [])

        target_args = self.daemon.target_args
        if target_args is None:
            return []
        self._sysargs = target_args[0] if len(target_args) > 0 else []
        return self._sysargs
Return the sysargs to use for the Daemon.
    def get_daemon_properties(self) -> Dict[str, Any]:
        """
        Return the `properties` dictionary for the job's daemon.
        """
        remote_properties = (
            {}
            if self.executor is None
            else self.executor.get_job_properties(self.name)
        )
        return {
            **remote_properties,
            **self._properties_patch
        }
Return the properties dictionary for the job's daemon.
    @property
    def daemon(self) -> 'Daemon':
        """
        Return the daemon which this job manages.
        """
        from meerschaum.utils.daemon import Daemon
        if self._daemon is not None and self.executor is None and self._sysargs:
            return self._daemon

        self._daemon = Daemon(
            target=entry,
            target_args=[self._sysargs],
            target_kw={},
            daemon_id=self.name,
            label=shlex.join(self._sysargs),
            properties=self.get_daemon_properties(),
        )
        if '_rotating_log' in self.__dict__:
            self._daemon._rotating_log = self._rotating_log

        if '_stdin_file' in self.__dict__:
            self._daemon._stdin_file = self._stdin_file
            self._daemon._blocking_stdin_file_path = self._stdin_file.blocking_file_path

        return self._daemon
Return the daemon which this job manages.
    @property
    def began(self) -> Union[datetime, None]:
        """
        The datetime when the job began running.
        """
        if self.executor is not None:
            began_str = self.executor.get_job_began(self.name)
            if began_str is None:
                return None
            return (
                datetime.fromisoformat(began_str)
                .astimezone(timezone.utc)
                .replace(tzinfo=None)
            )

        began_str = self.daemon.properties.get('process', {}).get('began', None)
        if began_str is None:
            return None

        return datetime.fromisoformat(began_str)
The datetime when the job began running.
    @property
    def ended(self) -> Union[datetime, None]:
        """
        The datetime when the job stopped running.
        """
        if self.executor is not None:
            ended_str = self.executor.get_job_ended(self.name)
            if ended_str is None:
                return None
            return (
                datetime.fromisoformat(ended_str)
                .astimezone(timezone.utc)
                .replace(tzinfo=None)
            )

        ended_str = self.daemon.properties.get('process', {}).get('ended', None)
        if ended_str is None:
            return None

        return datetime.fromisoformat(ended_str)
The datetime when the job stopped running.
    @property
    def paused(self) -> Union[datetime, None]:
        """
        The datetime when the job was suspended while running.
        """
        if self.executor is not None:
            paused_str = self.executor.get_job_paused(self.name)
            if paused_str is None:
                return None
            return (
                datetime.fromisoformat(paused_str)
                .astimezone(timezone.utc)
                .replace(tzinfo=None)
            )

        paused_str = self.daemon.properties.get('process', {}).get('paused', None)
        if paused_str is None:
            return None

        return datetime.fromisoformat(paused_str)
The datetime when the job was suspended while running.
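For remote jobs, the began, ended, and paused properties all normalize the ISO timestamp string the same way: parse it, convert it to UTC, then drop the timezone. A minimal sketch of that chain follows; the helper name `to_naive_utc` and the sample timestamp are illustrative and not part of the package.

```python
from datetime import datetime, timezone


def to_naive_utc(timestamp_str: str) -> datetime:
    """Parse an ISO 8601 string and return a timezone-naive datetime in UTC."""
    return (
        datetime.fromisoformat(timestamp_str)
        .astimezone(timezone.utc)
        .replace(tzinfo=None)
    )


# A timestamp with a +02:00 offset is shifted back two hours to UTC.
began = to_naive_utc('2024-01-01T12:00:00+02:00')
```

Note that `astimezone()` treats a naive input as local time, so this chain is only unambiguous when the string carries an explicit offset.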
    @property
    def stop_time(self) -> Union[datetime, None]:
        """
        Return the timestamp when the job was manually stopped.
        """
        if self.executor is not None:
            return self.executor.get_job_stop_time(self.name)

        if not self.daemon.stop_path.exists():
            return None

        stop_data = self.daemon._read_stop_file()
        if not stop_data:
            return None

        stop_time_str = stop_data.get('stop_time', None)
        if not stop_time_str:
            warn(f"Could not read stop time for {self}.")
            return None

        return datetime.fromisoformat(stop_time_str)
Return the timestamp when the job was manually stopped.
    def check_restart(self) -> SuccessTuple:
        """
        If `restart` is `True` and the daemon is not running,
        restart the job.
        Do not restart if the job was manually stopped.
        """
        if self.is_running():
            return True, f"{self} is running."

        if not self.restart:
            return True, f"{self} does not need to be restarted."

        if self.stop_time is not None:
            return True, f"{self} was manually stopped."

        return self.start()
If restart is True and the daemon is not running, restart the job. Do not restart if the job was manually stopped.
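The restart rule can be restated as a pure function of three inputs. The sketch below is only an illustration of the decision order shown in `check_restart()`; `should_restart` is a hypothetical helper, not a Meerschaum function.

```python
from datetime import datetime
from typing import Optional


def should_restart(
    is_running: bool,
    restart: bool,
    stop_time: Optional[datetime],
) -> bool:
    """Return True only for a stopped job flagged for restart that was not stopped manually."""
    if is_running:
        return False
    if not restart:
        return False
    if stop_time is not None:
        # A recorded stop time means the job was stopped on purpose.
        return False
    return True


crashed = should_restart(is_running=False, restart=True, stop_time=None)
stopped_by_user = should_restart(
    is_running=False,
    restart=True,
    stop_time=datetime(2024, 1, 1),
)
```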
    @property
    def label(self) -> str:
        """
        Return the job's Daemon label (joined sysargs).
        """
        from meerschaum._internal.arguments import compress_pipeline_sysargs
        sysargs = compress_pipeline_sysargs(self.sysargs)
        return shlex.join(sysargs).replace(' + ', '\n+ ').replace(' : ', '\n: ').lstrip().rstrip()
Return the job's Daemon label (joined sysargs).
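To show what the string handling in label produces, here is a small sketch using only `shlex`. It skips the internal `compress_pipeline_sysargs()` step, and both `format_label` and the sample arguments are made up for illustration.

```python
import shlex


def format_label(sysargs):
    """Join arguments shell-style and start a new line before each ' + ' or ' : ' separator."""
    return (
        shlex.join(sysargs)
        .replace(' + ', '\n+ ')
        .replace(' : ', '\n: ')
        .strip()
    )


label = format_label(['sync', 'pipes', '+', 'verify', 'pipes', ':', '--loop'])
print(label)
```

Arguments that contain spaces or other shell-sensitive characters are quoted by `shlex.join()`, so the label can be pasted back into a shell.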
    @property
    def env(self) -> Dict[str, str]:
        """
        Return the environment variables to set for the job's process.
        """
        if '_env' in self.__dict__:
            return self.__dict__['_env']

        _env = self.daemon.properties.get('env', {})
        default_env = {
            'PYTHONUNBUFFERED': '1',
            'LINES': str(get_config('jobs', 'terminal', 'lines')),
            'COLUMNS': str(get_config('jobs', 'terminal', 'columns')),
            STATIC_CONFIG['environment']['noninteractive']: 'true',
        }
        self._env = {**default_env, **_env}
        return self._env
Return the environment variables to set for the job's process.
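The env property builds its result by unpacking the defaults first and the job's configured variables second, so configured values win on conflicts. A short sketch of that precedence, with made-up values in place of the configured terminal size:

```python
# Illustrative values only; the real defaults come from the Meerschaum configuration.
default_env = {
    'PYTHONUNBUFFERED': '1',
    'LINES': '24',
    'COLUMNS': '80',
}
configured_env = {
    'COLUMNS': '120',
    'MY_VAR': 'abc',
}

# Later unpacking overrides earlier keys, so job-specific values take priority.
env = {**default_env, **configured_env}
```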
    @property
    def delete_after_completion(self) -> bool:
        """
        Return whether this job is configured to delete itself after completion.
        """
        if '_delete_after_completion' in self.__dict__:
            return self.__dict__.get('_delete_after_completion', False)

        self._delete_after_completion = self.daemon.properties.get('delete_after_completion', False)
        return self._delete_after_completion
Return whether this job is configured to delete itself after completion.
def pprint(
    *args,
    detect_password: bool = True,
    nopretty: bool = False,
    **kw
) -> None:
    """Pretty print an object according to the configured ANSI and UNICODE settings.
    If detect_password is True (default), search and replace passwords with '*' characters.
    Does not mutate objects.
    """
    import copy
    import json
    from meerschaum.utils.packages import attempt_import, import_rich
    from meerschaum.utils.formatting import ANSI, get_console, print_tuple
    from meerschaum.utils.warnings import error
    from meerschaum.utils.misc import replace_password, dict_from_od, filter_keywords
    from collections import OrderedDict

    if (
        len(args) == 1
        and
        isinstance(args[0], tuple)
        and
        len(args[0]) == 2
        and
        isinstance(args[0][0], bool)
        and
        isinstance(args[0][1], str)
    ):
        return print_tuple(args[0], **filter_keywords(print_tuple, **kw))

    modify = True
    rich_pprint = None
    if ANSI and not nopretty:
        rich = import_rich()
        if rich is not None:
            rich_pretty = attempt_import('rich.pretty')
        if rich_pretty is not None:
            def _rich_pprint(*args, **kw):
                _console = get_console()
                _kw = filter_keywords(_console.print, **kw)
                _console.print(*args, **_kw)
            rich_pprint = _rich_pprint
    elif not nopretty:
        pprintpp = attempt_import('pprintpp', warn=False)
        try:
            _pprint = pprintpp.pprint
        except Exception :
            import pprint as _pprint_module
            _pprint = _pprint_module.pprint

    func = (
        _pprint if rich_pprint is None else rich_pprint
    ) if not nopretty else print

    try:
        args_copy = copy.deepcopy(args)
    except Exception:
        args_copy = args
        modify = False

    _args = []
    for a in args:
        c = a
        ### convert OrderedDict into dict
        if isinstance(a, OrderedDict) or issubclass(type(a), OrderedDict):
            c = dict_from_od(copy.deepcopy(c))
        _args.append(c)
    args = _args

    _args = list(args)
    if detect_password and modify:
        _args = []
        for a in args:
            c = a
            if isinstance(c, dict):
                c = replace_password(copy.deepcopy(c))
            if nopretty:
                try:
                    c = json.dumps(c)
                    is_json = True
                except Exception:
                    is_json = False
                if not is_json:
                    try:
                        c = str(c)
                    except Exception:
                        pass
            _args.append(c)

    ### filter out unsupported keywords
    func_kw = filter_keywords(func, **kw) if not nopretty else {}
    error_msg = None
    try:
        func(*_args, **func_kw)
    except Exception as e:
        error_msg = e
    if error_msg is not None:
        error(error_msg)
Pretty print an object according to the configured ANSI and UNICODE settings. If detect_password is True (default), search and replace passwords with '*' characters. Does not mutate objects.
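The "does not mutate objects" guarantee comes from masking a deep copy rather than the caller's object. The sketch below illustrates that idea with a hypothetical `mask_passwords` helper; it is not `meerschaum.utils.misc.replace_password`, whose exact matching rules are not shown on this page, and the choice to match only a key named `password` is an assumption.

```python
import copy


def mask_passwords(obj, keys=('password',)):
    """Return a deep copy of `obj` with string values under `keys` replaced by asterisks."""
    masked = copy.deepcopy(obj)

    def _walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key in keys and isinstance(value, str):
                    node[key] = '*' * len(value)
                else:
                    _walk(value)
        elif isinstance(node, list):
            for item in node:
                _walk(item)

    _walk(masked)
    return masked


config = {'sql': {'main': {'username': 'foo', 'password': 'hunter2'}}}
safe_to_print = mask_passwords(config)
```

Because the traversal runs on the copy, the original `config` still holds the real password after the call.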
def attempt_import(
    *names: str,
    lazy: bool = True,
    warn: bool = True,
    install: bool = True,
    venv: Optional[str] = 'mrsm',
    precheck: bool = True,
    split: bool = True,
    check_update: bool = False,
    check_pypi: bool = False,
    check_is_installed: bool = True,
    allow_outside_venv: bool = True,
    color: bool = True,
    debug: bool = False
) -> Any:
    """
    Raise a warning if packages are not installed; otherwise import and return modules.
    If `lazy` is `True`, return lazy-imported modules.

    Returns tuple of modules if multiple names are provided, else returns one module.

    Parameters
    ----------
    names: List[str]
        The packages to be imported.

    lazy: bool, default True
        If `True`, lazily load packages.

    warn: bool, default True
        If `True`, raise a warning if a package cannot be imported.

    install: bool, default True
        If `True`, attempt to install a missing package into the designated virtual environment.
        If `check_update` is True, install updates if available.

    venv: Optional[str], default 'mrsm'
        The virtual environment in which to search for packages and to install packages into.

    precheck: bool, default True
        If `True`, attempt to find module before importing (necessary for checking if modules exist
        and retaining lazy imports), otherwise assume lazy is `False`.

    split: bool, default True
        If `True`, split packages' names on `'.'`.

    check_update: bool, default False
        If `True` and `install` is `True`, install updates if the required minimum version
        does not match.

    check_pypi: bool, default False
        If `True` and `check_update` is `True`, check PyPI when determining whether
        an update is required.

    check_is_installed: bool, default True
        If `True`, check if the package is contained in the virtual environment.

    allow_outside_venv: bool, default True
        If `True`, search outside of the specified virtual environment
        if the package cannot be found.
        Setting to `False` will reinstall the package into a virtual environment, even if it
        is installed outside.

    color: bool, default True
        If `False`, do not print ANSI colors.

    Returns
    -------
    The specified modules. If they're not available and `install` is `True`, it will first
    download them into a virtual environment and return the modules.

    Examples
    --------
    >>> pandas, sqlalchemy = attempt_import('pandas', 'sqlalchemy')
    >>> pandas = attempt_import('pandas')

    """

    import importlib.util

    ### to prevent recursion, check if parent Meerschaum package is being imported
    if names == ('meerschaum',):
        return _import_module('meerschaum')

    if venv == 'mrsm' and _import_hook_venv is not None:
        if debug:
            print(f"Import hook for virtual environment '{_import_hook_venv}' is active.")
        venv = _import_hook_venv

    _warnings = _import_module('meerschaum.utils.warnings')
    warn_function = _warnings.warn

    def do_import(_name: str, **kw) -> Union['ModuleType', None]:
        with Venv(venv=venv, debug=debug):
            ### determine the import method (lazy vs normal)
            from meerschaum.utils.misc import filter_keywords
            import_method = (
                _import_module if not lazy
                else lazy_import
            )
            try:
                mod = import_method(_name, **(filter_keywords(import_method, **kw)))
            except Exception as e:
                if warn:
                    import traceback
                    traceback.print_exception(type(e), e, e.__traceback__)
                    warn_function(
                        f"Failed to import module '{_name}'.\nException:\n{e}",
                        ImportWarning,
                        stacklevel = (5 if lazy else 4),
                        color = False,
                    )
                mod = None
        return mod

    modules = []
    for name in names:
        ### Check if package is a declared dependency.
        root_name = name.split('.')[0] if split else name
        install_name = _import_to_install_name(root_name)

        if install_name is None:
            install_name = root_name
            if warn and root_name != 'plugins':
                warn_function(
                    f"Package '{root_name}' is not declared in meerschaum.utils.packages.",
                    ImportWarning,
                    stacklevel = 3,
                    color = False
                )

        ### Determine if the package exists.
        if precheck is False:
            found_module = (
                do_import(
                    name, debug=debug, warn=False, venv=venv, color=color,
                    check_update=False, check_pypi=False, split=split,
                ) is not None
            )
        else:
            if check_is_installed:
                with _locks['_is_installed_first_check']:
                    if not _is_installed_first_check.get(name, False):
                        package_is_installed = is_installed(
                            name,
                            venv = venv,
                            split = split,
                            allow_outside_venv = allow_outside_venv,
                            debug = debug,
                        )
                        _is_installed_first_check[name] = package_is_installed
                    else:
                        package_is_installed = _is_installed_first_check[name]
            else:
                package_is_installed = _is_installed_first_check.get(
                    name,
                    venv_contains_package(name, venv=venv, split=split, debug=debug)
                )
            found_module = package_is_installed

        if not found_module:
            if install:
                if not pip_install(
                    install_name,
                    venv = venv,
                    split = False,
                    check_update = check_update,
                    color = color,
                    debug = debug
                ) and warn:
                    warn_function(
                        f"Failed to install '{install_name}'.",
                        ImportWarning,
                        stacklevel = 3,
                        color = False,
                    )
            elif warn:
                ### Raise a warning if we can't find the package and install = False.
                warn_function(
                    (f"\n\nMissing package '{name}' from virtual environment '{venv}'; "
                     + "some features will not work correctly."
                     + "\n\nSet install=True when calling attempt_import.\n"),
                    ImportWarning,
                    stacklevel = 3,
                    color = False,
                )

        ### Do the import. Will be lazy if lazy=True.
        m = do_import(
            name, debug=debug, warn=warn, venv=venv, color=color,
            check_update=check_update, check_pypi=check_pypi, install=install, split=split,
        )
        modules.append(m)

    modules = tuple(modules)
    if len(modules) == 1:
        return modules[0]
    return modules
Raise a warning if packages are not installed; otherwise import and return modules. If lazy is True, return lazy-imported modules.
Returns a tuple of modules if multiple names are provided, else returns one module.
Parameters
- names (List[str]): The packages to be imported.
- lazy (bool, default True): If True, lazily load packages.
- warn (bool, default True): If True, raise a warning if a package cannot be imported.
- install (bool, default True): If True, attempt to install a missing package into the designated virtual environment. If check_update is True, install updates if available.
- venv (Optional[str], default 'mrsm'): The virtual environment in which to search for packages and to install packages into.
- precheck (bool, default True): If True, attempt to find module before importing (necessary for checking if modules exist and retaining lazy imports), otherwise assume lazy is False.
- split (bool, default True): If True, split packages' names on '.'.
- check_update (bool, default False): If True and install is True, install updates if the required minimum version does not match.
- check_pypi (bool, default False): If True and check_update is True, check PyPI when determining whether an update is required.
- check_is_installed (bool, default True): If True, check if the package is contained in the virtual environment.
- allow_outside_venv (bool, default True): If True, search outside of the specified virtual environment if the package cannot be found. Setting to False will reinstall the package into a virtual environment, even if it is installed outside.
- color (bool, default True): If False, do not print ANSI colors.
Returns
- The specified modules. If they're not available and install is True, it will first download them into a virtual environment and return the modules.
Examples
>>> pandas, sqlalchemy = attempt_import('pandas', 'sqlalchemy')
>>> pandas = attempt_import('pandas')
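The core pattern (check that a package can be found, warn if it cannot, and return either one module or a tuple) can be sketched with the standard library. This is a simplified illustration under stated assumptions: `import_or_warn` is a hypothetical helper, and it omits the virtual environment handling, installation, and lazy imports that `attempt_import()` performs.

```python
import importlib
import importlib.util
import warnings


def import_or_warn(*names):
    """Import each named module, warning and substituting None when the root package is missing."""
    modules = []
    for name in names:
        root_name = name.split('.')[0]
        if importlib.util.find_spec(root_name) is None:
            warnings.warn(f"Missing package '{root_name}'.", ImportWarning)
            modules.append(None)
            continue
        modules.append(importlib.import_module(name))

    # Mirror the documented return shape: one module, or a tuple for several names.
    return modules[0] if len(modules) == 1 else tuple(modules)


json_module = import_or_warn('json')
json_again, path_module = import_or_warn('json', 'os.path')
```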
class Connector(metaclass=abc.ABCMeta):
    """
    The base connector class to hold connection attributes.
    """

    IS_INSTANCE: bool = False

    def __init__(
        self,
        type: Optional[str] = None,
        label: Optional[str] = None,
        **kw: Any
    ):
        """
        Set the given keyword arguments as attributes.

        Parameters
        ----------
        type: str
            The `type` of the connector (e.g. `sql`, `api`, `plugin`).

        label: str
            The `label` for the connector.


        Examples
        --------
        Run `mrsm edit config` to edit connectors in the YAML file:

        ```yaml
        meerschaum:
            connectors:
                {type}:
                    {label}:
                        ### attributes go here
        ```

        """
        self._original_dict = copy.deepcopy(self.__dict__)
        self._set_attributes(type=type, label=label, **kw)

        ### NOTE: Override `REQUIRED_ATTRIBUTES` if `uri` is set.
        self.verify_attributes(
            ['uri']
            if 'uri' in self.__dict__
            else getattr(self, 'REQUIRED_ATTRIBUTES', None)
        )

    def _reset_attributes(self):
        self.__dict__ = self._original_dict

    def _set_attributes(
        self,
        *args,
        inherit_default: bool = True,
        **kw: Any
    ):
        from meerschaum._internal.static import STATIC_CONFIG
        from meerschaum.utils.warnings import error

        self._attributes = {}

        default_label = STATIC_CONFIG['connectors']['default_label']

        ### NOTE: Support the legacy method of explicitly passing the type.
        label = kw.get('label', None)
        if label is None:
            if len(args) == 2:
                label = args[1]
            elif len(args) == 0:
                label = None
            else:
                label = args[0]

        if label == 'default':
            error(
                f"Label cannot be 'default'. Did you mean '{default_label}'?",
                InvalidAttributesError,
            )
        self.__dict__['label'] = label

        from meerschaum.config import get_config
        conn_configs = copy.deepcopy(get_config('meerschaum', 'connectors'))
        connector_config = copy.deepcopy(get_config('system', 'connectors'))

        ### inherit attributes from 'default' if exists
        if inherit_default:
            inherit_from = 'default'
            if self.type in conn_configs and inherit_from in conn_configs[self.type]:
                _inherit_dict = copy.deepcopy(conn_configs[self.type][inherit_from])
                self._attributes.update(_inherit_dict)

        ### load user config into self._attributes
        if self.type in conn_configs and self.label in conn_configs[self.type]:
            self._attributes.update(conn_configs[self.type][self.label] or {})

        ### load system config into self._sys_config
        ### (deep copy so future Connectors don't inherit changes)
        if self.type in connector_config:
            self._sys_config = copy.deepcopy(connector_config[self.type])

        ### add additional arguments or override configuration
        self._attributes.update(kw)

        ### finally, update __dict__ with _attributes.
        self.__dict__.update(self._attributes)

    def verify_attributes(
        self,
        required_attributes: Optional[List[str]] = None,
        debug: bool = False,
    ) -> None:
        """
        Ensure that the required attributes have been met.

        The Connector base class checks the minimum requirements.
        Child classes may enforce additional requirements.

        Parameters
        ----------
        required_attributes: Optional[List[str]], default None
            Attributes to be verified. If `None`, default to `['type', 'label']`.

        debug: bool, default False
            Verbosity toggle.

        Returns
        -------
        Don't return anything.

        Raises
        ------
        An error if any of the required attributes are missing.
        """
        from meerschaum.utils.warnings import error
        from meerschaum.utils.misc import items_str
        if required_attributes is None:
            required_attributes = ['type', 'label']

        missing_attributes = set()
        for a in required_attributes:
            if a not in self.__dict__:
                missing_attributes.add(a)
        if len(missing_attributes) > 0:
            error(
                (
                    f"Missing {items_str(list(missing_attributes))} "
                    + f"for connector '{self.type}:{self.label}'."
                ),
                InvalidAttributesError,
                silent=True,
                stack=False
            )


    def __str__(self):
        """
        When cast to a string, return type:label.
        """
        return f"{self.type}:{self.label}"

    def __repr__(self):
        """
        Represent the connector as type:label.
        """
        return str(self)

    @property
    def meta(self) -> Dict[str, Any]:
        """
        Return the keys needed to reconstruct this Connector.
        """
        _meta = {
            key: value
            for key, value in self.__dict__.items()
            if not str(key).startswith('_')
        }
        _meta.update({
            'type': self.type,
            'label': self.label,
        })
        return _meta


    @property
    def type(self) -> str:
        """
        Return the type for this connector.
        """
        _type = self.__dict__.get('type', None)
        if _type is None:
            import re
            is_executor = self.__class__.__name__.lower().endswith('executor')
            suffix_regex = (
                r'connector$'
                if not is_executor
                else r'executor$'
            )
            _type = re.sub(suffix_regex, '', self.__class__.__name__.lower())
            if not _type or _type.lower() == 'instance':
                raise ValueError("No type could be determined for this connector.")
            self.__dict__['type'] = _type
        return _type


    @property
    def label(self) -> str:
        """
        Return the label for this connector.
        """
        _label = self.__dict__.get('label', None)
        if _label is None:
            from meerschaum._internal.static import STATIC_CONFIG
            _label = STATIC_CONFIG['connectors']['default_label']
            self.__dict__['label'] = _label
        return _label
The base connector class to hold connection attributes.
    def __init__(
        self,
        type: Optional[str] = None,
        label: Optional[str] = None,
        **kw: Any
    ):
        """
        Set the given keyword arguments as attributes.

        Parameters
        ----------
        type: str
            The `type` of the connector (e.g. `sql`, `api`, `plugin`).

        label: str
            The `label` for the connector.


        Examples
        --------
        Run `mrsm edit config` to edit connectors in the YAML file:

        ```yaml
        meerschaum:
            connectors:
                {type}:
                    {label}:
                        ### attributes go here
        ```

        """
        self._original_dict = copy.deepcopy(self.__dict__)
        self._set_attributes(type=type, label=label, **kw)

        ### NOTE: Override `REQUIRED_ATTRIBUTES` if `uri` is set.
        self.verify_attributes(
            ['uri']
            if 'uri' in self.__dict__
            else getattr(self, 'REQUIRED_ATTRIBUTES', None)
        )
    def verify_attributes(
        self,
        required_attributes: Optional[List[str]] = None,
        debug: bool = False,
    ) -> None:
        """
        Ensure that the required attributes have been met.

        The Connector base class checks the minimum requirements.
        Child classes may enforce additional requirements.

        Parameters
        ----------
        required_attributes: Optional[List[str]], default None
            Attributes to be verified. If `None`, default to `['type', 'label']`.

        debug: bool, default False
            Verbosity toggle.

        Returns
        -------
        Don't return anything.

        Raises
        ------
        An error if any of the required attributes are missing.
        """
        from meerschaum.utils.warnings import error
        from meerschaum.utils.misc import items_str
        if required_attributes is None:
            required_attributes = ['type', 'label']

        missing_attributes = set()
        for a in required_attributes:
            if a not in self.__dict__:
                missing_attributes.add(a)
        if len(missing_attributes) > 0:
            error(
                (
                    f"Missing {items_str(list(missing_attributes))} "
                    + f"for connector '{self.type}:{self.label}'."
                ),
                InvalidAttributesError,
                silent=True,
                stack=False
            )
Ensure that the required attributes have been met.
The Connector base class checks the minimum requirements. Child classes may enforce additional requirements.
Parameters
- required_attributes (Optional[List[str]], default None): Attributes to be verified. If None, default to ['type', 'label'].
- debug (bool, default False): Verbosity toggle.
Returns
- This method does not return anything.
Raises
- An error if any of the required attributes are missing.
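The check itself is a membership test against the instance's `__dict__`, with `uri` replacing the class-level `REQUIRED_ATTRIBUTES` when it is present. The sketch below mimics that flow without importing Meerschaum; `SketchConnector` and `MissingAttributesError` are hypothetical stand-ins for a `Connector` subclass and `InvalidAttributesError`.

```python
class MissingAttributesError(Exception):
    """Hypothetical stand-in for Meerschaum's InvalidAttributesError."""


class SketchConnector:
    """Toy class showing the required-attribute check; not a real Connector."""

    REQUIRED_ATTRIBUTES = ['username', 'password']

    def __init__(self, **kw):
        self.__dict__.update(kw)
        # A `uri` attribute replaces the class-level requirements.
        required = ['uri'] if 'uri' in self.__dict__ else self.REQUIRED_ATTRIBUTES
        missing = sorted(a for a in required if a not in self.__dict__)
        if missing:
            raise MissingAttributesError(f"Missing {', '.join(missing)}.")


ok_conn = SketchConnector(username='foo', password='bar')
uri_conn = SketchConnector(uri='sqlite:////tmp/tmp.db')

try:
    SketchConnector(username='foo')
    error_message = None
except MissingAttributesError as e:
    error_message = str(e)
```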
    @property
    def meta(self) -> Dict[str, Any]:
        """
        Return the keys needed to reconstruct this Connector.
        """
        _meta = {
            key: value
            for key, value in self.__dict__.items()
            if not str(key).startswith('_')
        }
        _meta.update({
            'type': self.type,
            'label': self.label,
        })
        return _meta
Return the keys needed to reconstruct this Connector.
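As a quick illustration of the filtering in meta, the sketch below collects the public attributes of a plain object and skips anything whose name starts with an underscore. The `Example` class and its attribute values are invented for the demonstration.

```python
class Example:
    """Plain object with one private attribute; illustrative only."""

    def __init__(self):
        self.type = 'sql'
        self.label = 'temp'
        self.database = '/tmp/tmp.db'
        self._engine = object()  # private state, excluded from the result


obj = Example()
meta = {
    key: value
    for key, value in vars(obj).items()
    if not str(key).startswith('_')
}
```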
    @property
    def type(self) -> str:
        """
        Return the type for this connector.
        """
        _type = self.__dict__.get('type', None)
        if _type is None:
            import re
            is_executor = self.__class__.__name__.lower().endswith('executor')
            suffix_regex = (
                r'connector$'
                if not is_executor
                else r'executor$'
            )
            _type = re.sub(suffix_regex, '', self.__class__.__name__.lower())
            if not _type or _type.lower() == 'instance':
                raise ValueError("No type could be determined for this connector.")
            self.__dict__['type'] = _type
        return _type
Return the type for this connector.
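When no explicit type is set, the type is derived from the class name. The standalone function below is a sketch of that derivation; `derive_type` is a hypothetical helper, not part of the package.

```python
import re


def derive_type(class_name: str) -> str:
    """Strip a trailing 'connector' or 'executor' from the lowercased class name."""
    name = class_name.lower()
    suffix_regex = r'executor$' if name.endswith('executor') else r'connector$'
    _type = re.sub(suffix_regex, '', name)
    if not _type or _type == 'instance':
        raise ValueError("No type could be determined for this connector.")
    return _type


foo_type = derive_type('FooConnector')

try:
    derive_type('InstanceConnector')
    base_class_rejected = False
except ValueError:
    base_class_rejected = True
```

Under this rule a class named `FooConnector` gets the type `foo`, which is why `'foo:bar'` resolves to it in the custom connector example near the top of this page.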
@property
def label(self) -> str:
    """
    Return the label for this connector.
    """
    _label = self.__dict__.get('label', None)
    if _label is None:
        from meerschaum._internal.static import STATIC_CONFIG
        _label = STATIC_CONFIG['connectors']['default_label']
        self.__dict__['label'] = _label
    return _label
Return the label for this connector.
class InstanceConnector(Connector):
    """
    Instance connectors define the interface for managing pipes and provide methods
    for management of users, plugins, tokens, and other metadata built atop pipes.
    """

    IS_INSTANCE: bool = True
    IS_THREAD_SAFE: bool = False

    from ._users import (
        get_users_pipe,
        register_user,
        get_user_id,
        get_username,
        get_users,
        edit_user,
        delete_user,
        get_user_password_hash,
        get_user_type,
        get_user_attributes,
    )

    from ._plugins import (
        get_plugins_pipe,
        register_plugin,
        get_plugin_user_id,
        delete_plugin,
        get_plugin_id,
        get_plugin_version,
        get_plugins,
        get_plugin_username,
        get_plugin_attributes,
    )

    from ._tokens import (
        get_tokens_pipe,
        register_token,
        edit_token,
        invalidate_token,
        delete_token,
        get_token,
        get_tokens,
        get_token_model,
        get_token_secret_hash,
        token_exists,
        get_token_scopes,
    )

    from ._pipes import (
        register_pipe,
        get_pipe_attributes,
        get_pipe_id,
        edit_pipe,
        delete_pipe,
        fetch_pipes_keys,
        pipe_exists,
        drop_pipe,
        drop_pipe_indices,
        sync_pipe,
        create_pipe_indices,
        clear_pipe,
        get_pipe_data,
        get_pipe_docs,
        get_sync_time,
        get_pipe_columns_types,
        get_pipe_columns_indices,
    )
Instance connectors define the interface for managing pipes and provide methods for management of users, plugins, tokens, and other metadata built atop pipes.
def get_users_pipe(self) -> 'mrsm.Pipe':
    """
    Return the pipe used for users registration.
    """
    if '_users_pipe' in self.__dict__:
        return self._users_pipe

    cache_connector = self.__dict__.get('_cache_connector', None)
    self._users_pipe = mrsm.Pipe(
        'mrsm', 'users',
        instance=self,
        target='mrsm_users',
        temporary=True,
        cache=True,
        cache_connector_keys=cache_connector,
        static=True,
        null_indices=False,
        columns={
            'primary': 'user_id',
        },
        dtypes={
            'user_id': 'uuid',
            'username': 'string',
            'password_hash': 'string',
            'email': 'string',
            'user_type': 'string',
            'attributes': 'json',
        },
        indices={
            'unique': 'username',
        },
    )
    return self._users_pipe
Return the pipe used for users registration.
def register_user(
    self,
    user: User,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Register a new user to the users pipe.
    """
    users_pipe = self.get_users_pipe()
    user.user_id = uuid.uuid4()
    sync_success, sync_msg = users_pipe.sync(
        [{
            'user_id': user.user_id,
            'username': user.username,
            'email': user.email,
            'password_hash': user.password_hash,
            'user_type': user.type,
            'attributes': user.attributes,
        }],
        check_existing=False,
        debug=debug,
    )
    if not sync_success:
        return False, f"Failed to register user '{user.username}':\n{sync_msg}"

    return True, "Success"
Register a new user to the users pipe.
def get_user_id(self, user: User, debug: bool = False) -> Union[uuid.UUID, None]:
    """
    Return a user's ID from the username.
    """
    users_pipe = self.get_users_pipe()
    result_df = users_pipe.get_data(['user_id'], params={'username': user.username}, limit=1)
    if result_df is None or len(result_df) == 0:
        return None
    return result_df['user_id'][0]
Return a user's ID from the username.
def get_username(self, user_id: Any, debug: bool = False) -> Any:
    """
    Return the username from the given ID.
    """
    users_pipe = self.get_users_pipe()
    return users_pipe.get_value('username', {'user_id': user_id}, debug=debug)
Return the username from the given ID.
def get_users(
    self,
    debug: bool = False,
    **kw: Any
) -> List[str]:
    """
    Get the registered usernames.
    """
    users_pipe = self.get_users_pipe()
    df = users_pipe.get_data()
    if df is None:
        return []

    return list(df['username'])
Get the registered usernames.
def edit_user(self, user: User, debug: bool = False) -> mrsm.SuccessTuple:
    """
    Edit the attributes for an existing user.
    """
    users_pipe = self.get_users_pipe()
    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)

    doc = {'user_id': user_id}
    if user.email != '':
        doc['email'] = user.email
    if user.password_hash != '':
        doc['password_hash'] = user.password_hash
    if user.type != '':
        doc['user_type'] = user.type
    if user.attributes:
        doc['attributes'] = user.attributes

    sync_success, sync_msg = users_pipe.sync([doc], debug=debug)
    if not sync_success:
        return False, f"Failed to edit user '{user.username}':\n{sync_msg}"

    return True, "Success"
Edit the attributes for an existing user.
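Only the fields that carry a value are written back, so an edit does not blank out columns the caller left empty. The sketch below isolates that document-building step; `SketchUser` and `build_edit_doc` are hypothetical names standing in for the real `User` class and the method body.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SketchUser:
    """Hypothetical stand-in carrying only the fields read by edit_user()."""
    username: str
    email: str = ''
    password_hash: str = ''
    type: str = ''
    attributes: Dict[str, Any] = field(default_factory=dict)


def build_edit_doc(user_id: Any, user: SketchUser) -> Dict[str, Any]:
    """Build a partial document containing only the non-empty fields."""
    doc: Dict[str, Any] = {'user_id': user_id}
    if user.email != '':
        doc['email'] = user.email
    if user.password_hash != '':
        doc['password_hash'] = user.password_hash
    if user.type != '':
        doc['user_type'] = user.type
    if user.attributes:
        doc['attributes'] = user.attributes
    return doc


edit_doc = build_edit_doc(1, SketchUser('foo', email='foo@example.com'))
```

Syncing such a partial document against the `user_id` primary key updates the listed columns and leaves the rest of the row untouched.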
def delete_user(self, user: User, debug: bool = False) -> mrsm.SuccessTuple:
    """
    Delete a user from the users table.
    """
    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)
    users_pipe = self.get_users_pipe()
    clear_success, clear_msg = users_pipe.clear(params={'user_id': user_id}, debug=debug)
    if not clear_success:
        return False, f"Failed to delete user '{user}':\n{clear_msg}"
    return True, "Success"
Delete a user from the users table.
def get_user_password_hash(self, user: User, debug: bool = False) -> Union[str, None]:
    """
    Get a user's password hash from the users table.
    """
    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)
    users_pipe = self.get_users_pipe()
    result_df = users_pipe.get_data(['password_hash'], params={'user_id': user_id}, debug=debug)
    if result_df is None or len(result_df) == 0:
        return None

    return result_df['password_hash'][0]
Get a user's password hash from the users table.
def get_user_type(self, user: User, debug: bool = False) -> Union[str, None]:
    """
    Get a user's type from the users table.
    """
    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)
    users_pipe = self.get_users_pipe()
    result_df = users_pipe.get_data(['user_type'], params={'user_id': user_id}, debug=debug)
    if result_df is None or len(result_df) == 0:
        return None

    return result_df['user_type'][0]
Get a user's type from the users table.
def get_user_attributes(self, user: User, debug: bool = False) -> Union[Dict[str, Any], None]:
    """
    Get a user's attributes from the users table.
    """
    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)
    users_pipe = self.get_users_pipe()
    result_df = users_pipe.get_data(['attributes'], params={'user_id': user_id}, debug=debug)
    if result_df is None or len(result_df) == 0:
        return None

    return result_df['attributes'][0]
Get a user's attributes from the users table.
def get_plugins_pipe(self) -> 'mrsm.Pipe':
    """
    Return the internal pipe for syncing plugins metadata.
    """
    if '_plugins_pipe' in self.__dict__:
        return self._plugins_pipe

    cache_connector = self.__dict__.get('_cache_connector', None)
    users_pipe = self.get_users_pipe()
    user_id_dtype = users_pipe.dtypes.get('user_id', 'uuid')

    self._plugins_pipe = mrsm.Pipe(
        'mrsm', 'plugins',
        instance=self,
        target='mrsm_plugins',
        temporary=True,
        cache=True,
        cache_connector_keys=cache_connector,
        static=True,
        null_indices=False,
        columns={
            'primary': 'plugin_name',
            'user_id': 'user_id',
        },
        dtypes={
            'plugin_name': 'string',
            'user_id': user_id_dtype,
            'attributes': 'json',
            'version': 'string',
        },
    )
    return self._plugins_pipe
Return the internal pipe for syncing plugins metadata.
def register_plugin(self, plugin: Plugin, debug: bool = False) -> mrsm.SuccessTuple:
    """
    Register a new plugin to the plugins table.
    """
    plugins_pipe = self.get_plugins_pipe()
    users_pipe = self.get_users_pipe()
    user_id = self.get_plugin_user_id(plugin)
    if user_id is not None:
        username = self.get_username(user_id, debug=debug)
        return False, f"{plugin} is already registered to '{username}'."

    doc = {
        'plugin_name': plugin.name,
        'version': plugin.version,
        'attributes': plugin.attributes,
        'user_id': plugin.user_id,
    }

    sync_success, sync_msg = plugins_pipe.sync(
        [doc],
        check_existing=False,
        debug=debug,
    )
    if not sync_success:
        return False, f"Failed to register {plugin}:\n{sync_msg}"

    return True, "Success"
Register a new plugin to the plugins table.
def get_plugin_user_id(self, plugin: Plugin, debug: bool = False) -> Union[uuid.UUID, None]:
    """
    Return the user ID for the plugin's owner.
    """
    plugins_pipe = self.get_plugins_pipe()
    return plugins_pipe.get_value('user_id', {'plugin_name': plugin.name}, debug=debug)
Return the user ID for the plugin's owner.
def delete_plugin(self, plugin: Plugin, debug: bool = False) -> mrsm.SuccessTuple:
    """
    Delete a plugin's registration.
    """
    plugin_id = self.get_plugin_id(plugin, debug=debug)
    if plugin_id is None:
        return False, f"{plugin} is not registered."

    plugins_pipe = self.get_plugins_pipe()
    clear_success, clear_msg = plugins_pipe.clear(params={'plugin_name': plugin.name}, debug=debug)
    if not clear_success:
        return False, f"Failed to delete {plugin}:\n{clear_msg}"
    return True, "Success"
Delete a plugin's registration.
def get_plugin_id(self, plugin: Plugin, debug: bool = False) -> Union[str, None]:
    """
    Return a plugin's ID.
    """
    user_id = self.get_plugin_user_id(plugin, debug=debug)
    return plugin.name if user_id is not None else None
Return a plugin's ID.
def get_plugin_version(self, plugin: Plugin, debug: bool = False) -> Union[str, None]:
    """
    Return the version for a plugin.
    """
    plugins_pipe = self.get_plugins_pipe()
    return plugins_pipe.get_value('version', {'plugin_name': plugin.name}, debug=debug)
Return the version for a plugin.
def get_plugins(
    self,
    user_id: Optional[int] = None,
    search_term: Optional[str] = None,
    debug: bool = False,
    **kw: Any
) -> List[str]:
    """
    Return a list of plugin names.
    """
    plugins_pipe = self.get_plugins_pipe()
    params = {}
    if user_id:
        params['user_id'] = user_id

    df = plugins_pipe.get_data(['plugin_name'], params=params, debug=debug)
    if df is None:
        return []

    docs = df.to_dict(orient='records')
    return [
        plugin_name
        for doc in docs
        if (plugin_name := doc['plugin_name']).startswith(search_term or '')
    ]
Return a list of plugin names.
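Note that `search_term` acts as a prefix filter, not a substring search, and that an empty or missing term matches every name. A minimal sketch of that final step, with `filter_plugin_names` as a hypothetical helper operating on plain dictionaries:

```python
from typing import Dict, List, Optional


def filter_plugin_names(docs: List[Dict[str, str]], search_term: Optional[str] = None) -> List[str]:
    """Return the names that start with the search term (all names if no term)."""
    return [
        plugin_name
        for doc in docs
        if (plugin_name := doc['plugin_name']).startswith(search_term or '')
    ]


plugin_docs = [
    {'plugin_name': 'noaa'},
    {'plugin_name': 'nasa'},
    {'plugin_name': 'stress'},
]
prefix_matches = filter_plugin_names(plugin_docs, 'n')
all_names = filter_plugin_names(plugin_docs)
```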
def get_plugin_username(self, plugin: Plugin, debug: bool = False) -> Union[str, None]:
    """
    Return the username for the plugin's owner.
    """
    user_id = self.get_plugin_user_id(plugin, debug=debug)
    if user_id is None:
        return None
    return self.get_username(user_id, debug=debug)
Return the username for the plugin's owner.
def get_plugin_attributes(self, plugin: Plugin, debug: bool = False) -> Dict[str, Any]:
    """
    Return the attributes for a plugin.
    """
    plugins_pipe = self.get_plugins_pipe()
    return plugins_pipe.get_value('attributes', {'plugin_name': plugin.name}, debug=debug) or {}
Return the attributes for a plugin.
def get_tokens_pipe(self) -> mrsm.Pipe:
    """
    Return the internal pipe for tokens management.
    """
    if '_tokens_pipe' in self.__dict__:
        return self._tokens_pipe

    users_pipe = self.get_users_pipe()
    user_id_dtype = (
        users_pipe._attributes.get('parameters', {}).get('dtypes', {}).get('user_id', 'uuid')
    )

    cache_connector = self.__dict__.get('_cache_connector', None)

    self._tokens_pipe = mrsm.Pipe(
        'mrsm', 'tokens',
        instance=self,
        target='mrsm_tokens',
        temporary=True,
        cache=True,
        cache_connector_keys=cache_connector,
        static=True,
        autotime=True,
        null_indices=False,
        columns={
            'datetime': 'creation',
            'primary': 'id',
        },
        indices={
            'unique': 'label',
            'user_id': 'user_id',
        },
        dtypes={
            'id': 'uuid',
            'creation': 'datetime',
            'expiration': 'datetime',
            'is_valid': 'bool',
            'label': 'string',
            'user_id': user_id_dtype,
            'scopes': 'json',
            'secret_hash': 'string',
        },
    )
    return self._tokens_pipe
Return the internal pipe for tokens management.
def register_token(
    self,
    token: Token,
    debug: bool = False,
) -> mrsm.SuccessTuple:
    """
    Register the new token to the tokens table.
    """
    token_id, token_secret = token.generate_credentials()
    tokens_pipe = self.get_tokens_pipe()
    user_id = self.get_user_id(token.user) if token.user is not None else None
    if user_id is None:
        return False, "Cannot register a token without a user."

    doc = {
        'id': token_id,
        'user_id': user_id,
        'creation': datetime.now(timezone.utc),
        'expiration': token.expiration,
        'label': token.label,
        'is_valid': token.is_valid,
        'scopes': list(token.scopes) if token.scopes else [],
        'secret_hash': hash_password(
            str(token_secret),
            rounds=STATIC_CONFIG['tokens']['hash_rounds']
        ),
    }
    sync_success, sync_msg = tokens_pipe.sync([doc], check_existing=False, debug=debug)
    if not sync_success:
        return False, f"Failed to register token:\n{sync_msg}"
    return True, "Success"
Register the new token to the tokens table.
def edit_token(self, token: Token, debug: bool = False) -> mrsm.SuccessTuple:
    """
    Persist the token's in-memory state to the tokens pipe.
    """
    if not token.id:
        return False, "Token ID is not set."

    if not token.exists(debug=debug):
        return False, f"Token {token.id} does not exist."

    if not token.creation:
        token_model = self.get_token_model(token.id)
        token.creation = token_model.creation

    tokens_pipe = self.get_tokens_pipe()
    doc = {
        'id': token.id,
        'creation': token.creation,
        'expiration': token.expiration,
        'label': token.label,
        'is_valid': token.is_valid,
        'scopes': list(token.scopes) if token.scopes else [],
    }
    sync_success, sync_msg = tokens_pipe.sync([doc], debug=debug)
    if not sync_success:
        return False, f"Failed to edit token '{token.id}':\n{sync_msg}"

    return True, "Success"
Persist the token's in-memory state to the tokens pipe.
def invalidate_token(self, token: Token, debug: bool = False) -> mrsm.SuccessTuple:
    """
    Set `is_valid` to `False` for the given token.
    """
    if not token.id:
        return False, "Token ID is not set."

    if not token.exists(debug=debug):
        return False, f"Token {token.id} does not exist."

    if not token.creation:
        token_model = self.get_token_model(token.id)
        token.creation = token_model.creation

    token.is_valid = False
    tokens_pipe = self.get_tokens_pipe()
    doc = {
        'id': token.id,
        'creation': token.creation,
        'is_valid': False,
    }
    sync_success, sync_msg = tokens_pipe.sync([doc], debug=debug)
    if not sync_success:
        return False, f"Failed to invalidate token '{token.id}':\n{sync_msg}"

    return True, "Success"
Set is_valid to False for the given token.
def delete_token(self, token: Token, debug: bool = False) -> mrsm.SuccessTuple:
    """
    Delete the given token from the tokens table.
    """
    if not token.id:
        return False, "Token ID is not set."

    if not token.exists(debug=debug):
        return False, f"Token {token.id} does not exist."

    if not token.creation:
        token_model = self.get_token_model(token.id)
        token.creation = token_model.creation

    token.is_valid = False
    tokens_pipe = self.get_tokens_pipe()
    clear_success, clear_msg = tokens_pipe.clear(params={'id': token.id}, debug=debug)
    if not clear_success:
        return False, f"Failed to delete token '{token.id}':\n{clear_msg}"

    return True, "Success"
Delete the given token from the tokens table.
def get_token(self, token_id: Union[uuid.UUID, str], debug: bool = False) -> Union[Token, None]:
    """
    Return the `Token` from its ID.
    """
    from meerschaum.utils.misc import is_uuid
    if isinstance(token_id, str):
        if is_uuid(token_id):
            token_id = uuid.UUID(token_id)
        else:
            raise ValueError("Invalid token ID.")
    token_model = self.get_token_model(token_id)
    if token_model is None:
        return None
    return Token(**dict(token_model))
Return the Token from its ID.
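String IDs are coerced to `uuid.UUID` before the lookup, and anything that does not parse is rejected. The sketch below shows the same normalization using only the standard library; it swaps the package's `is_uuid` helper for a `try`/`except` around `uuid.UUID`, and `normalize_token_id` is a hypothetical name.

```python
import uuid
from typing import Union


def normalize_token_id(token_id: Union[uuid.UUID, str]) -> uuid.UUID:
    """Coerce a string ID to a UUID, rejecting malformed input."""
    if isinstance(token_id, str):
        try:
            return uuid.UUID(token_id)
        except ValueError:
            raise ValueError("Invalid token ID.") from None
    return token_id


generated_id = uuid.uuid4()
round_tripped = normalize_token_id(str(generated_id))

try:
    normalize_token_id('not-a-uuid')
    bad_id_rejected = False
except ValueError:
    bad_id_rejected = True
```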
def get_tokens(
    self,
    user: Optional[User] = None,
    labels: Optional[List[str]] = None,
    ids: Optional[List[uuid.UUID]] = None,
    debug: bool = False,
) -> List[Token]:
    """
    Return a list of `Token` objects.
    """
    tokens_pipe = self.get_tokens_pipe()
    user_id = (
        self.get_user_id(user, debug=debug)
        if user is not None
        else None
    )
    user_type = self.get_user_type(user, debug=debug) if user is not None else None
    params = (
        {
            'user_id': (
                user_id
                if user_type != 'admin'
                else [user_id, None]
            )
        }
        if user_id is not None
        else {}
    )
    if labels:
        params['label'] = labels
    if ids:
        params['id'] = ids

    if debug:
        dprint(f"Getting tokens with {user_id=}, {params=}")

    tokens_df = tokens_pipe.get_data(params=params, debug=debug)
    if tokens_df is None:
        return []

    if debug:
        dprint(f"Retrieved tokens dataframe:\n{tokens_df}")

    tokens_docs = tokens_df.to_dict(orient='records')
    return [
        Token(
            instance=self,
            **token_doc
        )
        for token_doc in reversed(tokens_docs)
    ]
Return a list of Token objects.
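The filter passed to the tokens pipe depends on the user's type: a regular user sees only their own tokens, while an admin also sees tokens with no owner. The sketch below isolates how that `params` dictionary is assembled; `build_tokens_params` is a hypothetical helper and takes the already-resolved `user_id` and `user_type` as plain arguments.

```python
from typing import Any, Dict, List, Optional


def build_tokens_params(
    user_id: Any = None,
    user_type: Optional[str] = None,
    labels: Optional[List[str]] = None,
    ids: Optional[List[Any]] = None,
) -> Dict[str, Any]:
    """Assemble the query filter used to select tokens."""
    params: Dict[str, Any] = (
        {'user_id': user_id if user_type != 'admin' else [user_id, None]}
        if user_id is not None
        else {}
    )
    if labels:
        params['label'] = labels
    if ids:
        params['id'] = ids
    return params


regular_params = build_tokens_params(user_id=7, user_type='user')
admin_params = build_tokens_params(user_id=7, user_type='admin', labels=['ci'])
```

With no user given, the filter is empty and every token is returned (subject to any `labels` or `ids`).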
def get_token_model(self, token_id: Union[uuid.UUID, Token], debug: bool = False) -> 'Union[TokenModel, None]':
    """
    Return a token's model from the instance.
    """
    from meerschaum.models import TokenModel
    if isinstance(token_id, Token):
        # Read the ID from the given token instance, not from the `Token` class.
        token_id = token_id.id
    if not token_id:
        raise ValueError("Invalid token ID.")
    tokens_pipe = self.get_tokens_pipe()
    doc = tokens_pipe.get_doc(
        params={'id': token_id},
        debug=debug,
    )
    if doc is None:
        return None
    return TokenModel(**doc)
Return a token's model from the instance.
def get_token_secret_hash(self, token_id: Union[uuid.UUID, Token], debug: bool = False) -> Union[str, None]:
    """
    Return the secret hash for a given token.
    """
    if isinstance(token_id, Token):
        token_id = token_id.id
    if not token_id:
        raise ValueError("Invalid token ID.")
    tokens_pipe = self.get_tokens_pipe()
    return tokens_pipe.get_value('secret_hash', params={'id': token_id}, debug=debug)
Return the secret hash for a given token.
def token_exists(self, token_id: Union[uuid.UUID, Token], debug: bool = False) -> bool:
    """
    Return `True` if a token exists in the tokens pipe.
    """
    if isinstance(token_id, Token):
        token_id = token_id.id
    if not token_id:
        raise ValueError("Invalid token ID.")

    tokens_pipe = self.get_tokens_pipe()
    return tokens_pipe.get_value('creation', params={'id': token_id}, debug=debug) is not None
Return True if a token exists in the tokens pipe.
def get_token_scopes(self, token_id: Union[uuid.UUID, Token], debug: bool = False) -> List[str]:
    """
    Return the scopes for a token.
    """
    if isinstance(token_id, Token):
        token_id = token_id.id
    if not token_id:
        raise ValueError("Invalid token ID.")

    tokens_pipe = self.get_tokens_pipe()
    return tokens_pipe.get_value('scopes', params={'id': token_id}, debug=debug) or []
Return the scopes for a token.
@abc.abstractmethod
def register_pipe(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Insert the pipe's attributes into the internal `pipes` table.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe to be registered.

    Returns
    -------
    A `SuccessTuple` of the result.
    """
Insert the pipe's attributes into the internal pipes table.
Parameters
- pipe (mrsm.Pipe): The pipe to be registered.
Returns
- A SuccessTuple of the result.
@abc.abstractmethod
def get_pipe_attributes(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> Dict[str, Any]:
    """
    Return the pipe's document from the internal `pipes` table.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose attributes should be retrieved.

    Returns
    -------
    The document that matches the keys of the pipe.
    """
Return the pipe's document from the internal pipes table.
Parameters
- pipe (mrsm.Pipe): The pipe whose attributes should be retrieved.
Returns
- The document that matches the keys of the pipe.
@abc.abstractmethod
def get_pipe_id(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> Union[str, int, None]:
    """
    Return the `id` for the pipe if it exists.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose `id` to fetch.

    Returns
    -------
    The `id` for the pipe's document or `None`.
    """
Return the id for the pipe if it exists.
Parameters
- pipe (mrsm.Pipe): The pipe whose id to fetch.
Returns
- The id for the pipe's document or None.
def edit_pipe(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Edit the attributes of the pipe.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose in-memory parameters must be persisted.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    raise NotImplementedError
Edit the attributes of the pipe.
Parameters
- pipe (mrsm.Pipe): The pipe whose in-memory parameters must be persisted.
Returns
- A SuccessTuple indicating success.
def delete_pipe(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Delete a pipe's registration from the `pipes` collection.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe to be deleted.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    raise NotImplementedError
Delete a pipe's registration from the pipes collection.
Parameters
- pipe (mrsm.Pipe): The pipe to be deleted.
Returns
- A SuccessTuple indicating success.
@abc.abstractmethod
def fetch_pipes_keys(
    self,
    connector_keys: Optional[List[str]] = None,
    metric_keys: Optional[List[str]] = None,
    location_keys: Optional[List[str]] = None,
    tags: Optional[List[str]] = None,
    debug: bool = False,
    **kwargs: Any
) -> Union[
    List[Tuple[str, str, str]],
    List[Tuple[str, str, str, Union[Dict[str, Any], List[str]]]],
    Dict[Union[int, str], Tuple[str, str, str]],
    Dict[Union[int, str], Tuple[str, str, str, Union[Dict[str, Any], List[str]]]],
]:
    """
    Return registered pipes' keys according to the provided filters.

    May return either a list of key tuples or a dictionary mapping pipe IDs to key tuples.
    When returning a dictionary, the key is the pipe's unique ID (int or str).
    Tuples may be length 3 `(connector_keys, metric_key, location_key)` or length 4
    with parameters or tags appended as the fourth element.

    Parameters
    ----------
    connector_keys: list[str] | None, default None
        The keys passed via `-c`.

    metric_keys: list[str] | None, default None
        The keys passed via `-m`.

    location_keys: list[str] | None, default None
        The keys passed via `-l`.

    tags: List[str] | None, default None
        Tags passed via `--tags` which are stored under `parameters:tags`.

    Returns
    -------
    A list of tuples or a dictionary mapping pipe IDs to tuples.
    You may return the string `"None"` for location keys in place of nulls.

    Examples
    --------
    >>> import meerschaum as mrsm
    >>> conn = mrsm.get_connector('example:demo')
    >>>
    >>> pipe_a = mrsm.Pipe('a', 'demo', tags=['foo'], instance=conn)
    >>> pipe_b = mrsm.Pipe('b', 'demo', tags=['bar'], instance=conn)
    >>> pipe_a.register()
    >>> pipe_b.register()
    >>>
    >>> conn.fetch_pipes_keys(['a', 'b'])
    [('a', 'demo', 'None'), ('b', 'demo', 'None')]
    >>> conn.fetch_pipes_keys(metric_keys=['demo'])
    [('a', 'demo', 'None'), ('b', 'demo', 'None')]
    >>> conn.fetch_pipes_keys(tags=['foo'])
    [('a', 'demo', 'None')]
    >>> conn.fetch_pipes_keys(location_keys=[None])
    [('a', 'demo', 'None'), ('b', 'demo', 'None')]
    """
Return registered pipes' keys according to the provided filters.
May return either a list of key tuples or a dictionary mapping pipe IDs to key tuples.
When returning a dictionary, the key is the pipe's unique ID (int or str).
Tuples may be length 3 (connector_keys, metric_key, location_key) or length 4 with parameters or tags appended as the fourth element.
Parameters
- connector_keys (list[str] | None, default None): The keys passed via -c.
- metric_keys (list[str] | None, default None): The keys passed via -m.
- location_keys (list[str] | None, default None): The keys passed via -l.
- tags (List[str] | None, default None): Tags passed via --tags which are stored under parameters:tags.
Returns
- A list of tuples or a dictionary mapping pipe IDs to tuples.
- You may return the string "None" for location keys in place of nulls.
Examples
>>> import meerschaum as mrsm
>>> conn = mrsm.get_connector('example:demo')
>>>
>>> pipe_a = mrsm.Pipe('a', 'demo', tags=['foo'], instance=conn)
>>> pipe_b = mrsm.Pipe('b', 'demo', tags=['bar'], instance=conn)
>>> pipe_a.register()
>>> pipe_b.register()
>>>
>>> conn.fetch_pipes_keys(['a', 'b'])
[('a', 'demo', 'None'), ('b', 'demo', 'None')]
>>> conn.fetch_pipes_keys(metric_keys=['demo'])
[('a', 'demo', 'None'), ('b', 'demo', 'None')]
>>> conn.fetch_pipes_keys(tags=['foo'])
[('a', 'demo', 'None')]
>>> conn.fetch_pipes_keys(location_keys=[None])
[('a', 'demo', 'None'), ('b', 'demo', 'None')]
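For implementers, the sketch below shows one way the filtering could look over an in-memory registry. It is a simplified illustration under stated assumptions, not the behavior of any built-in connector: `REGISTERED` and `fetch_keys` are hypothetical, matching is exact, and a pipe matches the `tags` filter if it shares at least one tag.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical registry mirroring the two pipes from the example above.
REGISTERED: List[Dict[str, Any]] = [
    {'connector_keys': 'a', 'metric_key': 'demo', 'location_key': None, 'tags': ['foo']},
    {'connector_keys': 'b', 'metric_key': 'demo', 'location_key': None, 'tags': ['bar']},
]


def fetch_keys(
    connector_keys: Optional[List[str]] = None,
    metric_keys: Optional[List[str]] = None,
    location_keys: Optional[List[Optional[str]]] = None,
    tags: Optional[List[str]] = None,
) -> List[Tuple[str, str, str]]:
    """Return length-3 key tuples for the pipes matching every given filter."""

    def matches(value: Any, allowed: Optional[List[Any]]) -> bool:
        # An empty or missing filter matches everything.
        return not allowed or value in allowed

    keys = []
    for doc in REGISTERED:
        if not matches(doc['connector_keys'], connector_keys):
            continue
        if not matches(doc['metric_key'], metric_keys):
            continue
        if not matches(doc['location_key'], location_keys):
            continue
        if tags and not set(tags) & set(doc['tags']):
            continue
        # Null locations are reported as the string "None".
        keys.append((doc['connector_keys'], doc['metric_key'], str(doc['location_key'])))
    return keys


by_tag = fetch_keys(tags=['foo'])
by_null_location = fetch_keys(location_keys=[None])
```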
@abc.abstractmethod
def pipe_exists(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> bool:
    """
    Check whether a pipe's target table exists.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe to check whether its table exists.

    Returns
    -------
    A `bool` indicating the table exists.
    """
Check whether a pipe's target table exists.
Parameters
- pipe (mrsm.Pipe): The pipe whose target table should be checked.
Returns
- A bool indicating whether the table exists.
@abc.abstractmethod
def drop_pipe(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Drop a pipe's collection if it exists.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe to be dropped.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    raise NotImplementedError
Drop a pipe's collection if it exists.
Parameters
- pipe (mrsm.Pipe): The pipe to be dropped.
Returns
- A SuccessTuple indicating success.
def drop_pipe_indices(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Drop a pipe's indices.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose indices need to be dropped.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    return False, f"Cannot drop indices for instance connectors of type '{self.type}'."
Drop a pipe's indices.
Parameters
- pipe (mrsm.Pipe): The pipe whose indices need to be dropped.
Returns
- A SuccessTuple indicating success.
@abc.abstractmethod
def sync_pipe(
    self,
    pipe: mrsm.Pipe,
    df: 'pd.DataFrame' = None,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    chunksize: Optional[int] = -1,
    check_existing: bool = True,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Sync a pipe using a database connection.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The Meerschaum Pipe instance into which to sync the data.

    df: Optional[pd.DataFrame]
        An optional DataFrame or equivalent to sync into the pipe.
        Defaults to `None`.

    begin: Union[datetime, int, None], default None
        Optionally specify the earliest datetime to search for data.
        Defaults to `None`.

    end: Union[datetime, int, None], default None
        Optionally specify the latest datetime to search for data.
        Defaults to `None`.

    chunksize: Optional[int], default -1
        Specify the number of rows to sync per chunk.
        If `-1`, resort to system configuration (default is `900`).
        A `chunksize` of `None` will sync all rows in one transaction.
        Defaults to `-1`.

    check_existing: bool, default True
        If `True`, pull and diff with existing data from the pipe. Defaults to `True`.

    debug: bool, default False
        Verbosity toggle. Defaults to False.

    Returns
    -------
    A `SuccessTuple` of success (`bool`) and message (`str`).
    """
Sync a pipe using a database connection.
Parameters
- pipe (mrsm.Pipe): The Meerschaum Pipe instance into which to sync the data.
- df (Optional[pd.DataFrame]): An optional DataFrame or equivalent to sync into the pipe. Defaults to None.
- begin (Union[datetime, int, None], default None): Optionally specify the earliest datetime to search for data. Defaults to None.
- end (Union[datetime, int, None], default None): Optionally specify the latest datetime to search for data. Defaults to None.
- chunksize (Optional[int], default -1): Specify the number of rows to sync per chunk. If -1, resort to system configuration (default is 900). A chunksize of None will sync all rows in one transaction. Defaults to -1.
- check_existing (bool, default True): If True, pull and diff with existing data from the pipe. Defaults to True.
- debug (bool, default False): Verbosity toggle. Defaults to False.
Returns
- A SuccessTuple of success (bool) and message (str).
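The three `chunksize` cases can be sketched as follows. This only illustrates the documented semantics; `resolve_chunksize`, `iter_chunks`, and the `configured_default` argument are hypothetical names, and the value `900` is taken from the description above rather than read from any configuration.

```python
from typing import Any, Iterator, List, Optional


def resolve_chunksize(chunksize: Optional[int] = -1, configured_default: int = 900) -> Optional[int]:
    """Map -1 to the configured default; pass through None and explicit sizes."""
    return configured_default if chunksize == -1 else chunksize


def iter_chunks(rows: List[Any], chunksize: Optional[int]) -> Iterator[List[Any]]:
    """Yield the rows in chunks, or all at once when chunksize is None."""
    if chunksize is None:
        yield list(rows)
        return
    for start in range(0, len(rows), chunksize):
        yield rows[start:start + chunksize]


rows = list(range(2000))
default_chunks = list(iter_chunks(rows, resolve_chunksize(-1)))
single_chunk = list(iter_chunks(rows, resolve_chunksize(None)))
```

With 2,000 rows and the default of 900, the rows are split into chunks of 900, 900, and 200; with `None` they are handled as a single batch.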
def create_pipe_indices(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Create a pipe's indices.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose indices need to be created.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    return False, f"Cannot create indices for instance connectors of type '{self.type}'."
Create a pipe's indices.
Parameters
- pipe (mrsm.Pipe): The pipe whose indices need to be created.
Returns
- A SuccessTuple indicating success.
def clear_pipe(
    self,
    pipe: mrsm.Pipe,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Delete rows within `begin`, `end`, and `params`.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose rows to clear.

    begin: datetime | int | None, default None
        If provided, remove rows >= `begin`.

    end: datetime | int | None, default None
        If provided, remove rows < `end`.

    params: dict[str, Any] | None, default None
        If provided, only remove rows which match the `params` filter.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    raise NotImplementedError
Delete rows within begin, end, and params.
Parameters
- pipe (mrsm.Pipe): The pipe whose rows to clear.
- begin (datetime | int | None, default None): If provided, remove rows >= begin.
- end (datetime | int | None, default None): If provided, remove rows < end.
- params (dict[str, Any] | None, default None): If provided, only remove rows which match the params filter.
Returns
- A SuccessTuple indicating success.
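The begin, end, and params rules above amount to a half-open interval [begin, end) combined with an equality filter. The following is a minimal sketch of that rule on an in-memory list of rows; the helper clear_rows and the column name ts are hypothetical and not part of Meerschaum, and a real instance connector would issue the equivalent delete against its own backend.

```python
from datetime import datetime
from typing import Any, Optional


def clear_rows(
    rows: list[dict[str, Any]],
    dt_col: str = 'ts',
    begin: Optional[datetime] = None,
    end: Optional[datetime] = None,
    params: Optional[dict[str, Any]] = None,
) -> list[dict[str, Any]]:
    """Return the rows that survive clearing (hypothetical helper)."""

    def should_remove(row: dict[str, Any]) -> bool:
        # Rows before `begin` are kept (`begin` is inclusive).
        if begin is not None and row[dt_col] < begin:
            return False
        # Rows at or after `end` are kept (`end` is exclusive).
        if end is not None and row[dt_col] >= end:
            return False
        # Only rows matching every key in `params` are removed.
        return all(row.get(key) == val for key, val in (params or {}).items())

    return [row for row in rows if not should_remove(row)]


rows = [
    {'ts': datetime(2024, 1, 1), 'id': 1},
    {'ts': datetime(2024, 1, 2), 'id': 1},
    {'ts': datetime(2024, 1, 2), 'id': 2},
    {'ts': datetime(2024, 1, 3), 'id': 1},
]
remaining = clear_rows(
    rows,
    begin=datetime(2024, 1, 2),
    end=datetime(2024, 1, 3),
    params={'id': 1},
)
print(len(remaining))
# 3
```

Only the 2024-01-02 row with id 1 is removed: the 2024-01-03 row falls on the exclusive end bound, and the other 2024-01-02 row does not match params.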
def get_pipe_data(
    self,
    pipe: mrsm.Pipe,
    select_columns: Optional[List[str]] = None,
    omit_columns: Optional[List[str]] = None,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    debug: bool = False,
    **kwargs: Any
) -> Union['pd.DataFrame', None]:
    """
    Query a pipe's target table and return the DataFrame.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe with the target table from which to read.

    select_columns: list[str] | None, default None
        If provided, only select these given columns.
        Otherwise select all available columns (i.e. `SELECT *`).

    omit_columns: list[str] | None, default None
        If provided, remove these columns from the selection.

    begin: datetime | int | None, default None
        The earliest `datetime` value to search from (inclusive).

    end: datetime | int | None, default None
        The latest `datetime` value to search from (exclusive).

    params: dict[str, Any] | None, default None
        Additional filters to apply to the query.

    Returns
    -------
    The target table's data as a DataFrame.
    """
    if type(self).get_pipe_docs is get_pipe_docs:
        raise NotImplementedError(
            f"Missing `get_pipe_data()` or `get_pipe_docs()` for {type(self)}."
        )

    docs = self.get_pipe_docs(
        pipe=pipe,
        select_columns=select_columns,
        omit_columns=omit_columns,
        begin=begin,
        end=end,
        params=params,
        debug=debug,
        **kwargs
    )
    if not docs:
        return None

    pd = mrsm.attempt_import('pandas')
    try:
        return pd.DataFrame(docs)
    except Exception as e:
        from meerschaum.utils.warnings import warn
        warn(f"Cannot build DataFrame from pipe docs:\n{e}")

    return None
Query a pipe's target table and return the DataFrame.
Parameters
- pipe (mrsm.Pipe): The pipe with the target table from which to read.
- select_columns (list[str] | None, default None): If provided, only select these given columns. Otherwise select all available columns (i.e. SELECT *).
- omit_columns (list[str] | None, default None): If provided, remove these columns from the selection.
- begin (datetime | int | None, default None): The earliest datetime value to search from (inclusive).
- end (datetime | int | None, default None): The latest datetime value to search from (exclusive).
- params (dict[str, Any] | None, default None): Additional filters to apply to the query.
Returns
- The target table's data as a DataFrame.
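The select_columns and omit_columns parameters describe a column projection. A minimal sketch of one plausible reading follows, using plain dictionaries instead of a DataFrame. The helper project_columns is hypothetical, and applying omit_columns after select_columns is an assumption of this sketch rather than documented Meerschaum behavior.

```python
from typing import Any, Optional


def project_columns(
    docs: list[dict[str, Any]],
    select_columns: Optional[list[str]] = None,
    omit_columns: Optional[list[str]] = None,
) -> list[dict[str, Any]]:
    """Keep `select_columns` (or all columns), then drop `omit_columns`."""
    omit = set(omit_columns or [])
    projected = []
    for doc in docs:
        # With no explicit selection, fall back to every column (like SELECT *).
        columns = select_columns if select_columns else list(doc)
        projected.append({
            col: doc[col]
            for col in columns
            if col in doc and col not in omit
        })
    return projected


docs = [
    {'ts': 1, 'id': 1, 'vl': 97},
    {'ts': 2, 'id': 2, 'vl': 18},
]
print(project_columns(docs, select_columns=['ts', 'vl']))
# [{'ts': 1, 'vl': 97}, {'ts': 2, 'vl': 18}]
print(project_columns(docs, omit_columns=['vl']))
# [{'ts': 1, 'id': 1}, {'ts': 2, 'id': 2}]
```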
def get_pipe_docs(
    self,
    pipe: mrsm.Pipe,
    select_columns: Optional[List[str]] = None,
    omit_columns: Optional[List[str]] = None,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    debug: bool = False,
    **kwargs: Any
) -> list[dict[str, Any]]:
    """
    Return a pipe's data as a list of documents.
    Defaults to `get_pipe_data().to_dict(orient='records')`.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe with the target table from which to read.

    select_columns: list[str] | None, default None
        If provided, only select these given columns.
        Otherwise select all available columns (i.e. `SELECT *`).

    omit_columns: list[str] | None, default None
        If provided, remove these columns from the selection.

    begin: datetime | int | None, default None
        The earliest `datetime` value to search from (inclusive).

    end: datetime | int | None, default None
        The latest `datetime` value to search from (exclusive).

    params: dict[str, Any] | None, default None
        Additional filters to apply to the query.

    Returns
    -------
    The target table's data as a list of dictionaries.
    """
    df = self.get_pipe_data(
        pipe=pipe,
        select_columns=select_columns,
        omit_columns=omit_columns,
        begin=begin,
        end=end,
        params=params,
        debug=debug,
        **kwargs
    )
    if df is None or df.empty:
        return []
    return df.to_dict(orient='records')
Return a pipe's data as a list of documents.
Defaults to get_pipe_data().to_dict(orient='records').
Parameters
- pipe (mrsm.Pipe): The pipe with the target table from which to read.
- select_columns (list[str] | None, default None): If provided, only select these given columns. Otherwise select all available columns (i.e. SELECT *).
- omit_columns (list[str] | None, default None): If provided, remove these columns from the selection.
- begin (datetime | int | None, default None): The earliest datetime value to search from (inclusive).
- end (datetime | int | None, default None): The latest datetime value to search from (exclusive).
- params (dict[str, Any] | None, default None): Additional filters to apply to the query.
Returns
- The target table's data as a list of dictionaries.
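Because get_pipe_data() and get_pipe_docs() each default to the other, a subclass only needs to implement one of them, and the base class needs a guard so the two defaults do not call each other forever. The pattern can be sketched with the standard library alone. The class and method names below (BaseReader, get_table, get_docs) are hypothetical stand-ins, and a dictionary of column lists takes the place of a DataFrame.

```python
from typing import Any, Optional


class BaseReader:
    """Stand-in base class: each read method defaults to the other."""

    def get_table(self) -> Optional[dict[str, list[Any]]]:
        # Guard: if the subclass did not override get_docs() either,
        # delegating would recurse forever, so fail loudly instead.
        if type(self).get_docs is BaseReader.get_docs:
            raise NotImplementedError(
                f"Missing `get_table()` or `get_docs()` for {type(self)}."
            )
        docs = self.get_docs()
        if not docs:
            return None
        columns = list(docs[0])
        return {col: [doc.get(col) for doc in docs] for col in columns}

    def get_docs(self) -> list[dict[str, Any]]:
        table = self.get_table()
        if not table:
            return []
        columns = list(table)
        # Transpose the column lists back into one dictionary per row.
        return [dict(zip(columns, values)) for values in zip(*table.values())]


class DocsReader(BaseReader):
    """Implements only the document-oriented method."""

    def get_docs(self) -> list[dict[str, Any]]:
        return [{'id': 1, 'vl': 97}, {'id': 2, 'vl': 18}]


class TableReader(BaseReader):
    """Implements only the table-oriented method."""

    def get_table(self) -> Optional[dict[str, list[Any]]]:
        return {'id': [1, 2], 'vl': [97, 18]}


print(DocsReader().get_table())
# {'id': [1, 2], 'vl': [97, 18]}
print(TableReader().get_docs())
# [{'id': 1, 'vl': 97}, {'id': 2, 'vl': 18}]
```

Calling either method on a BaseReader instance that overrides neither raises NotImplementedError instead of recursing.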
@abc.abstractmethod
def get_sync_time(
    self,
    pipe: mrsm.Pipe,
    params: Optional[Dict[str, Any]] = None,
    newest: bool = True,
    debug: bool = False,
    **kwargs: Any
) -> datetime | int | None:
    """
    Return the most recent value for the `datetime` axis.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose collection contains documents.

    params: dict[str, Any] | None, default None
        Filter certain parameters when determining the sync time.

    newest: bool, default True
        If `True`, return the maximum value for the column.

    Returns
    -------
    The largest `datetime` or `int` value of the `datetime` axis.
    """
Return the most recent value for the datetime axis.
Parameters
- pipe (mrsm.Pipe): The pipe whose collection contains documents.
- params (dict[str, Any] | None, default None): Filter certain parameters when determining the sync time.
- newest (bool, default True): If True, return the maximum value for the column.
Returns
- The largest datetime or int value of the datetime axis.
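Since get_sync_time() is abstract, each instance connector supplies its own implementation. As a rough illustration of the contract only, the sketch below computes a sync time over in-memory documents. The helper sync_time and the column name ts are hypothetical, and returning the minimum when newest is False is an assumption of this sketch (the documentation above only states the newest=True case).

```python
from datetime import datetime
from typing import Any, Optional, Union


def sync_time(
    docs: list[dict[str, Any]],
    dt_col: str = 'ts',
    params: Optional[dict[str, Any]] = None,
    newest: bool = True,
) -> Union[datetime, int, None]:
    """Return the largest (or smallest) `dt_col` value among matching docs."""
    values = [
        doc[dt_col]
        for doc in docs
        if doc.get(dt_col) is not None
        and all(doc.get(key) == val for key, val in (params or {}).items())
    ]
    if not values:
        # Nothing has been synced yet (or nothing matches `params`).
        return None
    return max(values) if newest else min(values)


docs = [
    {'ts': datetime(2024, 1, 1), 'id': 1},
    {'ts': datetime(2024, 1, 3), 'id': 1},
    {'ts': datetime(2024, 1, 5), 'id': 2},
]
print(sync_time(docs))
# 2024-01-05 00:00:00
print(sync_time(docs, params={'id': 1}))
# 2024-01-03 00:00:00
```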
@abc.abstractmethod
def get_pipe_columns_types(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> Dict[str, str]:
    """
    Return the data types for the columns in the target table for data type enforcement.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose target table contains columns and data types.

    Returns
    -------
    A dictionary mapping columns to data types.
    """
Return the data types for the columns in the target table for data type enforcement.
Parameters
- pipe (mrsm.Pipe): The pipe whose target table contains columns and data types.
Returns
- A dictionary mapping columns to data types.
def get_pipe_columns_indices(
    self,
    debug: bool = False,
) -> Dict[str, List[Dict[str, str]]]:
    """
    Return a dictionary mapping columns to metadata about related indices.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose target table has related indices.

    Returns
    -------
    A list of dictionaries with the keys "type" and "name".

    Examples
    --------
    >>> pipe = mrsm.Pipe('demo', 'shirts', columns={'primary': 'id'}, indices={'size_color': ['color', 'size']})
    >>> pipe.sync([{'color': 'red', 'size': 'M'}])
    >>> pipe.get_columns_indices()
    {'id': [{'name': 'demo_shirts_pkey', 'type': 'PRIMARY KEY'}], 'color': [{'name': 'IX_demo_shirts_color_size', 'type': 'INDEX'}], 'size': [{'name': 'IX_demo_shirts_color_size', 'type': 'INDEX'}]}
    """
    return {}
Return a dictionary mapping columns to metadata about related indices.
Parameters
- pipe (mrsm.Pipe): The pipe whose target table has related indices.
Returns
- A dictionary mapping each column to a list of dictionaries with the keys "type" and "name".
Examples
>>> pipe = mrsm.Pipe('demo', 'shirts', columns={'primary': 'id'}, indices={'size_color': ['color', 'size']})
>>> pipe.sync([{'color': 'red', 'size': 'M'}])
>>> pipe.get_columns_indices()
{'id': [{'name': 'demo_shirts_pkey', 'type': 'PRIMARY KEY'}], 'color': [{'name': 'IX_demo_shirts_color_size', 'type': 'INDEX'}], 'size': [{'name': 'IX_demo_shirts_color_size', 'type': 'INDEX'}]}
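The shape of the return value in the example above is a per-column view of indices that are defined per index. A minimal sketch of that inversion follows; the helper columns_indices and its input shape (a list of dictionaries with "name", "type", and "columns" keys) are hypothetical, chosen only to reproduce the documented output.

```python
def columns_indices(
    indices: list[dict[str, object]],
) -> dict[str, list[dict[str, str]]]:
    """Invert index definitions into a column -> index metadata mapping."""
    result: dict[str, list[dict[str, str]]] = {}
    for index in indices:
        meta = {'name': index['name'], 'type': index['type']}
        for column in index['columns']:
            # A multi-column index is listed under each of its columns.
            result.setdefault(column, []).append(dict(meta))
    return result


index_definitions = [
    {'name': 'demo_shirts_pkey', 'type': 'PRIMARY KEY', 'columns': ['id']},
    {'name': 'IX_demo_shirts_color_size', 'type': 'INDEX', 'columns': ['color', 'size']},
]
print(columns_indices(index_definitions)['color'])
# [{'name': 'IX_demo_shirts_color_size', 'type': 'INDEX'}]
```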
def make_connector(cls, _is_executor: bool = False):
    """
    Register a class as a `Connector`.
    The `type` will be the lower case of the class name, without the suffix `connector`.

    Parameters
    ----------
    instance: bool, default False
        If `True`, make this connector type an instance connector.
        This requires implementing the various pipes functions and lots of testing.

    Examples
    --------
    >>> import meerschaum as mrsm
    >>> from meerschaum.connectors import make_connector, Connector
    >>>
    >>> @make_connector
    ... class FooConnector(Connector):
    ...     REQUIRED_ATTRIBUTES: list[str] = ['username', 'password']
    ...
    >>> conn = mrsm.get_connector('foo:bar', username='dog', password='cat')
    >>> print(conn.username, conn.password)
    dog cat
    >>>
    """
    import re
    from meerschaum.plugins import _get_parent_plugin
    suffix_regex = (
        r'connector$'
        if not _is_executor
        else r'executor$'
    )
    plugin_name = _get_parent_plugin(2)
    typ = re.sub(suffix_regex, '', cls.__name__.lower())
    with _locks['types']:
        types[typ] = cls
    with _locks['custom_types']:
        custom_types.add(typ)
    if plugin_name:
        with _locks['plugins_types']:
            if plugin_name not in plugins_types:
                plugins_types[plugin_name] = []
            plugins_types[plugin_name].append(typ)
    with _locks['connectors']:
        if typ not in connectors:
            connectors[typ] = {}
    if getattr(cls, 'IS_INSTANCE', False):
        with _locks['instance_types']:
            if typ not in instance_types:
                instance_types.append(typ)

    return cls
Register a class as a Connector.
The type will be the lower case of the class name, without the suffix connector.
Parameters
- instance (bool, default False): If True, make this connector type an instance connector. This requires implementing the various pipes functions and lots of testing.
Examples
>>> import meerschaum as mrsm
>>> from meerschaum.connectors import make_connector, Connector
>>>
>>> @make_connector
... class FooConnector(Connector):
...     REQUIRED_ATTRIBUTES: list[str] = ['username', 'password']
...
>>> conn = mrsm.get_connector('foo:bar', username='dog', password='cat')
>>> print(conn.username, conn.password)
dog cat
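The naming rule (lower-cased class name with the connector suffix removed) can be tried in isolation. The sketch below reproduces only the re.sub() step from the listing above with a plain dictionary as the registry. The names register and registry are hypothetical, and the locks, plugin tracking, and instance-type bookkeeping are left out.

```python
import re

# Hypothetical stand-in for the module-level registry of connector types.
registry: dict[str, type] = {}


def register(cls: type, _is_executor: bool = False) -> type:
    """Register `cls` under its lower-cased name minus the type suffix."""
    suffix_regex = r'executor$' if _is_executor else r'connector$'
    typ = re.sub(suffix_regex, '', cls.__name__.lower())
    registry[typ] = cls
    return cls


@register
class FooConnector:
    pass


class BarExecutor:
    pass


register(BarExecutor, _is_executor=True)

print(sorted(registry))
# ['bar', 'foo']
```

Because the pattern is anchored with `$`, only a trailing suffix is stripped, so a class named ConnectorPool would register under its full lower-cased name.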
def entry(
    sysargs: Union[List[str], str, None] = None,
    _patch_args: Optional[Dict[str, Any]] = None,
    _use_cli_daemon: bool = True,
    _session_id: Optional[str] = None,
) -> SuccessTuple:
    """
    Parse arguments and launch a Meerschaum action.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    start = time.perf_counter()
    from meerschaum.config.environment import get_daemon_env_vars
    sysargs_list = shlex.split(sysargs) if isinstance(sysargs, str) else sysargs
    if (
        not _use_cli_daemon
        or (not sysargs or (sysargs[0] and sysargs[0].startswith('-')))
        or '--no-daemon' in sysargs_list
        or '--daemon' in sysargs_list
        or '-d' in sysargs_list
        or get_daemon_env_vars()
        or not mrsm.get_config('system', 'experimental', 'cli_daemon')
    ):
        success, msg = entry_without_daemon(sysargs, _patch_args=_patch_args)
        end = time.perf_counter()
        if '--debug' in sysargs_list:
            print(f"Duration without daemon: {round(end - start, 3)}")
        return success, msg

    from meerschaum._internal.cli.entry import entry_with_daemon
    success, msg = entry_with_daemon(sysargs, _patch_args=_patch_args)
    end = time.perf_counter()
    if '--debug' in sysargs_list:
        print(f"Duration with daemon: {round(end - start, 3)}")
    return success, msg
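The argument handling in entry() has two parts: a string is split into a list with shlex.split(), and certain arguments route the call away from the CLI daemon. The sketch below isolates just those two steps. The helpers normalize_sysargs and bypasses_daemon are hypothetical; the sketch ignores the configuration and environment checks in the listing, and it normalizes None to an empty list, which the listing above does not do.

```python
import shlex
from typing import Optional, Union

# Flags that route a call away from the daemon, as in the listing above.
DAEMON_BYPASS_FLAGS = ('--no-daemon', '--daemon', '-d')


def normalize_sysargs(sysargs: Union[list[str], str, None]) -> list[str]:
    """Return the arguments as a list, splitting strings shell-style."""
    if sysargs is None:
        return []
    if isinstance(sysargs, str):
        return shlex.split(sysargs)
    return list(sysargs)


def bypasses_daemon(sysargs: Union[list[str], str, None]) -> bool:
    """Return True if these arguments alone would skip the daemon."""
    args = normalize_sysargs(sysargs)
    # No arguments, or a leading flag instead of an action.
    if not args or args[0].startswith('-'):
        return True
    return any(flag in args for flag in DAEMON_BYPASS_FLAGS)


print(normalize_sysargs("sync pipes -c 'sql:main'"))
# ['sync', 'pipes', '-c', 'sql:main']
print(bypasses_daemon('show pipes'))
# False
print(bypasses_daemon('sync pipes -d'))
# True
```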