meerschaum

Meerschaum Python API
Welcome to the Meerschaum Python API technical documentation! Here you can find information about the classes and functions provided by the meerschaum package. Visit meerschaum.io for general usage documentation.
Root Module
For your convenience, the following classes and functions may be imported from the root meerschaum namespace:
Classes
Examples
Build a Connector
Get existing connectors or build a new one in-memory with the meerschaum.get_connector() factory function:
import meerschaum as mrsm
sql_conn = mrsm.get_connector(
    'sql:temp',
    flavor='sqlite',
    database='/tmp/tmp.db',
)
df = sql_conn.read("SELECT 1 AS foo")
print(df)
# foo
# 0 1
sql_conn.to_sql(df, 'foo')
print(sql_conn.read('foo'))
# foo
# 0 1
Create a Custom Connector Class
Decorate a connector class with meerschaum.make_connector() to designate it as a custom connector:
from datetime import datetime, timezone
from random import randint
import meerschaum as mrsm
from meerschaum.utils.dtypes import round_time
@mrsm.make_connector
class FooConnector(mrsm.Connector):
    REQUIRED_ATTRIBUTES = ['username', 'password']

    def fetch(
        self,
        begin: datetime | None = None,
        end: datetime | None = None,
    ):
        now = begin or round_time(datetime.now(timezone.utc))
        return [
            {'ts': now, 'id': 1, 'vl': randint(1, 100)},
            {'ts': now, 'id': 2, 'vl': randint(1, 100)},
            {'ts': now, 'id': 3, 'vl': randint(1, 100)},
        ]

foo_conn = mrsm.get_connector(
    'foo:bar',
    username='foo',
    password='bar',
)
docs = foo_conn.fetch()
Build a Pipe
Build a meerschaum.Pipe in-memory:
from datetime import datetime
import meerschaum as mrsm
pipe = mrsm.Pipe(
    foo_conn, 'demo',
    instance=sql_conn,
    columns={'datetime': 'ts', 'id': 'id'},
    tags=['production'],
)
pipe.sync(begin=datetime(2024, 1, 1))
df = pipe.get_data()
print(df)
# ts id vl
# 0 2024-01-01 1 97
# 1 2024-01-01 2 18
# 2 2024-01-01 3 96
Add temporary=True to skip registering the pipe in the pipes table.
Query an Integer-Axis Pipe by Datetime
When a pipe's datetime axis is an integer epoch, set precision so datetime bounds can be translated to the axis's integer value. This lets you pass a datetime begin / end to meerschaum.Pipe.get_data() (and to actions like show data, clear, and deduplicate):
from datetime import datetime, timezone
import meerschaum as mrsm
pipe = mrsm.Pipe(
    'demo', 'epoch',
    instance='sql:temp',
    columns={'datetime': 'ts'},
    dtypes={'ts': 'int'},
    precision='millisecond',
)
pipe.sync([{'ts': 1780099200000}])
### The datetime is translated to the epoch value `1780099200000`.
df = pipe.get_data(begin='2026-05-30')
Integer bounds (begin=1780099200000) still pass through unchanged. A datetime bound on a non-epoch integer axis (no precision set) raises a ValueError. Convert directly with meerschaum.utils.dtypes.datetime_to_int():
from datetime import datetime, timezone
from meerschaum.utils.dtypes import datetime_to_int
datetime_to_int(datetime(2026, 5, 30, tzinfo=timezone.utc), 'millisecond')
# 1780099200000
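For intuition about what such a conversion involves, here is a minimal standard-library sketch. The helper `to_epoch_int` and the `UNITS_PER_SECOND` table are hypothetical names, not part of Meerschaum. The sketch assumes the Unix epoch and treats naive datetimes as UTC, which may differ from how datetime_to_int() behaves:

```python
from datetime import datetime, timezone

# Hypothetical lookup table (not part of Meerschaum): units per second for each precision.
UNITS_PER_SECOND = {
    'second': 1,
    'millisecond': 1_000,
    'microsecond': 1_000_000,
}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_int(dt: datetime, precision: str = 'millisecond') -> int:
    """Return the number of `precision` units between the Unix epoch and `dt`."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes are interpreted as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    delta = dt - EPOCH
    # Work in whole microseconds to avoid floating-point rounding.
    total_microseconds = (
        (delta.days * 86_400 + delta.seconds) * 1_000_000
        + delta.microseconds
    )
    return total_microseconds * UNITS_PER_SECOND[precision] // 1_000_000


print(to_epoch_int(datetime(2026, 5, 30, tzinfo=timezone.utc), 'millisecond'))
# 1780099200000
```

Integer arithmetic keeps the result exact at every precision; multiplying a float timestamp by 1000 can be off by one unit for some values.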
Get Registered Pipes
The meerschaum.get_pipes() function returns a dictionary hierarchy of pipes by connector, metric, and location:
import meerschaum as mrsm
pipes = mrsm.get_pipes(instance='sql:temp')
pipe = pipes['foo:bar']['demo'][None]
Add as_list=True to flatten the hierarchy:
import meerschaum as mrsm
pipes = mrsm.get_pipes(
    tags=['production'],
    instance=sql_conn,
    as_list=True,
)
print(pipes)
# [Pipe('foo:bar', 'demo', instance='sql:temp')]
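To make the shape of the hierarchy concrete, the sketch below models it with plain dictionaries and flattens it by hand. It is an illustration only: the strings stand in for real Pipe objects, the 'clemson' location key is made up, and the `flatten` helper is hypothetical rather than Meerschaum's own implementation.

```python
from typing import Any, Dict, List, Optional

# Stand-in data: strings take the place of real Pipe objects.
# Shape: {connector_keys: {metric_key: {location_key: pipe}}}
pipes: Dict[str, Dict[str, Dict[Optional[str], Any]]] = {
    'foo:bar': {
        'demo': {None: "Pipe('foo:bar', 'demo')"},
    },
    'sql:main': {
        'weather': {
            None: "Pipe('sql:main', 'weather')",
            'clemson': "Pipe('sql:main', 'weather', 'clemson')",
        },
    },
}


def flatten(pipes_dict: Dict[str, Dict[str, Dict[Optional[str], Any]]]) -> List[Any]:
    """Walk connector -> metric -> location and collect the leaves in order."""
    return [
        pipe
        for metrics in pipes_dict.values()
        for locations in metrics.values()
        for pipe in locations.values()
    ]


# A location key of None addresses the pipe without a location.
print(pipes['foo:bar']['demo'][None])
# Pipe('foo:bar', 'demo')
print(len(flatten(pipes)))
# 3
```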
Filter by the dtype of the datetime index column with datetime_dtypes. Accepted values are 'datetime', 'int', and 'None'; prefix with '_' to negate:
import meerschaum as mrsm
### Only pipes with a timestamp datetime index:
timestamp_pipes = mrsm.get_pipes(datetime_dtypes=['datetime'], as_list=True)
### Only pipes with an integer datetime index:
int_pipes = mrsm.get_pipes(datetime_dtypes=['int'], as_list=True)
### Exclude pipes without a datetime index:
datetime_pipes = mrsm.get_pipes(datetime_dtypes=['_None'], as_list=True)
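The include / exclude semantics of the '_' prefix can be sketched without Meerschaum. The helpers `split_negated` and `keep` below are hypothetical and only model the behavior described above: a pipe is kept when it matches at least one non-negated value (or none were given) and matches no negated value.

```python
from typing import Iterable, List, Tuple

NEGATION_PREFIX = '_'  # Assumption: the '_' prefix described above.


def split_negated(values: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split values into (include, exclude) lists, stripping the prefix from negated ones."""
    include: List[str] = []
    exclude: List[str] = []
    for value in values:
        if value.startswith(NEGATION_PREFIX):
            exclude.append(value[len(NEGATION_PREFIX):])
        else:
            include.append(value)
    return include, exclude


def keep(dtype_label: str, filters: Iterable[str]) -> bool:
    """Decide whether a pipe labeled 'datetime', 'int', or 'None' passes the filters."""
    include, exclude = split_negated(filters)
    included = not include or dtype_label in include
    return included and dtype_label not in exclude


print(split_negated(['int', '_None']))
# (['int'], ['None'])
print(keep('None', ['_None']), keep('int', ['_None']))
# False True
```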
Import Plugins
You can import a plugin's module through meerschaum.Plugin.module:
import meerschaum as mrsm
plugin = mrsm.Plugin('noaa')
with mrsm.Venv(plugin):
    noaa = plugin.module
If your plugin has submodules, use meerschaum.plugins.from_plugin_import:
from meerschaum.plugins import from_plugin_import
get_defined_pipes = from_plugin_import('compose.utils.pipes', 'get_defined_pipes')
Import multiple plugins with meerschaum.plugins.import_plugins:
from meerschaum.plugins import import_plugins
noaa, compose = import_plugins('noaa', 'compose')
Create a Job
Create a meerschaum.Job with name and sysargs:
import meerschaum as mrsm
job = mrsm.Job('syncing-engine', 'sync pipes --loop')
success, msg = job.start()
Pass the connector keys of an API instance as executor_keys to create a remote job:
import meerschaum as mrsm
job = mrsm.Job(
    'foo',
    'sync pipes -s daily',
    executor_keys='api:main',
)
Import from a Virtual Environment
Use the meerschaum.Venv context manager to activate a virtual environment:
import meerschaum as mrsm
with mrsm.Venv('noaa'):
    import requests
print(requests.__file__)
# /home/bmeares/.config/meerschaum/venvs/noaa/lib/python3.12/site-packages/requests/__init__.py
To import packages which may not be installed, use meerschaum.attempt_import():
import meerschaum as mrsm
requests = mrsm.attempt_import('requests', venv='noaa')
print(requests.__file__)
# /home/bmeares/.config/meerschaum/venvs/noaa/lib/python3.12/site-packages/requests/__init__.py
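The general pattern of importing a module that may be absent can be sketched with the standard library alone. The `try_import` helper is hypothetical; unlike attempt_import(), it knows nothing about Meerschaum virtual environments and never installs anything, it simply returns None when the module cannot be imported:

```python
import importlib
from types import ModuleType
from typing import Optional


def try_import(name: str) -> Optional[ModuleType]:
    """Return the imported module, or None if it is not importable."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


json_module = try_import('json')
missing = try_import('surely_not_a_real_package_name')

print(json_module is not None, missing is None)
# True True
```

Callers then branch on `None` rather than wrapping every use in a `try` block.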
Run Actions
Run sysargs with meerschaum.entry():
import meerschaum as mrsm
success, msg = mrsm.entry('show pipes + show version : x2')
Use meerschaum.actions.get_action() to access an action function directly:
from meerschaum.actions import get_action
show_pipes = get_action(['show', 'pipes'])
success, msg = show_pipes(connector_keys=['plugin:noaa'])
Get a dictionary of available subactions with meerschaum.actions.get_subactions():
from meerschaum.actions import get_subactions
subactions = get_subactions('show')
success, msg = subactions['pipes']()
Create a Plugin
Run bootstrap plugin to create a new plugin:
mrsm bootstrap plugin example
This creates example.py in your plugins directory (default ~/.config/meerschaum/plugins/; on Windows, %APPDATA%\Meerschaum\plugins).
Open your plugin with edit plugin:
mrsm edit plugin example
Paste the example code from "Create a Custom Action" below to try out the features.
See the writing plugins guide for more in-depth documentation.
Create a Custom Action
Decorate a function with meerschaum.actions.make_action to designate it as an action. Subactions will be automatically detected if not decorated:
from meerschaum.actions import make_action
@make_action
def sing():
    print('What would you like me to sing?')
    return True, "Success"

def sing_tune():
    return False, "I don't know that song!"

def sing_song():
    print('Hello, World!')
    return True, "Success"
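One way to picture name-based subaction detection is a lookup over functions whose names start with the action name and an underscore. The sketch below is an assumption about the general idea, not Meerschaum's actual discovery code; `find_subactions` and the explicit `namespace` dictionary are hypothetical.

```python
from typing import Callable, Dict, Tuple

SuccessTuple = Tuple[bool, str]


def sing() -> SuccessTuple:
    return True, "Success"


def sing_tune() -> SuccessTuple:
    return False, "I don't know that song!"


def sing_song() -> SuccessTuple:
    return True, "Success"


def find_subactions(
    action: str,
    namespace: Dict[str, object],
) -> Dict[str, Callable[[], SuccessTuple]]:
    """Map subaction names to callables named `<action>_<subaction>`."""
    prefix = action + '_'
    return {
        name[len(prefix):]: obj
        for name, obj in namespace.items()
        if name.startswith(prefix) and callable(obj)
    }


# Hypothetical stand-in for a plugin module's attributes.
namespace = {'sing': sing, 'sing_tune': sing_tune, 'sing_song': sing_song}
subactions = find_subactions('sing', namespace)

print(sorted(subactions))
# ['song', 'tune']
print(subactions['tune']())
# (False, "I don't know that song!")
```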
Use meerschaum.plugins.add_plugin_argument() to create new parameters for your action:
from meerschaum.plugins import make_action, add_plugin_argument
add_plugin_argument(
    '--song', type=str, help='What song to sing.',
)

@make_action
def sing_melody(action=None, song=None):
    to_sing = action[0] if action else song
    if not to_sing:
        return False, "Please tell me what to sing!"
    return True, f'~I am singing {to_sing}~'
mrsm sing melody lalala
mrsm sing melody --song do-re-mi
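The two invocations differ in where the song name ends up: as a trailing positional word or as the value of the --song flag. The plain argparse sketch below shows that split. It assumes add_plugin_argument() accepts argparse-style arguments, and the parser here is a hypothetical stand-in for the real one; in particular, it keeps the action words 'sing' and 'melody' in the positional list, whereas the example above receives only the remaining words in `action`.

```python
import argparse
import shlex

# Hypothetical stand-in parser, not Meerschaum's own.
parser = argparse.ArgumentParser(prog='mrsm')
parser.add_argument('action', nargs='*', help='Action words, e.g. `sing melody lalala`.')
parser.add_argument('--song', type=str, help='What song to sing.')


def parse(command: str) -> argparse.Namespace:
    """Tokenize a command string like a shell would and parse it."""
    return parser.parse_args(shlex.split(command))


positional = parse('sing melody lalala')
flagged = parse('sing melody --song do-re-mi')

print(positional.action, positional.song)
# ['sing', 'melody', 'lalala'] None
print(flagged.action, flagged.song)
# ['sing', 'melody'] do-re-mi
```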
Add a Page to the Web Dashboard
Use the decorators meerschaum.plugins.dash_plugin() and meerschaum.plugins.web_page() to add new pages to the web dashboard:
from meerschaum.plugins import dash_plugin, web_page
@dash_plugin
def init_dash(dash_app):
    import dash.html as html
    import dash_bootstrap_components as dbc
    from dash import Input, Output, no_update

    ### Routes to '/dash/my-page'
    @web_page('/my-page', login_required=False)
    def my_page():
        return dbc.Container([
            html.H1("Hello, World!"),
            dbc.Button("Click me", id='my-button'),
            html.Div(id="my-output-div"),
        ])

    @dash_app.callback(
        Output('my-output-div', 'children'),
        Input('my-button', 'n_clicks'),
    )
    def my_button_click(n_clicks):
        if not n_clicks:
            return no_update
        return html.P(f'You clicked {n_clicks} times!')
Submodules
meerschaum.actions
Access functions for actions and subactions.
- meerschaum.actions.actions
- meerschaum.actions.get_action()
- meerschaum.actions.get_completer()
- meerschaum.actions.get_main_action_name()
- meerschaum.actions.get_subactions()
meerschaum.config
Read and write the Meerschaum configuration registry.
- meerschaum.config.get_config()
- meerschaum.config.get_plugin_config()
- meerschaum.config.write_config()
- meerschaum.config.write_plugin_config()
meerschaum.config.environment
Patch configuration and connectors from environment variables.
- meerschaum.config.environment.apply_environment_patches()
- meerschaum.config.environment.apply_environment_config()
- meerschaum.config.environment.apply_environment_uris()
- meerschaum.config.environment.apply_connector_uri()
- meerschaum.config.environment.get_connector_env_regex()
- meerschaum.config.environment.get_connector_env_vars()
- meerschaum.config.environment.get_env_vars()
- meerschaum.config.environment.get_daemon_env_vars()
- meerschaum.config.environment.replace_env()
meerschaum.connectors
Build connectors to interact with databases and fetch data.
- meerschaum.connectors.get_connector()
- meerschaum.connectors.make_connector()
- meerschaum.connectors.is_connected()
- meerschaum.connectors.poll.retry_connect()
- meerschaum.connectors.Connector
- meerschaum.connectors.sql.SQLConnector
- meerschaum.connectors.api.APIConnector
- meerschaum.connectors.valkey.ValkeyConnector
meerschaum.jobs
Start background jobs.
- meerschaum.jobs.Job
- meerschaum.jobs.Executor
- meerschaum.jobs.systemd.SystemdExecutor
- meerschaum.jobs.get_jobs()
- meerschaum.jobs.get_filtered_jobs()
- meerschaum.jobs.get_running_jobs()
- meerschaum.jobs.get_stopped_jobs()
- meerschaum.jobs.get_paused_jobs()
- meerschaum.jobs.get_restart_jobs()
- meerschaum.jobs.make_executor()
- meerschaum.jobs.check_restart_jobs()
- meerschaum.jobs.start_check_jobs_thread()
- meerschaum.jobs.stop_check_jobs_thread()
- meerschaum.jobs.get_executor_keys_from_context()
meerschaum.plugins
Access plugin modules and other API utilities.
- meerschaum.plugins.Plugin
- meerschaum.plugins.api_plugin()
- meerschaum.plugins.dash_plugin()
- meerschaum.plugins.import_plugins()
- meerschaum.plugins.reload_plugins()
- meerschaum.plugins.get_plugins()
- meerschaum.plugins.get_data_plugins()
- meerschaum.plugins.add_plugin_argument()
- meerschaum.plugins.pre_sync_hook()
- meerschaum.plugins.post_sync_hook()
meerschaum.utils
Utility functions are available in several submodules:
meerschaum.utils.daemon
Manage background jobs.
- meerschaum.utils.daemon.daemon_entry()
- meerschaum.utils.daemon.daemon_action()
- meerschaum.utils.daemon.get_daemons()
- meerschaum.utils.daemon.get_daemon_ids()
- meerschaum.utils.daemon.get_running_daemons()
- meerschaum.utils.daemon.get_paused_daemons()
- meerschaum.utils.daemon.get_stopped_daemons()
- meerschaum.utils.daemon.get_filtered_daemons()
- meerschaum.utils.daemon.run_daemon()
- meerschaum.utils.daemon.Daemon
- meerschaum.utils.daemon.FileDescriptorInterceptor
- meerschaum.utils.daemon.RotatingFile
meerschaum.utils.dataframe
Manipulate dataframes.
- meerschaum.utils.dataframe.add_missing_cols_to_df()
- meerschaum.utils.dataframe.chunksize_to_npartitions()
- meerschaum.utils.dataframe.df_from_literal()
- meerschaum.utils.dataframe.df_is_chunk_generator()
- meerschaum.utils.dataframe.enforce_dtypes()
- meerschaum.utils.dataframe.filter_unseen_df()
- meerschaum.utils.dataframe.get_bool_cols()
- meerschaum.utils.dataframe.get_bytes_cols()
- meerschaum.utils.dataframe.get_datetime_bound_from_df()
- meerschaum.utils.dataframe.get_date_cols()
- meerschaum.utils.dataframe.get_datetime_cols()
- meerschaum.utils.dataframe.get_datetime_cols_types()
- meerschaum.utils.dataframe.get_first_valid_dask_partition()
- meerschaum.utils.dataframe.get_geometry_cols()
- meerschaum.utils.dataframe.get_geometry_cols_types()
- meerschaum.utils.dataframe.get_json_cols()
- meerschaum.utils.dataframe.get_numeric_cols()
- meerschaum.utils.dataframe.get_special_cols()
- meerschaum.utils.dataframe.get_unhashable_cols()
- meerschaum.utils.dataframe.get_unique_index_values()
- meerschaum.utils.dataframe.get_uuid_cols()
- meerschaum.utils.dataframe.parse_df_datetimes()
- meerschaum.utils.dataframe.query_df()
- meerschaum.utils.dataframe.to_json()
- meerschaum.utils.dataframe.to_simple_lines()
- meerschaum.utils.dataframe.parse_simple_lines()
meerschaum.utils.dtypes
Work with data types.
- meerschaum.utils.dtypes.are_dtypes_equal()
- meerschaum.utils.dtypes.attempt_cast_to_bytes()
- meerschaum.utils.dtypes.attempt_cast_to_geometry()
- meerschaum.utils.dtypes.attempt_cast_to_numeric()
- meerschaum.utils.dtypes.attempt_cast_to_uuid()
- meerschaum.utils.dtypes.coerce_timezone()
- meerschaum.utils.dtypes.datetime_to_int()
- meerschaum.utils.dtypes.deserialize_base64()
- meerschaum.utils.dtypes.deserialize_bytes_string()
- meerschaum.utils.dtypes.deserialize_geometry()
- meerschaum.utils.dtypes.encode_bytes_for_bytea()
- meerschaum.utils.dtypes.geometry_is_gpkg()
- meerschaum.utils.dtypes.geometry_is_wkt()
- meerschaum.utils.dtypes.get_current_timestamp()
- meerschaum.utils.dtypes.get_geometry_type_srid()
- meerschaum.utils.dtypes.is_dtype_numeric()
- meerschaum.utils.dtypes.is_dtype_special()
- meerschaum.utils.dtypes.json_serialize_value()
- meerschaum.utils.dtypes.none_if_null()
- meerschaum.utils.dtypes.project_geometry()
- meerschaum.utils.dtypes.quantize_decimal()
- meerschaum.utils.dtypes.serialize_bytes()
- meerschaum.utils.dtypes.serialize_datetime()
- meerschaum.utils.dtypes.serialize_date()
- meerschaum.utils.dtypes.serialize_decimal()
- meerschaum.utils.dtypes.serialize_geometry()
- meerschaum.utils.dtypes.to_datetime()
- meerschaum.utils.dtypes.to_pandas_dtype()
- meerschaum.utils.dtypes.value_is_null()
- meerschaum.utils.dtypes.get_next_precision_unit()
- meerschaum.utils.dtypes.round_time()
meerschaum.utils.formatting
Format output text.
- meerschaum.utils.formatting.colored()
- meerschaum.utils.formatting.extract_stats_from_message()
- meerschaum.utils.formatting.fill_ansi()
- meerschaum.utils.formatting.format_bytes()
- meerschaum.utils.formatting.format_dataframe()
- meerschaum.utils.formatting.get_console()
- meerschaum.utils.formatting.highlight_pipes()
- meerschaum.utils.formatting.make_header()
- meerschaum.utils.formatting.pipe_repr()
- meerschaum.utils.formatting.pprint()
- meerschaum.utils.formatting.pprint_df()
- meerschaum.utils.formatting.pprint_pipes()
- meerschaum.utils.formatting.print_options()
- meerschaum.utils.formatting.print_pipes_results()
- meerschaum.utils.formatting.print_tuple()
- meerschaum.utils.formatting.translate_rich_to_termcolor()
meerschaum.utils.misc
Miscellaneous utility functions.
- meerschaum.utils.misc.items_str()
- meerschaum.utils.misc.is_int()
- meerschaum.utils.misc.is_uuid()
- meerschaum.utils.misc.interval_str()
- meerschaum.utils.misc.filter_keywords()
- meerschaum.utils.misc.generate_password()
- meerschaum.utils.misc.string_to_dict()
- meerschaum.utils.misc.iterate_chunks()
- meerschaum.utils.misc.timed_input()
- meerschaum.utils.misc.replace_pipes_in_dict()
- meerschaum.utils.misc.is_valid_email()
- meerschaum.utils.misc.string_width()
- meerschaum.utils.misc.replace_password()
- meerschaum.utils.misc.parse_config_substitution()
- meerschaum.utils.misc.edit_file()
- meerschaum.utils.misc.get_in_ex_params()
- meerschaum.utils.misc.separate_negation_values()
- meerschaum.utils.misc.flatten_list()
- meerschaum.utils.misc.make_symlink()
- meerschaum.utils.misc.is_symlink()
- meerschaum.utils.misc.wget()
- meerschaum.utils.misc.add_method_to_class()
- meerschaum.utils.misc.is_pipe_registered()
- meerschaum.utils.misc.get_cols_lines()
- meerschaum.utils.misc.sorted_dict()
- meerschaum.utils.misc.flatten_pipes_dict()
- meerschaum.utils.misc.dict_from_od()
- meerschaum.utils.misc.remove_ansi()
- meerschaum.utils.misc.get_connector_labels()
- meerschaum.utils.misc.json_serialize_datetime()
- meerschaum.utils.misc.async_wrap()
- meerschaum.utils.misc.is_docker_available()
- meerschaum.utils.misc.is_android()
- meerschaum.utils.misc.is_bcp_available()
- meerschaum.utils.misc.truncate_string_sections()
- meerschaum.utils.misc.safely_extract_tar()
- meerschaum.utils.misc.get_directory_size()
meerschaum.utils.packages
Manage Python packages.
- meerschaum.utils.packages.attempt_import()
- meerschaum.utils.packages.get_module_path()
- meerschaum.utils.packages.manually_import_module()
- meerschaum.utils.packages.get_install_no_version()
- meerschaum.utils.packages.determine_version()
- meerschaum.utils.packages.need_update()
- meerschaum.utils.packages.get_pip()
- meerschaum.utils.packages.pip_install()
- meerschaum.utils.packages.pip_uninstall()
- meerschaum.utils.packages.completely_uninstall_package()
- meerschaum.utils.packages.run_python_package()
- meerschaum.utils.packages.lazy_import()
- meerschaum.utils.packages.pandas_name()
- meerschaum.utils.packages.import_pandas()
- meerschaum.utils.packages.import_rich()
- meerschaum.utils.packages.import_dcc()
- meerschaum.utils.packages.import_html()
- meerschaum.utils.packages.get_modules_from_package()
- meerschaum.utils.packages.import_children()
- meerschaum.utils.packages.reload_package()
- meerschaum.utils.packages.reload_meerschaum()
- meerschaum.utils.packages.is_installed()
- meerschaum.utils.packages.venv_contains_package()
- meerschaum.utils.packages.package_venv()
- meerschaum.utils.packages.ensure_readline()
- meerschaum.utils.packages.get_prerelease_dependencies()
meerschaum.utils.pipes
Utilities for working with pipe objects.
meerschaum.utils.sql
Build SQL queries.
- meerschaum.utils.sql.build_where()
- meerschaum.utils.sql.clean()
- meerschaum.utils.sql.get_sqlalchemy_table()
- meerschaum.utils.sql.dateadd_str()
- meerschaum.utils.sql.test_connection()
- meerschaum.utils.sql.get_distinct_col_count()
- meerschaum.utils.sql.sql_item_name()
- meerschaum.utils.sql.pg_capital()
- meerschaum.utils.sql.oracle_capital()
- meerschaum.utils.sql.truncate_item_name()
- meerschaum.utils.sql.table_exists()
- meerschaum.utils.sql.get_table_cols_types()
- meerschaum.utils.sql.get_table_cols_indices()
- meerschaum.utils.sql.get_update_queries()
- meerschaum.utils.sql.get_null_replacement()
- meerschaum.utils.sql.get_db_version()
- meerschaum.utils.sql.get_rename_table_queries()
- meerschaum.utils.sql.get_create_table_queries()
- meerschaum.utils.sql.wrap_query_with_cte()
- meerschaum.utils.sql.format_cte_subquery()
- meerschaum.utils.sql.session_execute()
- meerschaum.utils.sql.get_reset_autoincrement_queries()
- meerschaum.utils.sql.get_postgis_geo_columns_types()
- meerschaum.utils.sql.get_create_schema_if_not_exists_queries()
meerschaum.utils.threading
Manage threads and process-wide stop signals.
meerschaum.utils.venv
Manage virtual environments.
- meerschaum.utils.venv.Venv
- meerschaum.utils.venv.activate_venv()
- meerschaum.utils.venv.deactivate_venv()
- meerschaum.utils.venv.get_module_venv()
- meerschaum.utils.venv.get_venvs()
- meerschaum.utils.venv.init_venv()
- meerschaum.utils.venv.inside_venv()
- meerschaum.utils.venv.is_venv_active()
- meerschaum.utils.venv.venv_exec()
- meerschaum.utils.venv.venv_executable()
- meerschaum.utils.venv.venv_exists()
- meerschaum.utils.venv.venv_target_path()
- meerschaum.utils.venv.verify_venv()
meerschaum.utils.warnings
Print warnings, errors, info, and debug messages.
#! /usr/bin/env python
# -*- coding: utf-8 -*-
# vim:fenc=utf-8

"""
Copyright 2020–2026 Bennett Meares

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""

import atexit

from meerschaum.utils.typing import SuccessTuple
from meerschaum.utils.packages import attempt_import
from meerschaum.core.Pipe import Pipe
from meerschaum.plugins import Plugin
from meerschaum.utils.venv import Venv
from meerschaum.jobs import Job, make_executor
from meerschaum.connectors import get_connector, Connector, InstanceConnector, make_connector
from meerschaum.utils import get_pipes
from meerschaum.utils.formatting import pprint
from meerschaum._internal.docs import index as __doc__
from meerschaum.config import __version__, get_config
from meerschaum._internal.entry import entry
from meerschaum.__main__ import _close_pools

atexit.register(_close_pools)

__pdoc__ = {'gui': False, 'api': False, 'core': False, '_internal': False}
__all__ = (
    "get_pipes",
    "get_connector",
    "get_config",
    "Pipe",
    "Plugin",
    "SuccessTuple",
    "Venv",
    "Plugin",
    "Job",
    "pprint",
    "attempt_import",
    "actions",
    "config",
    "connectors",
    "jobs",
    "plugins",
    "utils",
    "SuccessTuple",
    "Connector",
    "InstanceConnector",
    "make_connector",
    "entry",
)
def get_pipes(
    connector_keys: Union[str, List[str], None] = None,
    metric_keys: Union[str, List[str], None] = None,
    location_keys: Union[str, List[str], None] = None,
    tags: Optional[List[str]] = None,
    targets: Optional[List[str]] = None,
    datetime_dtypes: Optional[List[str]] = None,
    params: Optional[Dict[str, Any]] = None,
    mrsm_instance: Union[str, InstanceConnector, None] = None,
    instance: Union[str, InstanceConnector, None] = None,
    as_list: bool = False,
    as_tags_dict: bool = False,
    as_targets_dict: bool = False,
    method: str = 'registered',
    workers: Optional[int] = None,
    debug: bool = False,
    _cache_parameters: bool = True,
    **kw: Any
) -> Union[PipesDict, List[mrsm.Pipe], Dict[str, mrsm.Pipe]]:
    """
    Return a dictionary or list of `meerschaum.Pipe` objects.

    Parameters
    ----------
    connector_keys: Union[str, List[str], None], default None
        String or list of connector keys.
        If omitted or is `'*'`, fetch all possible keys.
        If a string begins with `'_'`, select keys that do NOT match the string.

    metric_keys: Union[str, List[str], None], default None
        String or list of metric keys. See `connector_keys` for formatting.

    location_keys: Union[str, List[str], None], default None
        String or list of location keys. See `connector_keys` for formatting.

    tags: Optional[List[str]], default None
        If provided, only include pipes with these tags.

    datetime_dtypes: Optional[List[str]], default None
        If provided, only include pipes with the corresponding `datetime` axis dtypes.
        Accepted values are `datetime`, `int`, `None` (or `null`, etc.).
        May be negated by `_`.

    params: Optional[Dict[str, Any]], default None
        Dictionary of additional parameters to search by.
        Params are parsed into a SQL WHERE clause.
        E.g. `{'a': 1, 'b': 2}` equates to `'WHERE a = 1 AND b = 2'`

    mrsm_instance: Union[str, InstanceConnector, None], default None
        Connector keys for the Meerschaum instance of the pipes.
        Must be a `meerschaum.connectors.sql.SQLConnector.SQLConnector` or
        `meerschaum.connectors.api.APIConnector.APIConnector`.

    as_list: bool, default False
        If `True`, return pipes in a list instead of a hierarchical dictionary.
        `False` : `{connector_keys: {metric_key: {location_key: Pipe}}}`
        `True` : `[Pipe]`

    as_tags_dict: bool, default False
        If `True`, return a dictionary mapping tags to pipes.
        Pipes with multiple tags will be repeated.

    as_targets_dict: bool, default False
        If `True`, return a dictionary mapping `(schema, target)` tuples to pipes.
        Pipes sharing the same target across different schemata are grouped separately.

    method: str, default 'registered'
        Available options: `['registered', 'explicit', 'all']`
        If `'registered'` (default), create pipes based on registered keys in the connector's pipes table
        (API or SQL connector, depends on mrsm_instance).
        If `'explicit'`, create pipes from provided connector_keys, metric_keys, and location_keys
        instead of consulting the pipes table. Useful for creating non-existent pipes.
        If `'all'`, create pipes from predefined metrics and locations. Required `connector_keys`.
        **NOTE:** Method `'all'` is not implemented!

    workers: Optional[int], default None
        If provided (and `as_tags_dict` or `as_targets_dict` is `True`), set the number of workers
        for the pool to fetch tags or targets.
        Only takes effect if the instance connector supports multi-threading.

    **kw: Any:
        Keyword arguments to pass to the `meerschaum.Pipe` constructor.

    Returns
    -------
    A dictionary of dictionaries and `meerschaum.Pipe` objects
    in the connector, metric, location hierarchy.
    If `as_list` is `True`, return a list of `meerschaum.Pipe` objects.
    If `as_tags_dict` is `True`, return a dictionary mapping tags to pipes.
    If `as_targets_dict` is `True`, return a dictionary mapping targets to pipes.

    Examples
    --------
    ```
    >>> ### Manual definition:
    >>> pipes = {
    ...     <connector_keys>: {
    ...         <metric_key>: {
    ...             <location_key>: Pipe(
    ...                 <connector_keys>,
    ...                 <metric_key>,
    ...                 <location_key>,
    ...             ),
    ...         },
    ...     },
    ... },
    >>> ### Accessing a single pipe:
    >>> pipes['sql:main']['weather'][None]
    >>> ### Return a list instead:
    >>> get_pipes(as_list=True)
    [Pipe('sql:main', 'weather')]
    >>> get_pipes(as_tags_dict=True)
    {'gvl': Pipe('sql:main', 'weather')}
    ```
    """
    import json
    from collections import defaultdict
    from meerschaum.config import get_config
    from meerschaum.config.static import STATIC_CONFIG
    from meerschaum.utils.warnings import error
    from meerschaum.utils.misc import filter_keywords, separate_negation_values
    from meerschaum.utils.pool import get_pool
    from meerschaum.utils.pipes import replace_pipes_syntax
    from meerschaum.utils.debug import dprint
    from meerschaum.utils.dtypes import value_is_null, get_current_timestamp
    from meerschaum import Pipe

    negation_prefix = STATIC_CONFIG['system']['fetch_pipes_keys']['negation_prefix']
    if datetime_dtypes:
        if isinstance(datetime_dtypes, str):
            datetime_dtypes = [datetime_dtypes]
        for _dt in datetime_dtypes:
            _clean = str(_dt).lstrip(negation_prefix).lower()
            if _clean not in ('datetime', 'int') and not value_is_null(_clean):
                error(f"Invalid datetime dtype '{_dt}'.")

    if connector_keys is None:
        connector_keys = []
    if metric_keys is None:
        metric_keys = []
    if location_keys is None:
        location_keys = []
    if params is None:
        params = {}
    if tags is None:
        tags = []

    if isinstance(connector_keys, str):
        connector_keys = [connector_keys]
    if isinstance(metric_keys, str):
        metric_keys = [metric_keys]
    if isinstance(location_keys, str):
        location_keys = [location_keys]

    ### Get SQL or API connector (keys come from `connector.fetch_pipes_keys()`).
    if mrsm_instance is None:
        mrsm_instance = instance
    if mrsm_instance is None:
        mrsm_instance = get_config('meerschaum', 'instance', patch=True)
    if isinstance(mrsm_instance, str):
        from meerschaum.connectors.parse import parse_instance_keys
        connector = parse_instance_keys(keys=mrsm_instance, debug=debug)
    else:
        from meerschaum.connectors import instance_types
        valid_connector = False
        if hasattr(mrsm_instance, 'type'):
            if mrsm_instance.type in instance_types:
                valid_connector = True
        if not valid_connector:
            error(f"Invalid instance connector: {mrsm_instance}")
        connector = mrsm_instance
    if debug:
        dprint(f"Using instance connector: {connector}")
    if not connector:
        error(f"Could not create connector from keys: '{mrsm_instance}'")

    ### Get a list of tuples for the keys needed to build pipes.
    result = fetch_pipes_keys(
        method,
        connector,
        connector_keys = connector_keys,
        metric_keys = metric_keys,
        location_keys = location_keys,
        tags = tags,
        params = params,
        workers = workers,
        debug = debug
    )
    if result is None:
        error("Unable to build pipes!")
    result_items: List[Tuple] = (
        list(result.items())
        if isinstance(result, dict)
        else [(None, keys_tuple) for keys_tuple in result]
    )

    ### Populate the `pipes` dictionary with Pipes based on the keys
    ### obtained from the chosen `method`.
    in_dtypes, ex_dtypes = separate_negation_values(datetime_dtypes or [])
    in_targets, ex_targets = separate_negation_values(targets or [])
    pipes: PipesDict = {}
    targets_pipes: Dict[Tuple[Optional[str], str], List[mrsm.Pipe]] = defaultdict(lambda: [])
    connector_schema = getattr(connector, 'schema', None)
    connector_is_sql = getattr(connector, 'type', None) == 'sql'
    connector_flavor = getattr(connector, 'flavor', None)
    for pipe_id, keys_tuple in result_items:
        ck, mk, lk = keys_tuple[0], keys_tuple[1], keys_tuple[2]
        pipe_tags_or_parameters = keys_tuple[3] if len(keys_tuple) == 4 else None
        pipe_parameters = (
            pipe_tags_or_parameters
            if isinstance(pipe_tags_or_parameters, (dict, str))
            else None
        )
        if isinstance(pipe_parameters, str):
            pipe_parameters = json.loads(pipe_parameters)
        pipe_tags = (
            pipe_tags_or_parameters
            if isinstance(pipe_tags_or_parameters, list)
            else (
                pipe_tags_or_parameters.get('tags', [])
                if isinstance(pipe_tags_or_parameters, dict)
                else None
            )
        )

        pipe = Pipe(
            ck, mk, lk,
            mrsm_instance = connector,
            parameters = pipe_parameters,
            tags = pipe_tags,
            debug = debug,
            **filter_keywords(Pipe, **kw)
        )
        pipe.__dict__['_tags'] = pipe_tags
        if pipe_id is not None:
            pipe._cache_value('_id', pipe_id, memory_only=True, debug=debug)
        if pipe_parameters is not None:
            now = get_current_timestamp('ms', as_int=True) / 1000
            full_attributes = {
                'connector_keys': ck,
                'metric_key': mk,
                'location_key': lk,
                'parameters': pipe_parameters,
            }
            if pipe_id is not None:
                full_attributes['pipe_id'] = pipe_id
            pipe._cache_value('attributes', full_attributes, memory_only=True, debug=debug)
            pipe._cache_value('_attributes_sync_time', now, memory_only=True, debug=debug)

        if datetime_dtypes or targets:
            parameters_str = str(pipe_parameters)
            if pipe_parameters is None or 'MRSM{' in parameters_str or 'Pipe(' in parameters_str:
                pipe_parameters = pipe.get_parameters(debug=debug)

        keep_pipe = True

        if datetime_dtypes:
            columns_val = (pipe_parameters or {}).get('columns', {}) or {}
            dt_col = columns_val.get('datetime', None)
            pipe_dtypes = (
                ((pipe_parameters or {}).get('dtypes', None) or {})
                if dt_col
                else None
            )
            dt_typ = pipe_dtypes.get(dt_col, None) if dt_col else None

            def _dtype_matches(clean_d):
                if not dt_col:
                    return value_is_null(clean_d)
                return (
                    (clean_d == 'int' and 'int' in str(dt_typ).lower())
                    or
                    (clean_d == 'datetime' and 'int' not in str(dt_typ).lower())
                )

            in_match = not in_dtypes or any(_dtype_matches(d) for d in in_dtypes)
            ex_match = bool(ex_dtypes and any(_dtype_matches(d) for d in ex_dtypes))
            keep_pipe = keep_pipe and in_match and not ex_match
            if not keep_pipe:
                continue

        if targets:
            pipe_target = pipe.target
            in_target_match = not in_targets or any(t == pipe_target for t in in_targets)
            ex_target_match = bool(ex_targets and any(t == pipe_target for t in ex_targets))
            keep_pipe = keep_pipe and in_target_match and not ex_target_match
            if not keep_pipe:
                continue

        if ck not in pipes:
            pipes[ck] = {}

        if mk not in pipes[ck]:
            pipes[ck][mk] = {}


        pipes[ck][mk][lk] = pipe

        if as_targets_dict:
            raw_params = pipe_parameters if isinstance(pipe_parameters, dict) else {}
            schema = raw_params.get('schema') or connector_schema
            explicit_target = (
                raw_params.get('target')
                or raw_params.get('target_name')
                or raw_params.get('target_table')
                or raw_params.get('target_table_name')
            )
            if explicit_target:
                target_name = (
                    replace_pipes_syntax(explicit_target, _pipe=pipe)
                    if isinstance(explicit_target, str) and '{{' in explicit_target
                    else explicit_target
                )
            else:
                target_name = pipe._target_legacy()
                if connector_is_sql and connector_flavor:
                    from meerschaum.utils.sql import truncate_item_name
                    target_name = truncate_item_name(target_name, connector_flavor)
            targets_pipes[(schema, target_name)].append(pipe)

    if not as_list and not as_tags_dict and not as_targets_dict:
        return pipes

    from meerschaum.utils.pipes import flatten_pipes_dict
    pipes_list = flatten_pipes_dict(pipes)
    if as_list:
        return pipes_list

    pool = get_pool(workers=(workers if connector.IS_THREAD_SAFE else 1))

    def gather_pipe_tags(pipe: mrsm.Pipe) -> Tuple[mrsm.Pipe, List[str]]:
        _tags = pipe.__dict__.get('_tags', None)
        gathered_tags = _tags if _tags is not None else pipe.tags
        return pipe, (gathered_tags or [])

    if as_tags_dict:
        tags_pipes = defaultdict(lambda: [])
        pipes_tags = dict(pool.map(gather_pipe_tags, pipes_list))
        for pipe, tags in pipes_tags.items():
            for tag in (tags or []):
                tags_pipes[tag].append(pipe)

        return dict(tags_pipes)

    if as_targets_dict:
        return dict(targets_pipes)

    raise NotImplementedError("No futher options for returning pipes.")
Return a dictionary or list of meerschaum.Pipe objects.
Parameters
- connector_keys (Union[str, List[str], None], default None):
String or list of connector keys. If omitted or '*', fetch all possible keys. If a string begins with '_', select keys that do NOT match the string.
- metric_keys (Union[str, List[str], None], default None):
String or list of metric keys. See connector_keys for formatting.
- location_keys (Union[str, List[str], None], default None):
String or list of location keys. See connector_keys for formatting.
- tags (Optional[List[str]], default None):
If provided, only include pipes with these tags.
- datetime_dtypes (Optional[List[str]], default None):
If provided, only include pipes with the corresponding datetime axis dtypes. Accepted values are datetime, int, None (or null, etc.). May be negated by '_'.
- params (Optional[Dict[str, Any]], default None):
Dictionary of additional parameters to search by. Params are parsed into a SQL WHERE clause. E.g. {'a': 1, 'b': 2} equates to 'WHERE a = 1 AND b = 2'.
- mrsm_instance (Union[str, InstanceConnector, None], default None):
Connector keys for the Meerschaum instance of the pipes. Must be a meerschaum.connectors.sql.SQLConnector.SQLConnector or meerschaum.connectors.api.APIConnector.APIConnector.
- as_list (bool, default False):
If True, return pipes in a list instead of a hierarchical dictionary. False: {connector_keys: {metric_key: {location_key: Pipe}}}. True: [Pipe].
- as_tags_dict (bool, default False):
If True, return a dictionary mapping tags to pipes. Pipes with multiple tags will be repeated.
- as_targets_dict (bool, default False):
If True, return a dictionary mapping (schema, target) tuples to pipes. Pipes sharing the same target across different schemata are grouped separately.
- method (str, default 'registered'):
Available options: ['registered', 'explicit', 'all']. If 'registered' (default), create pipes based on registered keys in the connector's pipes table (API or SQL connector, depending on mrsm_instance). If 'explicit', create pipes from the provided connector_keys, metric_keys, and location_keys instead of consulting the pipes table. Useful for creating non-existent pipes. If 'all', create pipes from predefined metrics and locations. Requires connector_keys. NOTE: Method 'all' is not implemented!
- workers (Optional[int], default None):
If provided (and as_tags_dict or as_targets_dict is True), set the number of workers for the pool to fetch tags or targets. Only takes effect if the instance connector supports multi-threading.
- **kw (Any):
Keyword arguments to pass to the meerschaum.Pipe constructor.
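The '_' negation prefix described for the key parameters and datetime_dtypes can be pictured with a small stdlib-only sketch. The helper names split_negated and key_matches are hypothetical and are not part of the meerschaum API; the sketch assumes exact string matching and that the prefix is '_', as stated above.

```python
from typing import Iterable, List, Tuple

NEGATION_PREFIX = '_'  # assumption: the prefix documented above


def split_negated(keys: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split keys into (included, excluded), stripping the prefix from excluded keys."""
    included, excluded = [], []
    for key in keys:
        if key.startswith(NEGATION_PREFIX):
            excluded.append(key[len(NEGATION_PREFIX):])
        else:
            included.append(key)
    return included, excluded


def key_matches(candidate: str, keys: Iterable[str]) -> bool:
    """Keep a candidate if it matches an included key (or none were given) and no excluded key."""
    included, excluded = split_negated(keys)
    in_match = not included or candidate in included
    ex_match = candidate in excluded
    return in_match and not ex_match


print(split_negated(['sql:main', '_plugin:noaa']))
print(key_matches('sql:main', ['_plugin:noaa']))
print(key_matches('plugin:noaa', ['_plugin:noaa']))
```

When only negated keys are supplied, everything except the excluded keys is kept, which is why an empty "included" list counts as a match.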
Returns
- A dictionary of dictionaries and meerschaum.Pipe objects in the connector, metric, location hierarchy.
- If as_list is True, return a list of meerschaum.Pipe objects.
- If as_tags_dict is True, return a dictionary mapping tags to lists of pipes.
- If as_targets_dict is True, return a dictionary mapping (schema, target) tuples to lists of pipes.
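To make the return shapes concrete, here is a stdlib-only sketch that uses plain named tuples in place of meerschaum.Pipe objects. The names FakePipe, nest, flatten, and by_tag are hypothetical; the real function builds actual pipes and reads tags from their parameters.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Tuple


class FakePipe(NamedTuple):
    """Hypothetical stand-in for meerschaum.Pipe (keys and tags only)."""
    connector_keys: str
    metric_key: str
    location_key: Optional[str]
    tags: Tuple[str, ...] = ()


def nest(pipes: List[FakePipe]) -> Dict[str, Dict[str, Dict[Optional[str], FakePipe]]]:
    """Default shape: {connector_keys: {metric_key: {location_key: pipe}}}."""
    nested: Dict[str, Dict[str, Dict[Optional[str], FakePipe]]] = {}
    for pipe in pipes:
        metrics = nested.setdefault(pipe.connector_keys, {})
        metrics.setdefault(pipe.metric_key, {})[pipe.location_key] = pipe
    return nested


def flatten(nested: Dict[str, Dict[str, Dict[Optional[str], FakePipe]]]) -> List[FakePipe]:
    """as_list=True shape: a flat list of pipes."""
    return [
        pipe
        for metrics in nested.values()
        for locations in metrics.values()
        for pipe in locations.values()
    ]


def by_tag(pipes: List[FakePipe]) -> Dict[str, List[FakePipe]]:
    """as_tags_dict=True shape: each tag maps to a list; multi-tag pipes repeat."""
    grouped: Dict[str, List[FakePipe]] = defaultdict(list)
    for pipe in pipes:
        for tag in pipe.tags:
            grouped[tag].append(pipe)
    return dict(grouped)


weather = FakePipe('sql:main', 'weather', None, ('gvl', 'production'))
energy = FakePipe('sql:main', 'energy', 'clemson', ('production',))
nested = nest([weather, energy])
print(nested['sql:main']['weather'][None])
print(flatten(nested))
print(by_tag([weather, energy]))
```

Note that a pipe without a location is stored under the key None, which is why the hierarchy is indexed with [None] for such pipes.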
Examples
>>> ### Manual definition:
>>> pipes = {
... <connector_keys>: {
... <metric_key>: {
... <location_key>: Pipe(
... <connector_keys>,
... <metric_key>,
... <location_key>,
... ),
... },
... },
... }
>>> ### Accessing a single pipe:
>>> pipes['sql:main']['weather'][None]
>>> ### Return a list instead:
>>> get_pipes(as_list=True)
[Pipe('sql:main', 'weather')]
>>> get_pipes(as_tags_dict=True)
{'gvl': [Pipe('sql:main', 'weather')]}
68def get_connector( 69 type: str = None, 70 label: str = None, 71 refresh: bool = False, 72 debug: bool = False, 73 _load_plugins: bool = True, 74 **kw: Any 75) -> Connector: 76 """ 77 Return existing connector or create new connection and store for reuse. 78 79 You can create new connectors if enough parameters are provided for the given type and flavor. 80 81 Parameters 82 ---------- 83 type: Optional[str], default None 84 Connector type (sql, api, etc.). 85 Defaults to the type of the configured `instance_connector`. 86 87 label: Optional[str], default None 88 Connector label (e.g. main). Defaults to `'main'`. 89 90 refresh: bool, default False 91 Refresh the Connector instance / construct new object. Defaults to `False`. 92 93 kw: Any 94 Other arguments to pass to the Connector constructor. 95 If the Connector has already been constructed and new arguments are provided, 96 `refresh` is set to `True` and the old Connector is replaced. 97 98 Returns 99 ------- 100 A new Meerschaum connector (e.g. `meerschaum.connectors.api.APIConnector`, 101 `meerschaum.connectors.sql.SQLConnector`). 102 103 Examples 104 -------- 105 The following parameters would create a new 106 `meerschaum.connectors.sql.SQLConnector` that isn't in the configuration file. 107 108 ``` 109 >>> conn = get_connector( 110 ... type = 'sql', 111 ... label = 'newlabel', 112 ... flavor = 'sqlite', 113 ... database = '/file/path/to/database.db' 114 ... 
) 115 >>> 116 ``` 117 118 """ 119 from meerschaum.connectors.parse import parse_instance_keys 120 from meerschaum.config import get_config 121 from meerschaum._internal.static import STATIC_CONFIG 122 from meerschaum.utils.warnings import warn 123 global _loaded_plugin_connectors 124 if isinstance(type, str) and not label and ':' in type: 125 type, label = type.split(':', maxsplit=1) 126 127 if _load_plugins: 128 with _locks['_loaded_plugin_connectors']: 129 if not _loaded_plugin_connectors: 130 load_plugin_connectors() 131 _load_builtin_custom_connectors() 132 _loaded_plugin_connectors = True 133 134 if type is None and label is None: 135 default_instance_keys = get_config('meerschaum', 'instance', patch=True) 136 ### recursive call to get_connector 137 return parse_instance_keys(default_instance_keys) 138 139 ### NOTE: the default instance connector may not be main. 140 ### Only fall back to 'main' if the type is provided by the label is omitted. 141 label = label if label is not None else STATIC_CONFIG['connectors']['default_label'] 142 143 ### type might actually be a label. Check if so and raise a warning. 144 if type not in connectors: 145 possibilities, poss_msg = [], "" 146 for _type in get_config('meerschaum', 'connectors'): 147 if type in get_config('meerschaum', 'connectors', _type): 148 possibilities.append(f"{_type}:{type}") 149 if len(possibilities) > 0: 150 poss_msg = " Did you mean" 151 for poss in possibilities[:-1]: 152 poss_msg += f" '{poss}'," 153 if poss_msg.endswith(','): 154 poss_msg = poss_msg[:-1] 155 if len(possibilities) > 1: 156 poss_msg += " or" 157 poss_msg += f" '{possibilities[-1]}'?" 158 159 warn(f"Cannot create Connector of type '{type}'." 
+ poss_msg, stack=False) 160 return None 161 162 if 'sql' not in types: 163 from meerschaum.connectors.plugin import PluginConnector 164 from meerschaum.connectors.valkey import ValkeyConnector 165 with _locks['types']: 166 types.update({ 167 'api': APIConnector, 168 'sql': SQLConnector, 169 'plugin': PluginConnector, 170 'valkey': ValkeyConnector, 171 }) 172 173 ### determine if we need to call the constructor 174 if not refresh: 175 ### see if any user-supplied arguments differ from the existing instance 176 if label in connectors[type]: 177 warning_message = None 178 for attribute, value in kw.items(): 179 if attribute not in connectors[type][label].meta: 180 import inspect 181 cls = connectors[type][label].__class__ 182 cls_init_signature = inspect.signature(cls) 183 cls_init_params = cls_init_signature.parameters 184 if attribute not in cls_init_params: 185 warning_message = ( 186 f"Received new attribute '{attribute}' not present in connector " + 187 f"{connectors[type][label]}.\n" 188 ) 189 elif connectors[type][label].__dict__[attribute] != value: 190 warning_message = ( 191 f"Mismatched values for attribute '{attribute}' in connector " 192 + f"'{connectors[type][label]}'.\n" + 193 f" - Keyword value: '{value}'\n" + 194 f" - Existing value: '{connectors[type][label].__dict__[attribute]}'\n" 195 ) 196 if warning_message is not None: 197 warning_message += ( 198 "\nSetting `refresh` to True and recreating connector with type:" 199 + f" '{type}' and label '{label}'." 
200 ) 201 refresh = True 202 warn(warning_message) 203 else: ### connector doesn't yet exist 204 refresh = True 205 206 ### only create an object if refresh is True 207 ### (can be manually specified, otherwise determined above) 208 if refresh: 209 with _locks['connectors']: 210 try: 211 ### will raise an error if configuration is incorrect / missing 212 conn = types[type](label=label, **kw) 213 connectors[type][label] = conn 214 except InvalidAttributesError as ie: 215 warn( 216 f"Incorrect attributes for connector '{type}:{label}'.\n" 217 + str(ie), 218 stack = False, 219 ) 220 conn = None 221 except Exception as e: 222 from meerschaum.utils.formatting import get_console 223 console = get_console() 224 if console: 225 console.print_exception() 226 warn( 227 f"Exception when creating connector '{type}:{label}'.\n" + str(e), 228 stack = False, 229 ) 230 conn = None 231 if conn is None: 232 return None 233 234 return connectors[type][label]
Return existing connector or create new connection and store for reuse.
You can create new connectors if enough parameters are provided for the given type and flavor.
Parameters
- type (Optional[str], default None):
Connector type (sql, api, etc.). Defaults to the type of the configured instance_connector. May also be given as combined keys (e.g. 'sql:main') when label is omitted.
- label (Optional[str], default None):
Connector label (e.g. main). Defaults to 'main' when type is provided.
- refresh (bool, default False):
Refresh the Connector instance / construct new object. Defaults to False.
- kw (Any):
Other arguments to pass to the Connector constructor. If the Connector has already been constructed and new arguments are provided, refresh is set to True and the old Connector is replaced.
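The reuse-or-rebuild behavior described for refresh and kw can be sketched with a small in-memory registry. This is a simplified illustration, not the library's implementation: FakeConnector, _registry, and get_cached are hypothetical names, and the sketch rebuilds whenever any keyword differs from the stored attributes.

```python
from typing import Any, Dict


class FakeConnector:
    """Hypothetical stand-in for a connector; stores its keyword attributes."""

    def __init__(self, label: str, **attrs: Any):
        self.label = label
        self.attrs = attrs


# Hypothetical registry in the shape {type: {label: connector}}.
_registry: Dict[str, Dict[str, FakeConnector]] = {}


def get_cached(type_: str, label: str = 'main', refresh: bool = False, **kw: Any) -> FakeConnector:
    """Return the stored connector, rebuilding it if new or changed attributes arrive."""
    existing = _registry.setdefault(type_, {}).get(label)
    if existing is None:
        # The connector does not exist yet, so it must be constructed.
        refresh = True
    elif any(existing.attrs.get(key) != value for key, value in kw.items()):
        # New or mismatched attributes force a rebuild.
        refresh = True
    if refresh:
        _registry[type_][label] = FakeConnector(label, **kw)
    return _registry[type_][label]


first = get_cached('sql', 'temp', flavor='sqlite', database='/tmp/tmp.db')
same = get_cached('sql', 'temp')
rebuilt = get_cached('sql', 'temp', database='/tmp/other.db')
print(first is same)     # True: no new arguments, so the stored object is reused
print(first is rebuilt)  # False: a changed attribute replaced the connector
```

Because the rebuilt object replaces the stored one, any earlier reference (first in this sketch) keeps pointing at the old connector.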
Returns
- A new or existing Meerschaum connector (e.g. meerschaum.connectors.api.APIConnector, meerschaum.connectors.sql.SQLConnector), or None if the connector cannot be created.
Examples
The following parameters would create a new meerschaum.connectors.sql.SQLConnector that isn't in the configuration file.
>>> conn = get_connector(
... type = 'sql',
... label = 'newlabel',
... flavor = 'sqlite',
... database = '/file/path/to/database.db'
... )
>>>
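As the Build a Connector example earlier on this page shows, the type and label may also be passed together as a single 'type:label' string (e.g. 'sql:temp'). Below is a minimal sketch of how such keys can be split, assuming the label falls back to 'main' when omitted. The helper parse_keys is hypothetical and not a meerschaum function.

```python
from typing import Optional, Tuple

DEFAULT_LABEL = 'main'  # assumption: matches the documented default label


def parse_keys(type_: str, label: Optional[str] = None) -> Tuple[str, str]:
    """Split 'type:label' keys on the first colon; fall back to the default label."""
    if not label and ':' in type_:
        type_, label = type_.split(':', maxsplit=1)
    return type_, (label if label is not None else DEFAULT_LABEL)


print(parse_keys('sql:temp'))         # ('sql', 'temp')
print(parse_keys('sql'))              # ('sql', 'main')
print(parse_keys('sql', 'newlabel'))  # ('sql', 'newlabel')
```

Splitting only on the first colon keeps any later colons inside the label.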
112def get_config( 113 *keys: str, 114 patch: bool = True, 115 substitute: bool = True, 116 sync_files: bool = True, 117 write_missing: bool = True, 118 as_tuple: bool = False, 119 warn: bool = True, 120 debug: bool = False 121) -> Any: 122 """ 123 Return the Meerschaum configuration dictionary. 124 If positional arguments are provided, index by the keys. 125 Raises a warning if invalid keys are provided. 126 127 Parameters 128 ---------- 129 keys: str: 130 List of strings to index. 131 132 patch: bool, default True 133 If `True`, patch missing default keys into the config directory. 134 Defaults to `True`. 135 136 sync_files: bool, default True 137 If `True`, sync files if needed. 138 Defaults to `True`. 139 140 write_missing: bool, default True 141 If `True`, write default values when the main config files are missing. 142 Defaults to `True`. 143 144 substitute: bool, default True 145 If `True`, subsitute 'MRSM{}' values. 146 Defaults to `True`. 147 148 as_tuple: bool, default False 149 If `True`, return a tuple of type (success, value). 150 Defaults to `False`. 151 152 Returns 153 ------- 154 The value in the configuration directory, indexed by the provided keys. 155 156 Examples 157 -------- 158 >>> get_config('meerschaum', 'instance') 159 'sql:main' 160 >>> get_config('does', 'not', 'exist') 161 UserWarning: Invalid keys in config: ('does', 'not', 'exist') 162 """ 163 import json 164 165 symlinks_key = STATIC_CONFIG['config']['symlinks_key'] 166 if debug: 167 from meerschaum.utils.debug import dprint 168 dprint(f"Indexing keys: {keys}", color=False) 169 170 if len(keys) == 0: 171 _rc = _config( 172 substitute=substitute, 173 sync_files=sync_files, 174 write_missing=(write_missing and _allow_write_missing), 175 ) 176 if as_tuple: 177 return True, _rc 178 return _rc 179 180 ### Weird threading issues, only import if substitute is True. 
181 if substitute: 182 from meerschaum.config._read_config import search_and_substitute_config 183 ### Invalidate the cache if it was read before with substitute=False 184 ### but there still exist substitutions. 185 if ( 186 config is not None and substitute and keys[0] != symlinks_key 187 and 'MRSM{' in json.dumps(config.get(keys[0])) 188 ): 189 try: 190 _subbed = search_and_substitute_config({keys[0]: config[keys[0]]}) 191 except Exception: 192 import traceback 193 traceback.print_exc() 194 _subbed = {keys[0]: config[keys[0]]} 195 196 config[keys[0]] = _subbed[keys[0]] 197 if symlinks_key in _subbed: 198 if symlinks_key not in config: 199 config[symlinks_key] = {} 200 config[symlinks_key] = apply_patch_to_config( 201 _subbed.get(symlinks_key, {}), 202 config.get(symlinks_key, {}), 203 ) 204 205 from meerschaum.config._sync import sync_files as _sync_files 206 if config is None: 207 _config(*keys, sync_files=sync_files) 208 209 invalid_keys = False 210 if keys[0] not in config and keys[0] != symlinks_key: 211 single_key_config = read_config( 212 keys=[keys[0]], substitute=substitute, write_missing=write_missing 213 ) 214 if keys[0] not in single_key_config: 215 invalid_keys = True 216 else: 217 config[keys[0]] = single_key_config.get(keys[0], None) 218 if symlinks_key in single_key_config and keys[0] in single_key_config[symlinks_key]: 219 if symlinks_key not in config: 220 config[symlinks_key] = {} 221 config[symlinks_key][keys[0]] = single_key_config[symlinks_key][keys[0]] 222 223 if sync_files: 224 _sync_files(keys=[keys[0]]) 225 226 c = config 227 if len(keys) > 0: 228 for k in keys: 229 try: 230 c = c[k] 231 except Exception: 232 invalid_keys = True 233 break 234 if invalid_keys: 235 ### Check if the keys are in the default configuration. 
236 from meerschaum.config._default import default_config 237 in_default = True 238 patched_default_config = ( 239 search_and_substitute_config(default_config) 240 if substitute else copy.deepcopy(default_config) 241 ) 242 _c = patched_default_config 243 for k in keys: 244 try: 245 _c = _c[k] 246 except Exception: 247 in_default = False 248 if in_default: 249 c = _c 250 invalid_keys = False 251 warning_msg = f"Invalid keys in config: {keys}" 252 if not in_default: 253 try: 254 if warn: 255 from meerschaum.utils.warnings import warn as _warn 256 _warn(warning_msg, stacklevel=3, color=False) 257 except Exception: 258 if warn: 259 print(warning_msg) 260 if as_tuple: 261 return False, None 262 return None 263 264 ### Don't write keys that we haven't yet loaded into memory. 265 not_loaded_keys = [k for k in patched_default_config if k not in config] 266 for k in not_loaded_keys: 267 patched_default_config.pop(k, None) 268 269 set_config( 270 apply_patch_to_config( 271 patched_default_config, 272 config, 273 ) 274 ) 275 if patch and keys[0] != symlinks_key and write_missing: 276 ### Only persist defaults when the key's file is genuinely absent. 277 ### Never overwrite an existing file (e.g. one that failed to parse) ─ 278 ### doing so would clobber the user's config with default values. 279 ### Brand-new config files are still created by `read_config()`. 280 from meerschaum.config._read_config import get_keyfile_path 281 keyfile_exists = get_keyfile_path(keys[0], create_new=False) is not None 282 if not keyfile_exists: 283 write_config(config, debug=debug) 284 285 if as_tuple: 286 return (not invalid_keys), c 287 return c
Return the Meerschaum configuration dictionary. If positional arguments are provided, index by the keys. Raises a warning if invalid keys are provided.
Parameters
- keys (str):
Keys to index, passed as positional string arguments.
- patch (bool, default True):
If True, patch missing default keys into the config dictionary. Defaults to True.
- sync_files (bool, default True):
If True, sync files if needed. Defaults to True.
- write_missing (bool, default True):
If True, write default values when the main config files are missing. Defaults to True.
- substitute (bool, default True):
If True, substitute 'MRSM{}' values. Defaults to True.
- as_tuple (bool, default False):
If True, return a tuple of type (success, value). Defaults to False.
Returns
- The value in the configuration dictionary, indexed by the provided keys.
Examples
>>> get_config('meerschaum', 'instance')
'sql:main'
>>> get_config('does', 'not', 'exist')
UserWarning: Invalid keys in config: ('does', 'not', 'exist')
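Indexing by positional keys and the as_tuple return shape can be illustrated with a stdlib-only sketch over a plain dictionary. The function index_config and the sample dictionary are hypothetical; the real function also reads, patches, and substitutes configuration files, which this sketch omits.

```python
import warnings
from typing import Any, Dict, Tuple, Union

# Hypothetical sample data; a real configuration holds many more keys.
sample_config: Dict[str, Any] = {
    'meerschaum': {'instance': 'sql:main'},
}


def index_config(
    config: Dict[str, Any],
    *keys: str,
    as_tuple: bool = False,
    warn: bool = True,
) -> Union[Any, Tuple[bool, Any]]:
    """Walk nested dictionaries by keys; warn and return None if a key is missing."""
    value: Any = config
    for key in keys:
        try:
            value = value[key]
        except (KeyError, TypeError):
            if warn:
                warnings.warn(f"Invalid keys in config: {keys}")
            return (False, None) if as_tuple else None
    return (True, value) if as_tuple else value


print(index_config(sample_config, 'meerschaum', 'instance'))
print(index_config(sample_config, 'meerschaum', 'instance', as_tuple=True))
print(index_config(sample_config, 'does', 'not', 'exist', as_tuple=True, warn=False))
```

The (success, value) tuple lets a caller tell a missing key apart from a key whose stored value is legitimately None.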
66class Pipe: 67 """ 68 Access Meerschaum pipes via Pipe objects. 69 70 Pipes are identified by the following: 71 72 1. Connector keys (e.g. `'sql:main'`) 73 2. Metric key (e.g. `'weather'`) 74 3. Location (optional; e.g. `None`) 75 76 A pipe's connector keys correspond to a data source, and when the pipe is synced, 77 its `fetch` definition is evaluated and executed to produce new data. 78 79 Alternatively, new data may be directly synced via `pipe.sync()`: 80 81 ``` 82 >>> from meerschaum import Pipe 83 >>> pipe = Pipe('csv', 'weather') 84 >>> 85 >>> import pandas as pd 86 >>> df = pd.read_csv('weather.csv') 87 >>> pipe.sync(df) 88 ``` 89 """ 90 91 from ._fetch import ( 92 fetch, 93 get_backtrack_interval, 94 ) 95 from ._data import ( 96 get_data, 97 get_backtrack_data, 98 get_rowcount, 99 get_size, 100 get_data, 101 get_doc, 102 get_docs, 103 get_value, 104 _get_data_as_iterator, 105 get_chunk_interval, 106 get_chunk_bounds, 107 get_chunk_bounds_batches, 108 parse_date_bounds, 109 ) 110 from ._register import register 111 from ._attributes import ( 112 attributes, 113 parameters, 114 columns, 115 indices, 116 indexes, 117 dtypes, 118 autoincrement, 119 autotime, 120 upsert, 121 static, 122 tzinfo, 123 enforce, 124 null_indices, 125 mixed_numerics, 126 get_columns, 127 get_columns_types, 128 get_columns_indices, 129 get_indices, 130 get_parameters, 131 get_dtypes, 132 update_parameters, 133 tags, 134 get_id, 135 id, 136 get_val_column, 137 parents, 138 parent, 139 children, 140 child, 141 reference, 142 references, 143 target, 144 _target_legacy, 145 guess_datetime, 146 precision, 147 get_precision, 148 ) 149 from ._cache import ( 150 _get_cache_connector, 151 _cache_value, 152 _get_cached_value, 153 _invalidate_cache, 154 _get_cache_dir_path, 155 _write_cache_key, 156 _write_cache_file, 157 _write_cache_conn_key, 158 _read_cache_key, 159 _read_cache_file, 160 _read_cache_conn_key, 161 _load_cache_keys, 162 _load_cache_files, 163 _load_cache_conn_keys, 164 
_get_cache_keys, 165 _get_cache_file_keys, 166 _get_cache_conn_keys, 167 _clear_cache_key, 168 _clear_cache_file, 169 _clear_cache_conn_key, 170 ) 171 from ._show import show 172 from ._edit import edit, edit_definition, update 173 from ._sync import ( 174 sync, 175 get_sync_time, 176 exists, 177 filter_existing, 178 _get_chunk_label, 179 get_num_workers, 180 _persist_new_special_columns, 181 ) 182 from ._verify import ( 183 verify, 184 get_bound_interval, 185 get_bound_time, 186 ) 187 from ._delete import delete 188 from ._drop import drop, drop_indices 189 from ._compress import compress, decompress 190 from ._maintenance import vacuum, analyze, repartition 191 from ._index import create_indices 192 from ._clear import clear 193 from ._deduplicate import deduplicate 194 from ._bootstrap import bootstrap 195 from ._dtypes import enforce_dtypes, infer_dtypes 196 from ._copy import copy_to 197 198 def __init__( 199 self, 200 connector: str = '', 201 metric: str = '', 202 location: Optional[str] = None, 203 parameters: Optional[Dict[str, Any]] = None, 204 columns: Union[Dict[str, str], List[str], None] = None, 205 indices: Optional[Dict[str, Union[str, List[str]]]] = None, 206 tags: Optional[List[str]] = None, 207 target: Optional[str] = None, 208 dtypes: Optional[Dict[str, str]] = None, 209 instance: Optional[Union[str, InstanceConnector]] = None, 210 upsert: Optional[bool] = None, 211 autoincrement: Optional[bool] = None, 212 autotime: Optional[bool] = None, 213 precision: Union[str, Dict[str, Union[str, int]], None] = None, 214 static: Optional[bool] = None, 215 enforce: Optional[bool] = None, 216 null_indices: Optional[bool] = None, 217 mixed_numerics: Optional[bool] = None, 218 compress: Union[bool, Dict[str, Any], None] = None, 219 temporary: bool = False, 220 cache: Optional[bool] = None, 221 cache_connector_keys: Optional[str] = None, 222 references: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]] = None, 223 reference: Union[str, Dict[str, Any], 
mrsm.Pipe, None] = None, 224 parents: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]] = None, 225 parent: Union[str, Dict[str, Any], mrsm.Pipe, None] = None, 226 children: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]] = None, 227 child: Union[str, Dict[str, Any], mrsm.Pipe, None] = None, 228 mrsm_instance: Optional[Union[str, InstanceConnector]] = None, 229 connector_keys: Optional[str] = None, 230 metric_key: Optional[str] = None, 231 location_key: Optional[str] = None, 232 instance_keys: Optional[str] = None, 233 indexes: Union[Dict[str, str], List[str], None] = None, 234 debug: bool = False, 235 ): 236 """ 237 Parameters 238 ---------- 239 connector: str 240 Keys for the pipe's source connector, e.g. `'sql:main'`. 241 242 metric: str 243 Label for the pipe's contents, e.g. `'weather'`. 244 245 location: str, default None 246 Label for the pipe's location. Defaults to `None`. 247 248 parameters: Optional[Dict[str, Any]], default None 249 Optionally set a pipe's parameters from the constructor, 250 e.g. columns and other attributes. 251 You can edit these parameters with `edit pipes`. 252 253 columns: Union[Dict[str, str], List[str], None], default None 254 Set the `columns` dictionary of `parameters`. 255 If `parameters` is also provided, this dictionary is added under the `'columns'` key. 256 257 indices: Optional[Dict[str, Union[str, List[str]]]], default None 258 Set the `indices` dictionary of `parameters`. 259 If `parameters` is also provided, this dictionary is added under the `'indices'` key. 260 261 tags: Optional[List[str]], default None 262 A list of strings to be added under the `'tags'` key of `parameters`. 263 You can select pipes with certain tags using `--tags`. 264 265 dtypes: Optional[Dict[str, str]], default None 266 Set the `dtypes` dictionary of `parameters`. 267 If `parameters` is also provided, this dictionary is added under the `'dtypes'` key. 
268 269 mrsm_instance: Optional[Union[str, InstanceConnector]], default None 270 Connector for the Meerschaum instance where the pipe resides. 271 Defaults to the preconfigured default instance (`'sql:main'`). 272 273 instance: Optional[Union[str, InstanceConnector]], default None 274 Alias for `mrsm_instance`. If `mrsm_instance` is supplied, this value is ignored. 275 276 upsert: Optional[bool], default None 277 If `True`, set `upsert` to `True` in the parameters. 278 279 autoincrement: Optional[bool], default None 280 If `True`, set `autoincrement` in the parameters. 281 282 autotime: Optional[bool], default None 283 If `True`, set `autotime` in the parameters. 284 285 precision: Union[str, Dict[str, Union[str, int]], None], default None 286 If provided, set `precision` in the parameters. 287 This may be either a string (the precision unit) or a dictionary of in the form 288 `{'unit': <unit>, 'interval': <interval>}`. 289 Default is determined by the `datetime` column dtype 290 (e.g. `datetime64[us]` is `microsecond` precision). 291 292 static: Optional[bool], default None 293 If `True`, set `static` in the parameters. 294 295 enforce: Optional[bool], default None 296 If `False`, skip data type enforcement. 297 Default behavior is `True`. 298 299 null_indices: Optional[bool], default None 300 Set to `False` if there will be no null values in the index columns. 301 Defaults to `True`. 302 303 mixed_numerics: bool, default None 304 If `True`, integer columns will be converted to `numeric` when floats are synced. 305 Set to `False` to disable this behavior. 306 Defaults to `True`. 307 308 compress: Union[bool, Dict[str, Any], None], default None 309 If `True` (or a dictionary of compression settings), mark the pipe for compression. 310 For TimescaleDB hypertables, a columnstore (compression) policy is installed 311 automatically on sync. A dictionary may override `segmentby`, `orderby`, and `after`. 312 Defaults to `False`. 
313 314 hypercore: bool, default True 315 For TimescaleDB hypertables, enable the Hypercore columnstore at table creation 316 (declaring `segmentby`/`orderby` in `CREATE TABLE`), which causes TimescaleDB to 317 auto-create a columnstore policy. Set to `False` for a plain row-store hypertable. 318 Has no effect unless the pipe is a hypertable (`hypertable`, default `True`). 319 320 temporary: bool, default False 321 If `True`, prevent instance tables (pipes, users, plugins) from being created. 322 323 cache: Optional[bool], default None 324 If `True`, cache the pipe's metadata to disk (in addition to in-memory caching). 325 If `cache` is not explicitly `True`, it is set to `False` if `temporary` is `True`. 326 Defaults to `True` (from `None`). 327 328 cache_connector_keys: Optional[str], default None 329 If provided, use the keys to a Valkey connector (e.g. `valkey:main`). 330 331 references: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None 332 If provided, inherit the parameters of the reference Pipe(s). 333 May be equal to a string of the Pipe constructor, a dictionary of constructor keys, 334 a Pipe itself, or a list of any of these values. 335 336 parents: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None 337 Set references for parent pipes. See `references` for values. 338 339 children: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None 340 Set references for child pipes. See `references` for values. 341 342 """ 343 from meerschaum.utils.warnings import error, warn 344 if (not connector and not connector_keys) or (not metric and not metric_key): 345 error( 346 "Please provide strings for the connector and metric\n " 347 + "(first two positional arguments)." 348 ) 349 350 ### Fall back to legacy `location_key` just in case. 
351 if not location: 352 location = location_key 353 354 if not connector: 355 connector = connector_keys 356 357 if not metric: 358 metric = metric_key 359 360 if location in ('[None]', 'None'): 361 location = None 362 363 from meerschaum._internal.static import STATIC_CONFIG 364 negation_prefix = STATIC_CONFIG['system']['fetch_pipes_keys']['negation_prefix'] 365 for k in (connector, metric, location, *(tags or [])): 366 if str(k).startswith(negation_prefix): 367 error(f"A pipe's keys and tags cannot start with the prefix '{negation_prefix}'.") 368 369 self._connector_keys = str(connector) 370 self._connector_key = self.connector_keys ### Alias 371 self._metric_key = metric 372 self._location_key = location 373 self.temporary = temporary 374 self.cache = ( 375 cache 376 if cache is not None 377 else ((not temporary) and get_config('pipes', 'cache', 'enabled', warn=False)) 378 ) 379 self.cache_connector_keys = ( 380 str(cache_connector_keys) 381 if cache_connector_keys is not None 382 else None 383 ) 384 self.debug = debug 385 386 self._attributes: Dict[str, Any] = { 387 'connector_keys': self._connector_keys, 388 'metric_key': self._metric_key, 389 'location_key': self._location_key, 390 'parameters': {}, 391 } 392 393 ### only set parameters if values are provided 394 if isinstance(parameters, dict): 395 self._attributes['parameters'] = parameters 396 else: 397 if parameters is not None: 398 warn(f"The provided parameters are of invalid type '{type(parameters)}'.") 399 self._attributes['parameters'] = {} 400 401 columns = columns or self._attributes.get('parameters', {}).get('columns', None) 402 if isinstance(columns, (list, tuple)): 403 columns = {str(col): str(col) for col in columns} 404 if isinstance(columns, dict): 405 self._attributes['parameters']['columns'] = columns 406 elif isinstance(columns, str) and 'Pipe(' in columns: 407 pass 408 elif columns is not None: 409 warn(f"The provided columns are of invalid type '{type(columns)}'.") 410 411 indices = ( 
412 indices 413 or indexes 414 or self._attributes.get('parameters', {}).get('indices', None) 415 or self._attributes.get('parameters', {}).get('indexes', None) 416 ) 417 if isinstance(indices, dict): 418 indices_key = ( 419 'indexes' 420 if 'indexes' in self._attributes['parameters'] 421 else 'indices' 422 ) 423 self._attributes['parameters'][indices_key] = indices 424 425 if isinstance(tags, (list, tuple)): 426 self._attributes['parameters']['tags'] = tags 427 elif tags is not None: 428 warn(f"The provided tags are of invalid type '{type(tags)}'.") 429 430 if isinstance(target, str): 431 self._attributes['parameters']['target'] = target 432 elif target is not None: 433 warn(f"The provided target is of invalid type '{type(target)}'.") 434 435 if isinstance(dtypes, dict): 436 self._attributes['parameters']['dtypes'] = dtypes 437 elif dtypes is not None: 438 warn(f"The provided dtypes are of invalid type '{type(dtypes)}'.") 439 440 if isinstance(upsert, bool): 441 self._attributes['parameters']['upsert'] = upsert 442 443 if isinstance(autoincrement, bool): 444 self._attributes['parameters']['autoincrement'] = autoincrement 445 446 if isinstance(autotime, bool): 447 self._attributes['parameters']['autotime'] = autotime 448 449 if isinstance(precision, dict): 450 self._attributes['parameters']['precision'] = precision 451 elif isinstance(precision, str): 452 self._attributes['parameters']['precision'] = {'unit': precision} 453 454 if isinstance(static, bool): 455 self._attributes['parameters']['static'] = static 456 self._static = static 457 458 if isinstance(enforce, bool): 459 self._attributes['parameters']['enforce'] = enforce 460 461 if isinstance(null_indices, bool): 462 self._attributes['parameters']['null_indices'] = null_indices 463 464 if isinstance(mixed_numerics, bool): 465 self._attributes['parameters']['mixed_numerics'] = mixed_numerics 466 467 if isinstance(compress, (bool, dict)): 468 self._attributes['parameters']['compress'] = compress 469 470 ### 
NOTE: The parameters dictionary is {} by default. 471 ### A Pipe may be registered without parameters, then edited, 472 ### or a Pipe may be registered with parameters set in-memory first. 473 _mrsm_instance = mrsm_instance if mrsm_instance is not None else (instance or instance_keys) 474 if _mrsm_instance is None: 475 _mrsm_instance = get_config('meerschaum', 'instance', patch=True) 476 477 if not isinstance(_mrsm_instance, str): 478 self._instance_connector = _mrsm_instance 479 self._instance_keys = str(_mrsm_instance) 480 else: 481 self._instance_keys = _mrsm_instance 482 483 if self._instance_keys == 'sql:memory': 484 self.cache = False 485 486 self._cache_locks = collections.defaultdict(lambda: threading.RLock()) 487 488 if references is not None or reference is not None: 489 reference_vals = references if references is not None else reference 490 self.references = reference_vals 491 492 if parents is not None or parent is not None: 493 parent_vals = parents if parents is not None else parent 494 self.parents = parent_vals 495 496 if children is not None or child is not None: 497 children_vals = children if children is not None else child 498 self.children = children_vals 499 500 @property 501 def metric_key(self) -> str: 502 """ 503 Return the pipe's metric key. 504 """ 505 return self._metric_key 506 507 @property 508 def metric(self) -> str: 509 """ 510 Return the pipe's metric key. 511 """ 512 return self._metric_key 513 514 @property 515 def location_key(self) -> Union[str, None]: 516 """ 517 Return the pipe's location key. 518 """ 519 return self._location_key 520 521 @property 522 def location(self) -> Union[str, None]: 523 """ 524 Return the pipe's location key. 525 """ 526 return self._location_key 527 528 @property 529 def meta(self): 530 """ 531 Return the four keys needed to reconstruct this pipe. 
532 """ 533 return { 534 'connector_keys': self.connector_keys, 535 'metric_key': self.metric_key, 536 'location_key': self.location_key, 537 'instance_keys': self.instance_keys, 538 } 539 540 def keys(self) -> List[str]: 541 """ 542 Return the ordered keys for this pipe. 543 """ 544 return { 545 key: val 546 for key, val in self.meta.items() 547 if key != 'instance' 548 } 549 550 @property 551 def instance_keys(self) -> str: 552 """ 553 Return the pipe's instance keys. 554 """ 555 return self._instance_keys 556 557 @property 558 def instance(self) -> Union[InstanceConnector, str]: 559 """ 560 Return the pipe's instance connector or keys. 561 """ 562 conn = self.instance_connector 563 if conn is None: 564 return self.instance_keys 565 return conn 566 567 @property 568 def instance_connector(self) -> Union[InstanceConnector, None]: 569 """ 570 The instance connector on which this pipe resides. 571 """ 572 if '_instance_connector' not in self.__dict__: 573 from meerschaum.connectors.parse import parse_instance_keys 574 conn = parse_instance_keys(self.instance_keys) 575 if conn: 576 self._instance_connector = conn 577 else: 578 return None 579 return self._instance_connector 580 581 @property 582 def connector_keys(self) -> str: 583 """ 584 Return the pipe's connector keys. 585 """ 586 return self._connector_keys 587 588 @property 589 def connector_key(self) -> str: 590 """ 591 Legacy: use `Pipe.connector_keys` instead. 592 """ 593 return self.connector_keys 594 595 @property 596 def connector(self) -> Union['Connector', str]: 597 """ 598 The connector to the data source. 
599 """ 600 if '_connector' not in self.__dict__: 601 from meerschaum.connectors.parse import parse_instance_keys 602 import warnings 603 with warnings.catch_warnings(): 604 warnings.simplefilter('ignore') 605 try: 606 conn = parse_instance_keys(self.connector_keys) 607 except Exception: 608 conn = None 609 if conn: 610 self._connector = conn 611 else: 612 return self._connector_keys 613 return self._connector 614 615 def __str__(self, ansi: bool=False): 616 return pipe_repr(self, ansi=ansi) 617 618 def __eq__(self, other): 619 try: 620 return ( 621 isinstance(self, type(other)) 622 and self.connector_keys == other.connector_keys 623 and self.metric_key == other.metric_key 624 and self.location_key == other.location_key 625 and self.instance_keys == other.instance_keys 626 ) 627 except Exception: 628 return False 629 630 def __hash__(self): 631 ### Using an esoteric separator to avoid collisions. 632 sep = "[\"']" 633 return hash( 634 str(self.connector_keys) + sep 635 + str(self.metric_key) + sep 636 + str(self.location_key) + sep 637 + str(self.instance_keys) + sep 638 ) 639 640 def __repr__(self, ansi: bool=True, **kw) -> str: 641 if not hasattr(sys, 'ps1'): 642 ansi = False 643 644 return pipe_repr(self, ansi=ansi, **kw) 645 646 def __pt_repr__(self): 647 from meerschaum.utils.packages import attempt_import 648 prompt_toolkit_formatted_text = attempt_import('prompt_toolkit.formatted_text', lazy=False) 649 return prompt_toolkit_formatted_text.ANSI(pipe_repr(self, ansi=True)) 650 651 def __getstate__(self) -> Dict[str, Any]: 652 """ 653 Define the state dictionary (pickling). 654 """ 655 return { 656 'connector_keys': self.connector_keys, 657 'metric_key': self.metric_key, 658 'location_key': self.location_key, 659 'parameters': self._attributes.get('parameters', None), 660 'instance_keys': self.instance_keys, 661 } 662 663 def __setstate__(self, _state: Dict[str, Any]): 664 """ 665 Read the state (unpickling). 
666 """ 667 self.__init__(**_state) 668 669 def __getitem__(self, key: str) -> Any: 670 """ 671 Index the pipe's attributes. 672 If the `key` cannot be found`, return `None`. 673 """ 674 if key in self.attributes: 675 return self.attributes.get(key, None) 676 677 aliases = { 678 'connector': 'connector_keys', 679 'connector_key': 'connector_keys', 680 'metric': 'metric_key', 681 'location': 'location_key', 682 } 683 aliased_key = aliases.get(key, None) 684 if aliased_key is not None: 685 return self.attributes.get(aliased_key, None) 686 687 property_aliases = { 688 'instance': 'instance_keys', 689 'instance_key': 'instance_keys', 690 } 691 aliased_key = property_aliases.get(key, None) 692 if aliased_key is not None: 693 key = aliased_key 694 return getattr(self, key, None) 695 696 def __copy__(self): 697 """ 698 Return a shallow copy of the current pipe. 699 """ 700 return mrsm.Pipe( 701 self.connector_keys, self.metric_key, self.location_key, 702 instance=self.instance_keys, 703 parameters=self._attributes.get('parameters', None), 704 ) 705 706 def __deepcopy__(self, memo): 707 """ 708 Return a deep copy of the current pipe. 709 """ 710 return self.__copy__()
Access Meerschaum pipes via Pipe objects.
Pipes are identified by the following:
- Connector keys (e.g. 'sql:main')
- Metric key (e.g. 'weather')
- Location (optional; e.g. None)
A pipe's connector keys correspond to a data source, and when the pipe is synced,
its fetch definition is evaluated and executed to produce new data.
Alternatively, new data may be directly synced via pipe.sync():
>>> from meerschaum import Pipe
>>> pipe = Pipe('csv', 'weather')
>>>
>>> import pandas as pd
>>> df = pd.read_csv('weather.csv')
>>> pipe.sync(df)
198 def __init__( 199 self, 200 connector: str = '', 201 metric: str = '', 202 location: Optional[str] = None, 203 parameters: Optional[Dict[str, Any]] = None, 204 columns: Union[Dict[str, str], List[str], None] = None, 205 indices: Optional[Dict[str, Union[str, List[str]]]] = None, 206 tags: Optional[List[str]] = None, 207 target: Optional[str] = None, 208 dtypes: Optional[Dict[str, str]] = None, 209 instance: Optional[Union[str, InstanceConnector]] = None, 210 upsert: Optional[bool] = None, 211 autoincrement: Optional[bool] = None, 212 autotime: Optional[bool] = None, 213 precision: Union[str, Dict[str, Union[str, int]], None] = None, 214 static: Optional[bool] = None, 215 enforce: Optional[bool] = None, 216 null_indices: Optional[bool] = None, 217 mixed_numerics: Optional[bool] = None, 218 compress: Union[bool, Dict[str, Any], None] = None, 219 temporary: bool = False, 220 cache: Optional[bool] = None, 221 cache_connector_keys: Optional[str] = None, 222 references: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]] = None, 223 reference: Union[str, Dict[str, Any], mrsm.Pipe, None] = None, 224 parents: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]] = None, 225 parent: Union[str, Dict[str, Any], mrsm.Pipe, None] = None, 226 children: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]] = None, 227 child: Union[str, Dict[str, Any], mrsm.Pipe, None] = None, 228 mrsm_instance: Optional[Union[str, InstanceConnector]] = None, 229 connector_keys: Optional[str] = None, 230 metric_key: Optional[str] = None, 231 location_key: Optional[str] = None, 232 instance_keys: Optional[str] = None, 233 indexes: Union[Dict[str, str], List[str], None] = None, 234 debug: bool = False, 235 ): 236 """ 237 Parameters 238 ---------- 239 connector: str 240 Keys for the pipe's source connector, e.g. `'sql:main'`. 241 242 metric: str 243 Label for the pipe's contents, e.g. `'weather'`. 244 245 location: str, default None 246 Label for the pipe's location. 
Defaults to `None`. 247 248 parameters: Optional[Dict[str, Any]], default None 249 Optionally set a pipe's parameters from the constructor, 250 e.g. columns and other attributes. 251 You can edit these parameters with `edit pipes`. 252 253 columns: Union[Dict[str, str], List[str], None], default None 254 Set the `columns` dictionary of `parameters`. 255 If `parameters` is also provided, this dictionary is added under the `'columns'` key. 256 257 indices: Optional[Dict[str, Union[str, List[str]]]], default None 258 Set the `indices` dictionary of `parameters`. 259 If `parameters` is also provided, this dictionary is added under the `'indices'` key. 260 261 tags: Optional[List[str]], default None 262 A list of strings to be added under the `'tags'` key of `parameters`. 263 You can select pipes with certain tags using `--tags`. 264 265 dtypes: Optional[Dict[str, str]], default None 266 Set the `dtypes` dictionary of `parameters`. 267 If `parameters` is also provided, this dictionary is added under the `'dtypes'` key. 268 269 mrsm_instance: Optional[Union[str, InstanceConnector]], default None 270 Connector for the Meerschaum instance where the pipe resides. 271 Defaults to the preconfigured default instance (`'sql:main'`). 272 273 instance: Optional[Union[str, InstanceConnector]], default None 274 Alias for `mrsm_instance`. If `mrsm_instance` is supplied, this value is ignored. 275 276 upsert: Optional[bool], default None 277 If `True`, set `upsert` to `True` in the parameters. 278 279 autoincrement: Optional[bool], default None 280 If `True`, set `autoincrement` in the parameters. 281 282 autotime: Optional[bool], default None 283 If `True`, set `autotime` in the parameters. 284 285 precision: Union[str, Dict[str, Union[str, int]], None], default None 286 If provided, set `precision` in the parameters. 287 This may be either a string (the precision unit) or a dictionary of in the form 288 `{'unit': <unit>, 'interval': <interval>}`. 
289 Default is determined by the `datetime` column dtype 290 (e.g. `datetime64[us]` is `microsecond` precision). 291 292 static: Optional[bool], default None 293 If `True`, set `static` in the parameters. 294 295 enforce: Optional[bool], default None 296 If `False`, skip data type enforcement. 297 Default behavior is `True`. 298 299 null_indices: Optional[bool], default None 300 Set to `False` if there will be no null values in the index columns. 301 Defaults to `True`. 302 303 mixed_numerics: bool, default None 304 If `True`, integer columns will be converted to `numeric` when floats are synced. 305 Set to `False` to disable this behavior. 306 Defaults to `True`. 307 308 compress: Union[bool, Dict[str, Any], None], default None 309 If `True` (or a dictionary of compression settings), mark the pipe for compression. 310 For TimescaleDB hypertables, a columnstore (compression) policy is installed 311 automatically on sync. A dictionary may override `segmentby`, `orderby`, and `after`. 312 Defaults to `False`. 313 314 hypercore: bool, default True 315 For TimescaleDB hypertables, enable the Hypercore columnstore at table creation 316 (declaring `segmentby`/`orderby` in `CREATE TABLE`), which causes TimescaleDB to 317 auto-create a columnstore policy. Set to `False` for a plain row-store hypertable. 318 Has no effect unless the pipe is a hypertable (`hypertable`, default `True`). 319 320 temporary: bool, default False 321 If `True`, prevent instance tables (pipes, users, plugins) from being created. 322 323 cache: Optional[bool], default None 324 If `True`, cache the pipe's metadata to disk (in addition to in-memory caching). 325 If `cache` is not explicitly `True`, it is set to `False` if `temporary` is `True`. 326 Defaults to `True` (from `None`). 327 328 cache_connector_keys: Optional[str], default None 329 If provided, use the keys to a Valkey connector (e.g. `valkey:main`). 
330 331 references: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None 332 If provided, inherit the parameters of the reference Pipe(s). 333 May be equal to a string of the Pipe constructor, a dictionary of constructor keys, 334 a Pipe itself, or a list of any of these values. 335 336 parents: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None 337 Set references for parent pipes. See `references` for values. 338 339 children: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None 340 Set references for child pipes. See `references` for values. 341 342 """ 343 from meerschaum.utils.warnings import error, warn 344 if (not connector and not connector_keys) or (not metric and not metric_key): 345 error( 346 "Please provide strings for the connector and metric\n " 347 + "(first two positional arguments)." 348 ) 349 350 ### Fall back to legacy `location_key` just in case. 351 if not location: 352 location = location_key 353 354 if not connector: 355 connector = connector_keys 356 357 if not metric: 358 metric = metric_key 359 360 if location in ('[None]', 'None'): 361 location = None 362 363 from meerschaum._internal.static import STATIC_CONFIG 364 negation_prefix = STATIC_CONFIG['system']['fetch_pipes_keys']['negation_prefix'] 365 for k in (connector, metric, location, *(tags or [])): 366 if str(k).startswith(negation_prefix): 367 error(f"A pipe's keys and tags cannot start with the prefix '{negation_prefix}'.") 368 369 self._connector_keys = str(connector) 370 self._connector_key = self.connector_keys ### Alias 371 self._metric_key = metric 372 self._location_key = location 373 self.temporary = temporary 374 self.cache = ( 375 cache 376 if cache is not None 377 else ((not temporary) and get_config('pipes', 'cache', 'enabled', warn=False)) 378 ) 379 self.cache_connector_keys = ( 380 str(cache_connector_keys) 381 if cache_connector_keys is not None 382 else None 383 ) 384 self.debug = debug 385 386 
self._attributes: Dict[str, Any] = { 387 'connector_keys': self._connector_keys, 388 'metric_key': self._metric_key, 389 'location_key': self._location_key, 390 'parameters': {}, 391 } 392 393 ### only set parameters if values are provided 394 if isinstance(parameters, dict): 395 self._attributes['parameters'] = parameters 396 else: 397 if parameters is not None: 398 warn(f"The provided parameters are of invalid type '{type(parameters)}'.") 399 self._attributes['parameters'] = {} 400 401 columns = columns or self._attributes.get('parameters', {}).get('columns', None) 402 if isinstance(columns, (list, tuple)): 403 columns = {str(col): str(col) for col in columns} 404 if isinstance(columns, dict): 405 self._attributes['parameters']['columns'] = columns 406 elif isinstance(columns, str) and 'Pipe(' in columns: 407 pass 408 elif columns is not None: 409 warn(f"The provided columns are of invalid type '{type(columns)}'.") 410 411 indices = ( 412 indices 413 or indexes 414 or self._attributes.get('parameters', {}).get('indices', None) 415 or self._attributes.get('parameters', {}).get('indexes', None) 416 ) 417 if isinstance(indices, dict): 418 indices_key = ( 419 'indexes' 420 if 'indexes' in self._attributes['parameters'] 421 else 'indices' 422 ) 423 self._attributes['parameters'][indices_key] = indices 424 425 if isinstance(tags, (list, tuple)): 426 self._attributes['parameters']['tags'] = tags 427 elif tags is not None: 428 warn(f"The provided tags are of invalid type '{type(tags)}'.") 429 430 if isinstance(target, str): 431 self._attributes['parameters']['target'] = target 432 elif target is not None: 433 warn(f"The provided target is of invalid type '{type(target)}'.") 434 435 if isinstance(dtypes, dict): 436 self._attributes['parameters']['dtypes'] = dtypes 437 elif dtypes is not None: 438 warn(f"The provided dtypes are of invalid type '{type(dtypes)}'.") 439 440 if isinstance(upsert, bool): 441 self._attributes['parameters']['upsert'] = upsert 442 443 if 
isinstance(autoincrement, bool): 444 self._attributes['parameters']['autoincrement'] = autoincrement 445 446 if isinstance(autotime, bool): 447 self._attributes['parameters']['autotime'] = autotime 448 449 if isinstance(precision, dict): 450 self._attributes['parameters']['precision'] = precision 451 elif isinstance(precision, str): 452 self._attributes['parameters']['precision'] = {'unit': precision} 453 454 if isinstance(static, bool): 455 self._attributes['parameters']['static'] = static 456 self._static = static 457 458 if isinstance(enforce, bool): 459 self._attributes['parameters']['enforce'] = enforce 460 461 if isinstance(null_indices, bool): 462 self._attributes['parameters']['null_indices'] = null_indices 463 464 if isinstance(mixed_numerics, bool): 465 self._attributes['parameters']['mixed_numerics'] = mixed_numerics 466 467 if isinstance(compress, (bool, dict)): 468 self._attributes['parameters']['compress'] = compress 469 470 ### NOTE: The parameters dictionary is {} by default. 471 ### A Pipe may be registered without parameters, then edited, 472 ### or a Pipe may be registered with parameters set in-memory first. 
473 _mrsm_instance = mrsm_instance if mrsm_instance is not None else (instance or instance_keys) 474 if _mrsm_instance is None: 475 _mrsm_instance = get_config('meerschaum', 'instance', patch=True) 476 477 if not isinstance(_mrsm_instance, str): 478 self._instance_connector = _mrsm_instance 479 self._instance_keys = str(_mrsm_instance) 480 else: 481 self._instance_keys = _mrsm_instance 482 483 if self._instance_keys == 'sql:memory': 484 self.cache = False 485 486 self._cache_locks = collections.defaultdict(lambda: threading.RLock()) 487 488 if references is not None or reference is not None: 489 reference_vals = references if references is not None else reference 490 self.references = reference_vals 491 492 if parents is not None or parent is not None: 493 parent_vals = parents if parents is not None else parent 494 self.parents = parent_vals 495 496 if children is not None or child is not None: 497 children_vals = children if children is not None else child 498 self.children = children_vals
Parameters
- connector (str): Keys for the pipe's source connector, e.g. 'sql:main'.
- metric (str): Label for the pipe's contents, e.g. 'weather'.
- location (Optional[str], default None): Label for the pipe's location. Defaults to None.
- parameters (Optional[Dict[str, Any]], default None): Optionally set a pipe's parameters from the constructor, e.g. columns and other attributes. You can edit these parameters with edit pipes.
- columns (Union[Dict[str, str], List[str], None], default None): Set the columns dictionary of parameters. If parameters is also provided, this dictionary is added under the 'columns' key. A list of column names is converted to a dictionary mapping each name to itself.
- indices (Optional[Dict[str, Union[str, List[str]]]], default None): Set the indices dictionary of parameters. If parameters is also provided, this dictionary is added under the 'indices' key.
- tags (Optional[List[str]], default None): A list of strings to be added under the 'tags' key of parameters. You can select pipes with certain tags using --tags.
- dtypes (Optional[Dict[str, str]], default None): Set the dtypes dictionary of parameters. If parameters is also provided, this dictionary is added under the 'dtypes' key.
- mrsm_instance (Optional[Union[str, InstanceConnector]], default None): Connector for the Meerschaum instance where the pipe resides. Defaults to the preconfigured default instance ('sql:main').
- instance (Optional[Union[str, InstanceConnector]], default None): Alias for mrsm_instance. If mrsm_instance is supplied, this value is ignored.
- upsert (Optional[bool], default None): If True, set upsert to True in the parameters.
- autoincrement (Optional[bool], default None): If True, set autoincrement in the parameters.
- autotime (Optional[bool], default None): If True, set autotime in the parameters.
- precision (Union[str, Dict[str, Union[str, int]], None], default None): If provided, set precision in the parameters. This may be either a string (the precision unit) or a dictionary of the form {'unit': <unit>, 'interval': <interval>}. A string is stored as {'unit': <unit>}. The default is determined by the datetime column dtype (e.g. datetime64[us] is microsecond precision).
- static (Optional[bool], default None): If True, set static in the parameters.
- enforce (Optional[bool], default None): If False, skip data type enforcement. Default behavior is True.
- null_indices (Optional[bool], default None): Set to False if there will be no null values in the index columns. Defaults to True.
- mixed_numerics (Optional[bool], default None): If True, integer columns will be converted to numeric when floats are synced. Set to False to disable this behavior. Defaults to True.
- compress (Union[bool, Dict[str, Any], None], default None): If True (or a dictionary of compression settings), mark the pipe for compression. For TimescaleDB hypertables, a columnstore (compression) policy is installed automatically on sync. A dictionary may override segmentby, orderby, and after. Defaults to False.
- hypercore (bool, default True): For TimescaleDB hypertables, enable the Hypercore columnstore at table creation (declaring segmentby / orderby in CREATE TABLE), which causes TimescaleDB to auto-create a columnstore policy. Set to False for a plain row-store hypertable. Has no effect unless the pipe is a hypertable (hypertable, default True).
- temporary (bool, default False): If True, prevent instance tables (pipes, users, plugins) from being created.
- cache (Optional[bool], default None): If True, cache the pipe's metadata to disk (in addition to in-memory caching). If cache is None (the default), caching is disabled when temporary is True and otherwise follows the configured pipes cache setting.
- cache_connector_keys (Optional[str], default None): If provided, use these keys to a Valkey connector (e.g. valkey:main).
- references (Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None): If provided, inherit the parameters of the reference Pipe(s). May be a Pipe constructor string, a dictionary of constructor keys, a Pipe itself, or a list of any of these values.
- parents (Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None): Set references for parent pipes. See references for values.
- children (Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None): Set references for child pipes. See references for values.
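The constructor normalizes a few of these arguments before storing them under parameters. The sketch below mirrors that logic from the __init__ source as standalone functions. The names normalize_columns and normalize_precision are hypothetical helpers used only for illustration; they are not part of the meerschaum API.

```python
from typing import Any, Dict, List, Optional, Union


def normalize_columns(
    columns: Union[Dict[str, str], List[str], None],
) -> Optional[Dict[str, str]]:
    """Hypothetical helper: a list of column names becomes a name -> name mapping."""
    if isinstance(columns, (list, tuple)):
        return {str(col): str(col) for col in columns}
    if isinstance(columns, dict):
        return columns
    return None


def normalize_precision(
    precision: Union[str, Dict[str, Any], None],
) -> Optional[Dict[str, Any]]:
    """Hypothetical helper: a bare unit string is wrapped as {'unit': <unit>}."""
    if isinstance(precision, dict):
        return precision
    if isinstance(precision, str):
        return {'unit': precision}
    return None


print(normalize_columns(['ts', 'id']))
# {'ts': 'ts', 'id': 'id'}
print(normalize_precision('second'))
# {'unit': 'second'}
```

Passing columns=['ts', 'id'] is therefore shorthand for columns={'ts': 'ts', 'id': 'id'}, and precision='second' is shorthand for precision={'unit': 'second'}.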
@property
def metric_key(self) -> str:
    """
    Return the pipe's metric key.
    """
    return self._metric_key
Return the pipe's metric key.
@property
def metric(self) -> str:
    """
    Return the pipe's metric key.
    """
    return self._metric_key
Return the pipe's metric key.
@property
def location_key(self) -> Union[str, None]:
    """
    Return the pipe's location key.
    """
    return self._location_key
Return the pipe's location key.
@property
def location(self) -> Union[str, None]:
    """
    Return the pipe's location key.
    """
    return self._location_key
Return the pipe's location key.
@property
def meta(self):
    """
    Return the four keys needed to reconstruct this pipe.
    """
    return {
        'connector_keys': self.connector_keys,
        'metric_key': self.metric_key,
        'location_key': self.location_key,
        'instance_keys': self.instance_keys,
    }
Return the four keys needed to reconstruct this pipe.
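Because meta holds the same four keyword arguments that the constructor accepts (connector_keys, metric_key, location_key, instance_keys), something like mrsm.Pipe(**pipe.meta) should rebuild an equivalent pipe. The following is a minimal stand-in rather than the real class: MiniPipe is a hypothetical dataclass that models only the four keys, to show the round trip and why the rebuilt object compares equal.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class MiniPipe:
    """Hypothetical stand-in that models only a pipe's four identifying keys."""
    connector_keys: str
    metric_key: str
    location_key: Optional[str] = None
    instance_keys: str = 'sql:main'

    @property
    def meta(self) -> dict:
        # The same four keys that Pipe.meta returns.
        return asdict(self)


pipe = MiniPipe('sql:main', 'weather')
clone = MiniPipe(**pipe.meta)

print(clone == pipe)
# True
print(len({pipe, clone}))
# 1
```

Equality and hashing depend only on the four keys here, which matches how Pipe.__eq__ and Pipe.__hash__ compare pipes in the class source.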
def keys(self) -> List[str]:
    """
    Return the ordered keys for this pipe.
    """
    return {
        key: val
        for key, val in self.meta.items()
        if key != 'instance'
    }
Return the ordered keys for this pipe.
@property
def instance_keys(self) -> str:
    """
    Return the pipe's instance keys.
    """
    return self._instance_keys
Return the pipe's instance keys.
@property
def instance(self) -> Union[InstanceConnector, str]:
    """
    Return the pipe's instance connector or keys.
    """
    conn = self.instance_connector
    if conn is None:
        return self.instance_keys
    return conn
Return the pipe's instance connector or keys.
@property
def instance_connector(self) -> Union[InstanceConnector, None]:
    """
    The instance connector on which this pipe resides.
    """
    if '_instance_connector' not in self.__dict__:
        from meerschaum.connectors.parse import parse_instance_keys
        conn = parse_instance_keys(self.instance_keys)
        if conn:
            self._instance_connector = conn
        else:
            return None
    return self._instance_connector
The instance connector on which this pipe resides.
@property
def connector_keys(self) -> str:
    """
    Return the pipe's connector keys.
    """
    return self._connector_keys
Return the pipe's connector keys.
@property
def connector_key(self) -> str:
    """
    Legacy: use `Pipe.connector_keys` instead.
    """
    return self.connector_keys
Legacy: use Pipe.connector_keys instead.
@property
def connector(self) -> Union['Connector', str]:
    """
    The connector to the data source.
    """
    if '_connector' not in self.__dict__:
        from meerschaum.connectors.parse import parse_instance_keys
        import warnings
        with warnings.catch_warnings():
            warnings.simplefilter('ignore')
            try:
                conn = parse_instance_keys(self.connector_keys)
            except Exception:
                conn = None
        if conn:
            self._connector = conn
        else:
            return self._connector_keys
    return self._connector
The connector to the data source.
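Both connector and instance_connector resolve lazily: the parsed connector is stored on the instance the first time it is needed, and connector falls back to the raw keys string when parsing fails. A minimal sketch of that pattern follows. FakeConnector, resolve_connector, and LazyPipe are hypothetical names standing in for Meerschaum's own key parsing and Pipe class.

```python
from typing import Optional, Union


class FakeConnector:
    """Hypothetical connector object used only for this sketch."""
    def __init__(self, keys: str):
        self.keys = keys


def resolve_connector(keys: str) -> Optional[FakeConnector]:
    """Hypothetical resolver: only keys of the form 'type:label' can be parsed."""
    if ':' not in keys:
        raise ValueError(f"Cannot parse keys '{keys}'.")
    return FakeConnector(keys)


class LazyPipe:
    def __init__(self, connector_keys: str):
        self._connector_keys = connector_keys

    @property
    def connector(self) -> Union[FakeConnector, str]:
        # Resolve once, then reuse the cached object on later accesses.
        if '_connector' not in self.__dict__:
            try:
                conn = resolve_connector(self._connector_keys)
            except Exception:
                conn = None
            if conn is None:
                # Nothing is cached, so resolution is retried on the next access.
                return self._connector_keys
            self._connector = conn
        return self._connector


good = LazyPipe('sql:main')
print(good.connector is good.connector)
# True
print(LazyPipe('csv').connector)
# csv
```

This is why a pipe built with plain keys such as 'csv' still works for direct syncing: its connector attribute is simply the string.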
def fetch(
    self,
    begin: Union[datetime, int, str, None] = '',
    end: Union[datetime, int, None] = None,
    check_existing: bool = True,
    sync_chunks: bool = False,
    debug: bool = False,
    **kw: Any
) -> Union['pd.DataFrame', Iterator['pd.DataFrame'], None]:
    """
    Fetch a Pipe's latest data from its connector.

    Parameters
    ----------
    begin: Union[datetime, int, str, None], default ''
        If provided, only fetch data newer than or equal to `begin`.

    end: Union[datetime, int, None], default None
        If provided, only fetch data older than or equal to `end`.

    check_existing: bool, default True
        If `False`, do not apply the backtrack interval.

    sync_chunks: bool, default False
        If `True` and the pipe's connector is of type `'sql'`,
        sync each chunk as it is loaded into memory during the fetch.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `pd.DataFrame` of the newest unseen data.

    """
    if 'fetch' not in dir(self.connector):
        warn(f"No `fetch()` function defined for connector '{self.connector}'")
        return None

    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.misc import filter_arguments

    _chunk_hook = kw.pop('chunk_hook', None)
    kw['workers'] = self.get_num_workers(kw.get('workers', None))
    if sync_chunks and _chunk_hook is None:

        def _chunk_hook(chunk, **_kw) -> SuccessTuple:
            """
            Wrap `Pipe.sync()` with a custom chunk label prepended to the message.
            """
            from meerschaum.config._patch import apply_patch_to_config
            kwargs = apply_patch_to_config(kw, _kw)
            chunk_success, chunk_message = self.sync(chunk, **kwargs)
            chunk_label = self._get_chunk_label(chunk, self.columns.get('datetime', None))
            if chunk_label:
                chunk_message = '\n' + chunk_label + '\n' + chunk_message
            return chunk_success, chunk_message

    begin, end = self.parse_date_bounds(begin, end)

    with mrsm.Venv(get_connector_plugin(self.connector)):
        _args, _kwargs = filter_arguments(
            self.connector.fetch,
            self,
            begin=_determine_begin(
                self,
                begin,
                end,
                check_existing=check_existing,
                debug=debug,
            ),
            end=end,
            chunk_hook=_chunk_hook,
            debug=debug,
            **kw
        )
        df = self.connector.fetch(*_args, **_kwargs)
    return df
Fetch a Pipe's latest data from its connector.
Parameters
- begin (Union[datetime, int, str, None], default ''): If provided, only fetch data newer than or equal to begin.
- end (Union[datetime, int, None], default None): If provided, only fetch data older than or equal to end.
- check_existing (bool, default True): If False, do not apply the backtrack interval.
- sync_chunks (bool, default False): If True and the pipe's connector is of type 'sql', sync each chunk as it is loaded into memory during the fetch.
- debug (bool, default False): Verbosity toggle.
Returns
- A pd.DataFrame of the newest unseen data, or None if the pipe's connector does not define fetch().
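Before calling the connector, fetch() passes its arguments through filter_arguments (imported from meerschaum.utils.misc in the source), apparently so that a connector's fetch() only receives arguments it can accept. The helper's exact behavior is not shown in this section, so the sketch below is only a simplified approximation for keyword arguments, built on inspect.signature. The name filter_kwargs is hypothetical.

```python
import inspect
from typing import Any, Callable, Dict


def filter_kwargs(func: Callable[..., Any], **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical helper: keep only the keyword arguments `func` can accept."""
    params = inspect.signature(func).parameters
    accepts_var_kw = any(
        param.kind is inspect.Parameter.VAR_KEYWORD
        for param in params.values()
    )
    if accepts_var_kw:
        # A `**kw` catch-all accepts everything.
        return dict(kwargs)
    return {key: val for key, val in kwargs.items() if key in params}


def fetch(begin=None, end=None):
    """A connector-style fetch that only declares `begin` and `end`."""
    return [{'begin': begin, 'end': end}]


kwargs = filter_kwargs(fetch, begin=1, end=2, chunk_hook=None, debug=True)
print(kwargs)
# {'begin': 1, 'end': 2}
print(fetch(**kwargs))
# [{'begin': 1, 'end': 2}]
```

With this kind of filtering, a custom connector can declare only the bounds it cares about and ignore chunk_hook, debug, and the other keyword arguments.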
def get_backtrack_interval(
    self,
    check_existing: bool = True,
    debug: bool = False,
) -> Union[timedelta, int]:
    """
    Get the backtrack interval to use for this pipe.

    Parameters
    ----------
    check_existing: bool, default True
        If `False`, return a backtrack interval of 0 minutes.

    Returns
    -------
    The backtrack interval (`timedelta` or `int`) to use with this pipe's `datetime` axis.
    """
    from meerschaum.utils.dtypes import MRSM_PRECISION_UNITS_SCALARS, MRSM_PRECISION_UNITS_ALIASES
    default_backtrack_minutes = get_config('pipes', 'parameters', 'fetch', 'backtrack_minutes')
    configured_backtrack_minutes = self.parameters.get('fetch', {}).get('backtrack_minutes', None)
    backtrack_minutes = (
        configured_backtrack_minutes
        if configured_backtrack_minutes is not None
        else default_backtrack_minutes
    ) if check_existing else 0

    dt_col = self.columns.get('datetime', None)
    if dt_col is None:
        return timedelta(minutes=backtrack_minutes)

    dt_dtype = self.dtypes.get(dt_col, 'datetime')
    if 'int' in dt_dtype.lower():
        if not self.parameters.get('precision', None):
            return backtrack_minutes
        precision_unit = self.precision.get('unit', None)
        true_unit = MRSM_PRECISION_UNITS_ALIASES.get(precision_unit, precision_unit)
        scalar = MRSM_PRECISION_UNITS_SCALARS.get(true_unit, None)
        if scalar is not None:
            return int(backtrack_minutes * 60 * scalar)
        return backtrack_minutes

    return timedelta(minutes=backtrack_minutes)
Get the backtrack interval to use for this pipe.
Parameters
- check_existing (bool, default True): If False, return a backtrack interval of 0 minutes.
Returns
- The backtrack interval (timedelta or int) to use with this pipe's datetime axis.
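For an integer datetime axis with a configured precision, the source converts the backtrack window from minutes into axis units as int(backtrack_minutes * 60 * scalar). A worked sketch of that arithmetic follows. The UNITS_PER_SECOND table is an assumed stand-in for MRSM_PRECISION_UNITS_SCALARS (whose actual contents are not shown here), backtrack_interval is a hypothetical helper, and the 1440-minute window is only an example value.

```python
from datetime import timedelta
from typing import Optional, Union

# Assumed stand-in for MRSM_PRECISION_UNITS_SCALARS: axis units per second.
UNITS_PER_SECOND = {
    'second': 1,
    'millisecond': 1_000,
    'microsecond': 1_000_000,
}


def backtrack_interval(
    backtrack_minutes: int,
    axis_is_integer: bool,
    precision_unit: Optional[str] = None,
) -> Union[timedelta, int]:
    """Hypothetical helper mirroring the branches of get_backtrack_interval()."""
    if not axis_is_integer:
        return timedelta(minutes=backtrack_minutes)
    scalar = UNITS_PER_SECOND.get(precision_unit)
    if scalar is None:
        # No usable precision: the raw number of minutes is returned.
        return backtrack_minutes
    # minutes -> seconds -> axis units
    return int(backtrack_minutes * 60 * scalar)


print(backtrack_interval(1440, axis_is_integer=False))
# 1 day, 0:00:00
print(backtrack_interval(1440, axis_is_integer=True, precision_unit='millisecond'))
# 86400000
print(backtrack_interval(1440, axis_is_integer=True))
# 1440
```

The last case shows why setting precision matters on integer axes: without it, the interval is returned as a bare number of minutes with no conversion to the axis's unit.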
23def get_data( 24 self, 25 select_columns: Optional[List[str]] = None, 26 omit_columns: Optional[List[str]] = None, 27 begin: Union[datetime, int, str, None] = None, 28 end: Union[datetime, int, str, None] = None, 29 params: Optional[Dict[str, Any]] = None, 30 as_docs: bool = False, 31 as_iterator: bool = False, 32 as_chunks: bool = False, 33 as_dask: bool = False, 34 add_missing_columns: bool = False, 35 chunk_interval: Union[timedelta, int, None] = None, 36 order: Optional[str] = 'asc', 37 limit: Optional[int] = None, 38 fresh: bool = False, 39 debug: bool = False, 40 **kw: Any 41) -> Union['pd.DataFrame', Iterator['pd.DataFrame'], None]: 42 """ 43 Get a pipe's data from the instance connector. 44 45 Parameters 46 ---------- 47 select_columns: Optional[List[str]], default None 48 If provided, only select these given columns. 49 Otherwise select all available columns (i.e. `SELECT *`). 50 51 omit_columns: Optional[List[str]], default None 52 If provided, remove these columns from the selection. 53 54 begin: Union[datetime, int, str, None], default None 55 Lower bound datetime to begin searching for data (inclusive). 56 Translates to a `WHERE` clause like `WHERE datetime >= begin`. 57 Defaults to `None`. 58 59 end: Union[datetime, int, str, None], default None 60 Upper bound datetime to stop searching for data (inclusive). 61 Translates to a `WHERE` clause like `WHERE datetime < end`. 62 Defaults to `None`. 63 64 params: Optional[Dict[str, Any]], default None 65 Filter the retrieved data by a dictionary of parameters. 66 See `meerschaum.utils.sql.build_where` for more details. 67 68 as_docs: bool, default False 69 If `True`, return a list of dictionaries rather than a DataFrame. 70 Relies on `get_pipe_docs` from the instance connector if implemented. 71 May be combined with `as_chunks` to return an `Iterator[List[Dict]]` 72 chunked by time bounds (useful for large result sets without pandas overhead). 
73 74 as_iterator: bool, default False 75 If `True`, return a generator of chunks of pipe data. 76 When combined with `as_docs=True`, yields `List[Dict]` per chunk instead of DataFrames. 77 78 as_chunks: bool, default False 79 Alias for `as_iterator`. 80 When combined with `as_docs=True`, yields `List[Dict]` per chunk instead of DataFrames. 81 82 as_dask: bool, default False 83 If `True`, return a `dask.DataFrame` 84 (which may be loaded into a Pandas DataFrame with `df.compute()`). 85 86 add_missing_columns: bool, default False 87 If `True`, add any missing columns from `Pipe.dtypes` to the dataframe. 88 89 chunk_interval: Union[timedelta, int, None], default None 90 If `as_iterator`, then return chunks with `begin` and `end` separated by this interval. 91 This may be set under `pipe.parameters['chunk_minutes']`. 92 By default, use a timedelta of 43200 minutes (30 days). 93 If `chunk_interval` is an integer and the `datetime` axis a timestamp, 94 the use a timedelta with the number of minutes configured to this value. 95 If the `datetime` axis is an integer, default to the configured chunksize. 96 If `chunk_interval` is a `timedelta` and the `datetime` axis an integer, 97 use the number of minutes in the `timedelta`. 98 99 order: Optional[str], default 'asc' 100 If `order` is not `None`, sort the resulting dataframe by indices. 101 102 limit: Optional[int], default None 103 If provided, cap the dataframe to this many rows. 104 105 fresh: bool, default False 106 If `True`, skip local cache and directly query the instance connector. 107 108 debug: bool, default False 109 Verbosity toggle. 110 Defaults to `False`. 111 112 Returns 113 ------- 114 A `pd.DataFrame` of the pipe's data (default). 115 A `List[Dict]` if `as_docs=True`. 116 An `Iterator[pd.DataFrame]` if `as_chunks=True` (or `as_iterator=True`). 117 An `Iterator[List[Dict]]` if both `as_docs=True` and `as_chunks=True`. 
118 119 """ 120 from meerschaum.utils.warnings import warn 121 from meerschaum.utils.venv import Venv 122 from meerschaum.connectors import get_connector_plugin 123 from meerschaum.utils.dtypes import to_pandas_dtype 124 from meerschaum.utils.dataframe import add_missing_cols_to_df, df_is_chunk_generator 125 from meerschaum.utils.packages import attempt_import 126 from meerschaum.utils.warnings import dprint 127 dd = attempt_import('dask.dataframe') if as_dask else None 128 dask = attempt_import('dask') if as_dask else None 129 _ = attempt_import('partd', lazy=False) if as_dask else None 130 131 if select_columns == '*': 132 select_columns = None 133 elif isinstance(select_columns, str): 134 select_columns = [select_columns] 135 136 if isinstance(omit_columns, str): 137 omit_columns = [omit_columns] 138 139 begin, end = self.parse_date_bounds(begin, end, debug=debug) 140 as_iterator = as_iterator or as_chunks 141 dt_col = self.columns.get('datetime', None) 142 143 def _sort_df(_df): 144 if df_is_chunk_generator(_df): 145 return _df 146 indices = [] if dt_col not in _df.columns else [dt_col] 147 non_dt_cols = [ 148 col 149 for col_ix, col in self.columns.items() 150 if col_ix != 'datetime' and col in _df.columns 151 ] 152 indices.extend(non_dt_cols) 153 if 'dask' not in _df.__module__: 154 _df.sort_values( 155 by=indices, 156 inplace=True, 157 ascending=(str(order).lower() == 'asc'), 158 ) 159 _df.reset_index(drop=True, inplace=True) 160 else: 161 _df = _df.sort_values( 162 by=indices, 163 ascending=(str(order).lower() == 'asc'), 164 ) 165 _df = _df.reset_index(drop=True) 166 if limit is not None and len(_df) > limit: 167 return _df.head(limit) 168 return _df 169 170 if as_iterator or as_chunks: 171 df = self._get_data_as_iterator( 172 select_columns=select_columns, 173 omit_columns=omit_columns, 174 begin=begin, 175 end=end, 176 params=params, 177 chunk_interval=chunk_interval, 178 limit=limit, 179 order=order, 180 as_docs=as_docs, 181 fresh=fresh, 182 debug=debug, 
183 ) 184 if as_docs: 185 return df 186 return _sort_df(df) 187 188 if as_dask: 189 from multiprocessing.pool import ThreadPool 190 dask_pool = ThreadPool(self.get_num_workers()) 191 dask.config.set(pool=dask_pool) 192 chunk_interval = self.get_chunk_interval(chunk_interval, debug=debug) 193 bounds = self.get_chunk_bounds( 194 begin=begin, 195 end=end, 196 bounded=False, 197 chunk_interval=chunk_interval, 198 debug=debug, 199 ) 200 dask_chunks = [ 201 dask.delayed(self.get_data)( 202 select_columns=select_columns, 203 omit_columns=omit_columns, 204 begin=chunk_begin, 205 end=chunk_end, 206 params=params, 207 chunk_interval=chunk_interval, 208 order=order, 209 limit=limit, 210 fresh=fresh, 211 add_missing_columns=True, 212 debug=debug, 213 ) 214 for (chunk_begin, chunk_end) in bounds 215 ] 216 dask_meta = { 217 col: to_pandas_dtype(typ) 218 for col, typ in self.get_dtypes(refresh=True, infer=True, debug=debug).items() 219 } 220 if debug: 221 dprint(f"Dask meta:\n{dask_meta}") 222 return _sort_df(dd.from_delayed(dask_chunks, meta=dask_meta)) 223 224 if not self.exists(debug=debug): 225 return [] if as_docs else None 226 227 if as_docs: 228 with Venv(get_connector_plugin(self.instance_connector)): 229 docs = self.instance_connector.get_pipe_docs( 230 pipe=self, 231 select_columns=select_columns, 232 omit_columns=omit_columns, 233 begin=begin, 234 end=end, 235 params=params, 236 limit=limit, 237 order=order, 238 debug=debug, 239 **kw 240 ) 241 return docs if docs is not None else [] 242 243 with Venv(get_connector_plugin(self.instance_connector)): 244 df = self.instance_connector.get_pipe_data( 245 pipe=self, 246 select_columns=select_columns, 247 omit_columns=omit_columns, 248 begin=begin, 249 end=end, 250 params=params, 251 limit=limit, 252 order=order, 253 debug=debug, 254 **kw 255 ) 256 if df is None: 257 return df 258 259 if not select_columns: 260 select_columns = [col for col in df.columns] 261 262 pipe_dtypes = self.get_dtypes(refresh=False, debug=debug) 263 
cols_to_omit = [ 264 col 265 for col in df.columns 266 if ( 267 col in (omit_columns or []) 268 or 269 col not in (select_columns or []) 270 ) 271 ] 272 cols_to_add = [ 273 col 274 for col in select_columns 275 if col not in df.columns 276 ] + ([ 277 col 278 for col in pipe_dtypes 279 if col not in df.columns 280 ] if add_missing_columns else []) 281 if cols_to_omit: 282 warn( 283 ( 284 f"Received {len(cols_to_omit)} omitted column" 285 + ('s' if len(cols_to_omit) != 1 else '') 286 + f" for {self}. " 287 + "Consider adding `select_columns` and `omit_columns` support to " 288 + f"'{self.instance_connector.type}' connectors to improve performance." 289 ), 290 stack=False, 291 ) 292 _cols_to_select = [col for col in df.columns if col not in cols_to_omit] 293 df = df[_cols_to_select] 294 295 if cols_to_add: 296 if not add_missing_columns: 297 from meerschaum.utils.misc import items_str 298 warn( 299 f"Will add columns {items_str(cols_to_add)} as nulls to dataframe.", 300 stack=False, 301 ) 302 303 df = add_missing_cols_to_df( 304 df, 305 { 306 col: pipe_dtypes.get(col, 'string') 307 for col in cols_to_add 308 }, 309 ) 310 311 enforced_df = self.enforce_dtypes( 312 df, 313 dtypes=pipe_dtypes, 314 debug=debug, 315 ) 316 317 if order: 318 return _sort_df(enforced_df) 319 return enforced_df
Get a pipe's data from the instance connector.
Parameters
- select_columns (Optional[List[str]], default None):
If provided, only select these given columns.
Otherwise select all available columns (i.e. SELECT *).
- omit_columns (Optional[List[str]], default None):
If provided, remove these columns from the selection.
- begin (Union[datetime, int, str, None], default None):
Lower bound datetime to begin searching for data (inclusive).
Translates to a WHERE clause like WHERE datetime >= begin.
Defaults to None.
- end (Union[datetime, int, str, None], default None):
Upper bound datetime to stop searching for data (exclusive).
Translates to a WHERE clause like WHERE datetime < end.
Defaults to None.
- params (Optional[Dict[str, Any]], default None):
Filter the retrieved data by a dictionary of parameters.
See meerschaum.utils.sql.build_where for more details.
- as_docs (bool, default False):
If True, return a list of dictionaries rather than a DataFrame.
Relies on get_pipe_docs from the instance connector if implemented.
May be combined with as_chunks to return an Iterator[List[Dict]] chunked by time bounds (useful for large result sets without pandas overhead).
- as_iterator (bool, default False):
If True, return a generator of chunks of pipe data.
When combined with as_docs=True, yields List[Dict] per chunk instead of DataFrames.
- as_chunks (bool, default False):
Alias for as_iterator.
When combined with as_docs=True, yields List[Dict] per chunk instead of DataFrames.
- as_dask (bool, default False):
If True, return a dask.DataFrame (which may be loaded into a Pandas DataFrame with df.compute()).
- add_missing_columns (bool, default False):
If True, add any missing columns from Pipe.dtypes to the dataframe.
- chunk_interval (Union[timedelta, int, None], default None):
If as_iterator, then return chunks with begin and end separated by this interval.
This may be set under pipe.parameters['verify']['chunk_minutes'].
By default, use a timedelta of 43200 minutes (30 days).
If chunk_interval is an integer and the datetime axis is a timestamp, then use a timedelta with the number of minutes set to this value.
If the datetime axis is an integer, default to the configured chunksize.
If chunk_interval is a timedelta and the datetime axis is an integer, use the number of minutes in the timedelta.
- order (Optional[str], default 'asc'):
If order is not None, sort the resulting dataframe by indices.
- limit (Optional[int], default None):
If provided, cap the dataframe to this many rows.
- fresh (bool, default False):
If True, skip local cache and directly query the instance connector.
- debug (bool, default False):
Verbosity toggle.
Defaults to False.
Returns
- A pd.DataFrame of the pipe's data (default).
- A List[Dict] if as_docs=True.
- An Iterator[pd.DataFrame] if as_chunks=True (or as_iterator=True).
- An Iterator[List[Dict]] if both as_docs=True and as_chunks=True.
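The bounds semantics can be illustrated without a database. The following is a minimal sketch, assuming the WHERE clauses quoted above (datetime >= begin and datetime < end); select_docs is a hypothetical helper written for this illustration and is not part of meerschaum, which delegates the filtering to the instance connector.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional


def select_docs(
    docs: List[Dict[str, Any]],
    dt_col: str,
    begin: Optional[datetime] = None,
    end: Optional[datetime] = None,
    order: Optional[str] = 'asc',
    limit: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Hypothetical helper mirroring the documented bounds (not part of meerschaum)."""
    rows = [
        doc for doc in docs
        if (begin is None or doc[dt_col] >= begin)   # lower bound: datetime >= begin
        and (end is None or doc[dt_col] < end)       # upper bound: datetime < end
    ]
    if order is not None:
        # Sort on the datetime column; anything other than 'asc' sorts descending.
        rows.sort(key=lambda doc: doc[dt_col], reverse=(order.lower() != 'asc'))
    return rows if limit is None else rows[:limit]


docs = [{'ts': datetime(2024, 1, day), 'id': day} for day in (1, 2, 3)]
selected = select_docs(docs, 'ts', begin=datetime(2024, 1, 1), end=datetime(2024, 1, 3))
print([doc['id'] for doc in selected])
# [1, 2]
```

The row stamped exactly at end is left out, while the row stamped exactly at begin is kept.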
414def get_backtrack_data( 415 self, 416 backtrack_minutes: Optional[int] = None, 417 begin: Union[datetime, int, None] = None, 418 params: Optional[Dict[str, Any]] = None, 419 limit: Optional[int] = None, 420 fresh: bool = False, 421 debug: bool = False, 422 **kw: Any 423) -> Optional['pd.DataFrame']: 424 """ 425 Get the most recent data from the instance connector as a Pandas DataFrame. 426 427 Parameters 428 ---------- 429 backtrack_minutes: Optional[int], default None 430 How many minutes from `begin` to select from. 431 If `None`, use `pipe.parameters['fetch']['backtrack_minutes']`. 432 433 begin: Optional[datetime], default None 434 The starting point to search for data. 435 If begin is `None` (default), use the most recent observed datetime 436 (AKA sync_time). 437 438 ``` 439 E.g. begin = 02:00 440 441 Search this region. Ignore this, even if there's data. 442 / / / / / / / / / | 443 -----|----------|----------|----------|----------|----------| 444 00:00 01:00 02:00 03:00 04:00 05:00 445 446 ``` 447 448 params: Optional[Dict[str, Any]], default None 449 The standard Meerschaum `params` query dictionary. 450 451 limit: Optional[int], default None 452 If provided, cap the number of rows to be returned. 453 454 fresh: bool, default False 455 If `True`, Ignore local cache and pull directly from the instance connector. 456 Only comes into effect if a pipe was created with `cache=True`. 457 458 debug: bool default False 459 Verbosity toggle. 460 461 Returns 462 ------- 463 A `pd.DataFrame` for the pipe's data corresponding to the provided parameters. Backtrack data 464 is a convenient way to get a pipe's data "backtracked" from the most recent datetime. 
465 """ 466 from meerschaum.utils.venv import Venv 467 from meerschaum.connectors import get_connector_plugin 468 469 if not self.exists(debug=debug): 470 return None 471 472 begin = self.parse_date_bounds(begin, debug=debug) 473 474 backtrack_interval = self.get_backtrack_interval(debug=debug) 475 if backtrack_minutes is None: 476 backtrack_minutes = ( 477 (backtrack_interval.total_seconds() / 60) 478 if isinstance(backtrack_interval, timedelta) 479 else backtrack_interval 480 ) 481 482 if hasattr(self.instance_connector, 'get_backtrack_data'): 483 with Venv(get_connector_plugin(self.instance_connector)): 484 return self.enforce_dtypes( 485 self.instance_connector.get_backtrack_data( 486 pipe=self, 487 begin=begin, 488 backtrack_minutes=backtrack_minutes, 489 params=params, 490 limit=limit, 491 debug=debug, 492 **kw 493 ), 494 debug=debug, 495 ) 496 497 if begin is None: 498 begin = self.get_sync_time(params=params, debug=debug) 499 500 backtrack_interval = ( 501 timedelta(minutes=backtrack_minutes) 502 if isinstance(begin, datetime) 503 else backtrack_minutes 504 ) 505 if begin is not None: 506 begin = begin - backtrack_interval 507 508 kw['order'] = kw.get('order', 'desc') or 'desc' 509 return self.get_data( 510 begin=begin, 511 params=params, 512 debug=debug, 513 limit=limit, 514 **kw 515 )
Get the most recent data from the instance connector as a Pandas DataFrame.
Parameters
- backtrack_minutes (Optional[int], default None):
How many minutes from begin to select from.
If None, use pipe.parameters['fetch']['backtrack_minutes'].
- begin (Optional[datetime], default None):
The starting point to search for data.
If begin is None (default), use the most recent observed datetime (AKA sync_time).

    E.g. begin = 02:00

    Search this region.           Ignore this, even if there's data.
    /  /  /  /  /  /  /  /  /  |
    -----|----------|----------|----------|----------|----------|
       00:00      01:00      02:00      03:00      04:00      05:00

- params (Optional[Dict[str, Any]], default None):
The standard Meerschaum params query dictionary.
- limit (Optional[int], default None):
If provided, cap the number of rows to be returned.
- fresh (bool, default False):
If True, ignore the local cache and pull directly from the instance connector.
Only comes into effect if a pipe was created with cache=True.
- debug (bool, default False):
Verbosity toggle.
Returns
- A pd.DataFrame for the pipe's data corresponding to the provided parameters. Backtrack data is a convenient way to get a pipe's data "backtracked" from the most recent datetime.
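When the instance connector does not provide its own get_backtrack_data(), the source above shifts begin back by the backtrack window before calling get_data(). A minimal sketch of that arithmetic follows; backtrack_begin is a hypothetical name used only for this illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Union


def backtrack_begin(
    begin: Union[datetime, int, None],
    backtrack_minutes: int,
) -> Union[datetime, int, None]:
    """Hypothetical helper: shift `begin` back by the backtrack window."""
    if begin is None:
        return None
    # Datetime axes subtract a timedelta; integer axes subtract the raw number.
    interval = (
        timedelta(minutes=backtrack_minutes)
        if isinstance(begin, datetime)
        else backtrack_minutes
    )
    return begin - interval


sync_time = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)
print(backtrack_begin(sync_time, 60))
# 2024-01-01 01:00:00+00:00
print(backtrack_begin(1_000, 60))
# 940
```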
518def get_rowcount( 519 self, 520 begin: Union[datetime, int, None] = None, 521 end: Union[datetime, int, None] = None, 522 params: Optional[Dict[str, Any]] = None, 523 remote: bool = False, 524 debug: bool = False 525) -> int: 526 """ 527 Get a Pipe's instance or remote rowcount. 528 529 Parameters 530 ---------- 531 begin: Optional[datetime], default None 532 Count rows where datetime > begin. 533 534 end: Optional[datetime], default None 535 Count rows where datetime < end. 536 537 remote: bool, default False 538 Count rows from a pipe's remote source. 539 **NOTE**: This is experimental! 540 541 debug: bool, default False 542 Verbosity toggle. 543 544 Returns 545 ------- 546 An `int` of the number of rows in the pipe corresponding to the provided parameters. 547 Returned 0 if the pipe does not exist. 548 """ 549 from meerschaum.utils.warnings import warn 550 from meerschaum.utils.venv import Venv 551 from meerschaum.connectors import get_connector_plugin 552 from meerschaum.utils.misc import filter_keywords 553 554 begin, end = self.parse_date_bounds(begin, end, debug=debug) 555 connector = self.instance_connector if not remote else self.connector 556 try: 557 with Venv(get_connector_plugin(connector)): 558 if not hasattr(connector, 'get_pipe_rowcount'): 559 warn( 560 f"Connectors of type '{connector.type}' " 561 "do not implement `get_pipe_rowcount()`.", 562 stack=False, 563 ) 564 return 0 565 kwargs = filter_keywords( 566 connector.get_pipe_rowcount, 567 begin=begin, 568 end=end, 569 params=params, 570 remote=remote, 571 debug=debug, 572 ) 573 if remote and 'remote' not in kwargs: 574 warn( 575 f"Connectors of type '{connector.type}' do not support remote rowcounts.", 576 stack=False, 577 ) 578 return 0 579 rowcount = connector.get_pipe_rowcount( 580 self, 581 begin=begin, 582 end=end, 583 params=params, 584 remote=remote, 585 debug=debug, 586 ) 587 if rowcount is None: 588 return 0 589 return rowcount 590 except AttributeError as e: 591 warn(e) 592 if 
remote: 593 return 0 594 warn(f"Failed to get a rowcount for {self}.") 595 return 0
Get a Pipe's instance or remote rowcount.
Parameters
- begin (Optional[datetime], default None): Count rows where datetime > begin.
- end (Optional[datetime], default None): Count rows where datetime < end.
- remote (bool, default False): Count rows from a pipe's remote source. NOTE: This is experimental!
- debug (bool, default False): Verbosity toggle.
Returns
- An int of the number of rows in the pipe corresponding to the provided parameters. Returns 0 if the pipe does not exist.
def get_size(
    self,
    debug: bool = False,
    **kw: Any
) -> Union[int, None]:
    """
    Return the on-disk size of the pipe's target table in bytes.

    Parameters
    ----------
    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    An `int` of the number of bytes occupied by the pipe's target table,
    or `None` if the size could not be determined (e.g. the connector does
    not implement `get_pipe_size()` or the table does not exist).
    """
    from meerschaum.utils.warnings import warn
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.misc import filter_keywords

    connector = self.instance_connector
    try:
        with Venv(get_connector_plugin(connector)):
            if not hasattr(connector, 'get_pipe_size'):
                return None
            kwargs = filter_keywords(
                connector.get_pipe_size,
                debug=debug,
                **kw
            )
            return connector.get_pipe_size(self, **kwargs)
    except NotImplementedError:
        return None
    except Exception as e:
        warn(f"Failed to get the size of {self}:\n{e}", stack=False)
        return None
Return the on-disk size of the pipe's target table in bytes.
Parameters
- debug (bool, default False): Verbosity toggle.
Returns
- An int of the number of bytes occupied by the pipe's target table, or None if the size could not be determined (e.g. the connector does not implement get_pipe_size() or the table does not exist).
def get_doc(self, **kwargs) -> Union[Dict[str, Any], None]:
    """
    Convenience function to return a single row as a dictionary (or `None`) from `Pipe.get_data()`.
    Keyword arguments are passed to `Pipe.get_data()`.
    """
    from meerschaum.utils.warnings import warn
    kwargs['limit'] = 1
    kwargs['as_docs'] = True
    try:
        docs = self.get_data(**kwargs)
        if not docs:
            return None
        return docs[0]
    except Exception as e:
        warn(f"Failed to read value from {self}:\n{e}", stack=False)
        return None
Convenience function to return a single row as a dictionary (or None) from Pipe.get_data().
Keyword arguments are passed to Pipe.get_data().
def get_docs(self, **kwargs) -> list[dict[str, Any]]:
    """
    Convenience method to return a pipe's data as a list of dictionaries.
    Relies on `get_pipe_docs` from the instance connector if implemented.
    """
    kwargs['as_docs'] = True
    return self.get_data(**kwargs)
Convenience method to return a pipe's data as a list of dictionaries.
Relies on get_pipe_docs from the instance connector if implemented.
def get_value(
    self,
    column: str,
    params: Optional[Dict[str, Any]] = None,
    **kwargs: Any
) -> Any:
    """
    Convenience function to return a single value (or `None`) from `Pipe.get_data()`.
    Keyword arguments are passed to `Pipe.get_data()`.
    """
    from meerschaum.utils.warnings import warn
    kwargs['select_columns'] = [column]
    kwargs['limit'] = 1
    kwargs['as_docs'] = True
    try:
        docs = self.get_data(params=params, **kwargs)
        if not docs:
            return None
        if column not in docs[0]:
            raise ValueError(f"Column '{column}' was not included in the result set.")
        return docs[0][column]
    except Exception as e:
        warn(f"Failed to read value from {self}:\n{e}", stack=False)
        return None
Convenience function to return a single value (or None) from Pipe.get_data().
Keyword arguments are passed to Pipe.get_data().
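One detail of the source above is easy to miss: the ValueError for a missing column is raised inside the try block, so callers receive a warning and None rather than an exception. The sketch below reproduces only that lookup on an already-fetched list of documents; value_from_docs is a hypothetical stand-in and uses the standard warnings module in place of meerschaum's warn().

```python
import warnings
from typing import Any, Dict, List, Optional


def value_from_docs(docs: Optional[List[Dict[str, Any]]], column: str) -> Any:
    """Hypothetical stand-in for the lookup performed on the one-row result."""
    try:
        if not docs:
            return None
        if column not in docs[0]:
            raise ValueError(f"Column '{column}' was not included in the result set.")
        return docs[0][column]
    except Exception as e:
        # Failures are reported as warnings and yield None, as in the source.
        warnings.warn(f"Failed to read value:\n{e}")
        return None


print(value_from_docs([{'id': 1, 'vl': 97}], 'vl'))
# 97
print(value_from_docs([], 'vl'))
# None
```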
640def get_chunk_interval( 641 self, 642 chunk_interval: Union[timedelta, int, None] = None, 643 debug: bool = False, 644) -> Union[timedelta, int]: 645 """ 646 Get the chunk interval to use for this pipe. 647 648 The size is read from the `verify` parameters. Any one of these aliased keys may be used 649 (the first present, in this priority order, wins): 650 651 - `verify.chunk_minutes` (the default; 43200 — 30 days — if none is set) 652 - `verify.chunk_hours` 653 - `verify.chunk_days` 654 - `verify.chunk_weeks` 655 - `verify.chunk_years` 656 - `verify.chunk_seconds` 657 658 For an integer datetime axis, `verify.chunk_range` (if set) is used verbatim as the chunk size 659 in epoch units. Otherwise the time-based size above is converted to epoch units via the pipe's 660 `precision`, or — preserving legacy behavior when no `precision` is set — its minutes are used 661 verbatim. 662 663 Parameters 664 ---------- 665 chunk_interval: Union[timedelta, int, None], default None 666 If provided, coerce this value into the correct type (overriding the `verify` keys). 667 For example, if the datetime axis is an integer, then return the number of minutes. 668 669 Returns 670 ------- 671 The chunk interval (`timedelta` or `int`) to use with this pipe's `datetime` axis. 672 """ 673 from meerschaum.utils.dtypes import MRSM_PRECISION_UNITS_SCALARS, MRSM_PRECISION_UNITS_ALIASES 674 675 dt_col = self.columns.get('datetime', None) 676 dt_dtype = self.dtypes.get(dt_col, 'datetime') if dt_col is not None else 'datetime' 677 is_int_axis = 'int' in str(dt_dtype).lower() 678 verify_params = self.parameters.get('verify', {}) 679 680 ### An explicit `chunk_interval` argument overrides everything (legacy behavior). 
681 if chunk_interval is not None: 682 chunk_minutes = ( 683 chunk_interval 684 if isinstance(chunk_interval, int) 685 else int(chunk_interval.total_seconds() / 60) 686 ) 687 if dt_col is None: 688 return timedelta(minutes=chunk_minutes) 689 return chunk_minutes if is_int_axis else timedelta(minutes=chunk_minutes) 690 691 ### Integer axis: an explicit `verify.chunk_range` is the chunk size in epoch units, verbatim. 692 if dt_col is not None and is_int_axis: 693 chunk_range = verify_params.get('chunk_range', None) 694 if chunk_range is not None: 695 return int(chunk_range) 696 697 ### Resolve the time-based chunk size from the aliased `verify.chunk_*` keys (priority order 698 ### matches the `bound_*` aliases). Falls back to the configured `chunk_minutes` default. 699 chunk_delta = None 700 for suffix in ('minutes', 'hours', 'days', 'weeks', 'years', 'seconds'): 701 val = verify_params.get('chunk_' + suffix, None) 702 if val is None: 703 continue 704 ### `timedelta` has no `years` kwarg; approximate a year as 365 days. 705 chunk_delta = timedelta(days=(val * 365)) if suffix == 'years' else timedelta(**{suffix: val}) 706 break 707 if chunk_delta is None: 708 default_chunk_minutes = get_config('pipes', 'parameters', 'verify', 'chunk_minutes') 709 chunk_delta = timedelta(minutes=default_chunk_minutes) 710 711 if dt_col is None: 712 return chunk_delta 713 714 if is_int_axis: 715 ### Legacy: without `precision` (and without `chunk_range`), use the chunk's minutes 716 ### verbatim as the integer interval. 717 if not self.parameters.get('precision', None): 718 return int(chunk_delta.total_seconds() / 60) 719 precision_unit = self.precision.get('unit', None) 720 true_unit = MRSM_PRECISION_UNITS_ALIASES.get(precision_unit, precision_unit) 721 scalar = MRSM_PRECISION_UNITS_SCALARS.get(true_unit, None) 722 if scalar is not None: 723 return int(chunk_delta.total_seconds() * scalar) 724 return int(chunk_delta.total_seconds() / 60) 725 726 return chunk_delta
Get the chunk interval to use for this pipe.
The size is read from the verify parameters. Any one of these aliased keys may be used
(the first present, in this priority order, wins):
- verify.chunk_minutes (the default; 43200 minutes, or 30 days, if none is set)
- verify.chunk_hours
- verify.chunk_days
- verify.chunk_weeks
- verify.chunk_years
- verify.chunk_seconds
For an integer datetime axis, verify.chunk_range (if set) is used verbatim as the chunk size in epoch units. Otherwise the time-based size above is converted to epoch units via the pipe's precision or, preserving legacy behavior when no precision is set, its minutes are used verbatim.
Parameters
- chunk_interval (Union[timedelta, int, None], default None):
If provided, coerce this value into the correct type (overriding the verify keys).
For example, if the datetime axis is an integer, then return the number of minutes.
Returns
- The chunk interval (timedelta or int) to use with this pipe's datetime axis.
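The key-priority rule for the time-based size can be sketched in isolation. This is an illustration under stated assumptions: resolve_chunk_delta and DEFAULT_CHUNK_MINUTES are hypothetical names, the default is hard-coded to the documented 43200 minutes (the library reads it from its configuration), and the integer-axis conversion via precision is left out.

```python
from datetime import timedelta
from typing import Any, Dict

# Priority order taken from the docstring above.
CHUNK_SUFFIXES = ('minutes', 'hours', 'days', 'weeks', 'years', 'seconds')
DEFAULT_CHUNK_MINUTES = 43200  # 30 days, the documented default (assumed constant here)


def resolve_chunk_delta(verify_params: Dict[str, Any]) -> timedelta:
    """Hypothetical helper: pick the first `chunk_*` key present, in priority order."""
    for suffix in CHUNK_SUFFIXES:
        val = verify_params.get('chunk_' + suffix)
        if val is None:
            continue
        if suffix == 'years':
            # `timedelta` has no `years` argument; approximate a year as 365 days.
            return timedelta(days=val * 365)
        return timedelta(**{suffix: val})
    return timedelta(minutes=DEFAULT_CHUNK_MINUTES)


print(resolve_chunk_delta({'chunk_days': 7}))
# 7 days, 0:00:00
print(resolve_chunk_delta({'chunk_hours': 6, 'chunk_days': 7}))
# 6:00:00
print(resolve_chunk_delta({}))
# 30 days, 0:00:00
```

When both chunk_hours and chunk_days are set, chunk_hours wins because it comes earlier in the priority order, not because it is the smaller interval.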
729def get_chunk_bounds( 730 self, 731 begin: Union[datetime, int, None] = None, 732 end: Union[datetime, int, None] = None, 733 bounded: bool = False, 734 chunk_interval: Union[timedelta, int, None] = None, 735 align: bool = False, 736 debug: bool = False, 737) -> List[ 738 Tuple[ 739 Union[datetime, int, None], 740 Union[datetime, int, None], 741 ] 742]: 743 """ 744 Return a list of datetime bounds for iterating over the pipe's `datetime` axis. 745 746 Parameters 747 ---------- 748 begin: Union[datetime, int, None], default None 749 If provided, do not select less than this value. 750 Otherwise the first chunk will be unbounded. 751 752 end: Union[datetime, int, None], default None 753 If provided, do not select greater than or equal to this value. 754 Otherwise the last chunk will be unbounded. 755 756 bounded: bool, default False 757 If `True`, do not include `None` in the first chunk. 758 759 chunk_interval: Union[timedelta, int, None], default None 760 If provided, use this interval for the size of chunk boundaries. 761 The default value for this pipe may be set 762 under `pipe.parameters['verify']['chunk_minutes']`. 763 764 align: bool, default False 765 If `True`, anchor the interior chunk boundaries to a fixed Unix-epoch grid (the same 766 grid used for native range partitioning) rather than to `begin`. This makes the 767 boundaries deterministic across re-syncs and aligned with the pipe's partitions 768 (used by `Pipe.verify()`). The first chunk's lower bound and the last chunk's upper 769 bound are still clamped to `begin` / `end`. 770 771 debug: bool, default False 772 Verbosity toggle. 773 774 Returns 775 ------- 776 A list of chunk bounds (datetimes or integers). 777 If unbounded, the first and last chunks will include `None`. 
778 """ 779 from datetime import timedelta 780 from meerschaum.utils.dtypes import are_dtypes_equal 781 from meerschaum.utils.misc import interval_str 782 include_less_than_begin = not bounded and begin is None 783 include_greater_than_end = not bounded and end is None 784 if begin is None: 785 begin = self.get_sync_time(newest=False, debug=debug) 786 consolidate_end_chunk = False 787 if end is None: 788 end = self.get_sync_time(newest=True, debug=debug) 789 if end is not None and hasattr(end, 'tzinfo'): 790 end += timedelta(minutes=1) 791 consolidate_end_chunk = True 792 elif are_dtypes_equal(str(type(end)), 'int'): 793 end += 1 794 consolidate_end_chunk = True 795 796 if begin is None and end is None: 797 return [(None, None)] 798 799 begin, end = self.parse_date_bounds(begin, end, debug=debug) 800 801 if begin and end: 802 if begin >= end: 803 return ( 804 [(begin, begin)] 805 if bounded 806 else [(begin, None)] 807 ) 808 if end <= begin: 809 return ( 810 [(end, end)] 811 if bounded 812 else [(None, begin)] 813 ) 814 815 ### Set the chunk interval under `pipe.parameters['verify']['chunk_minutes']`. 816 chunk_interval = self.get_chunk_interval(chunk_interval, debug=debug) 817 818 ### Anchor the interior boundaries to a fixed Unix-epoch grid (matching native range 819 ### partitioning, see `SQLConnector._partition_bounds`) so chunk edges line up with partition 820 ### edges and stay deterministic regardless of `begin`. The first chunk is clamped back to 821 ### `begin` below. 
822 begin_cursor = begin 823 if align and begin is not None: 824 if isinstance(chunk_interval, int): 825 begin_cursor = (int(begin) // chunk_interval) * chunk_interval 826 else: 827 epoch = ( 828 datetime(1970, 1, 1, tzinfo=begin.tzinfo) 829 if getattr(begin, 'tzinfo', None) is not None 830 else datetime(1970, 1, 1) 831 ) 832 n = (begin - epoch) // chunk_interval 833 begin_cursor = epoch + (n * chunk_interval) 834 835 ### Build a list of tuples containing the chunk boundaries 836 ### so that we can sync multiple chunks in parallel. 837 ### Run `verify pipes --workers 1` to sync chunks in series. 838 chunk_bounds = [] 839 num_chunks = 0 840 max_chunks = 1_000_000 841 while begin_cursor < end: 842 end_cursor = begin_cursor + chunk_interval 843 chunk_bounds.append((begin_cursor, end_cursor)) 844 begin_cursor = end_cursor 845 num_chunks += 1 846 if num_chunks >= max_chunks: 847 raise ValueError( 848 f"Too many chunks of size '{interval_str(chunk_interval)}' " 849 f"between '{begin}' and '{end}'." 850 ) 851 852 if num_chunks > 1 and consolidate_end_chunk: 853 last_bounds, second_last_bounds = chunk_bounds[-1], chunk_bounds[-2] 854 chunk_bounds = chunk_bounds[:-2] 855 chunk_bounds.append((second_last_bounds[0], last_bounds[1])) 856 857 ### The chunk interval might be too large. 858 if not chunk_bounds and end >= begin: 859 chunk_bounds = [(begin, end)] 860 861 ### Truncate the last chunk to the end timestamp. 862 if chunk_bounds[-1][1] > end: 863 chunk_bounds[-1] = (chunk_bounds[-1][0], end) 864 865 ### Pop the last chunk if its bounds are equal. 866 if chunk_bounds[-1][0] == chunk_bounds[-1][1]: 867 chunk_bounds = chunk_bounds[:-1] 868 869 ### Clamp the epoch-aligned first chunk's lower bound back to the requested `begin` so the 870 ### returned range still starts exactly at `begin` (only the interior edges are grid-aligned). 
871 if ( 872 align 873 and chunk_bounds 874 and chunk_bounds[0][0] is not None 875 and chunk_bounds[0][0] < begin 876 ): 877 chunk_bounds[0] = (begin, chunk_bounds[0][1]) 878 879 if include_less_than_begin: 880 chunk_bounds = [(None, begin)] + chunk_bounds 881 if include_greater_than_end: 882 chunk_bounds = chunk_bounds + [(end, None)] 883 884 return chunk_bounds
Return a list of datetime bounds for iterating over the pipe's datetime axis.
Parameters
- begin (Union[datetime, int, None], default None):
If provided, do not select less than this value.
Otherwise the first chunk will be unbounded.
- end (Union[datetime, int, None], default None):
If provided, do not select greater than or equal to this value.
Otherwise the last chunk will be unbounded.
- bounded (bool, default False):
If True, do not include None in the first or last chunk.
- chunk_interval (Union[timedelta, int, None], default None):
If provided, use this interval for the size of chunk boundaries.
The default value for this pipe may be set under pipe.parameters['verify']['chunk_minutes'].
- align (bool, default False):
If True, anchor the interior chunk boundaries to a fixed Unix-epoch grid (the same grid used for native range partitioning) rather than to begin. This makes the boundaries deterministic across re-syncs and aligned with the pipe's partitions (used by Pipe.verify()). The first chunk's lower bound and the last chunk's upper bound are still clamped to begin / end.
- debug (bool, default False):
Verbosity toggle.
Returns
- A list of chunk bounds (datetimes or integers). If unbounded, the first and last chunks will include None.
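The effect of align is easiest to see on a small range. The sketch below is a simplified illustration, not the library's implementation: chunk_bounds is a hypothetical function that assumes naive datetimes with begin < end, and it omits the unbounded (None) chunks, the sync-time lookups, and the end-chunk consolidation found in the source.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def chunk_bounds(
    begin: datetime,
    end: datetime,
    chunk_interval: timedelta,
    align: bool = False,
) -> List[Tuple[datetime, datetime]]:
    """Hypothetical, simplified version for naive datetimes with begin < end."""
    cursor = begin
    if align:
        # Snap the starting cursor down to the Unix-epoch grid.
        epoch = datetime(1970, 1, 1)
        cursor = epoch + ((begin - epoch) // chunk_interval) * chunk_interval

    bounds = []
    while cursor < end:
        bounds.append((cursor, cursor + chunk_interval))
        cursor += chunk_interval

    # Clamp the outer edges back to the requested range.
    bounds[0] = (max(bounds[0][0], begin), bounds[0][1])
    bounds[-1] = (bounds[-1][0], min(bounds[-1][1], end))
    return bounds


begin, end = datetime(2024, 1, 1, 6), datetime(2024, 1, 3)
one_day = timedelta(days=1)
unaligned = chunk_bounds(begin, end, one_day)
aligned = chunk_bounds(begin, end, one_day, align=True)
print(unaligned[0][1])
# 2024-01-02 06:00:00
print(aligned[0][1])
# 2024-01-02 00:00:00
```

Without align, every interior edge inherits the 06:00 offset of begin; with align, interior edges fall on midnight while the first chunk still starts at 06:00.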
def get_chunk_bounds_batches(
    self,
    chunk_bounds: List[Tuple[Union[datetime, int, None], Union[datetime, int, None]]],
    batchsize: Optional[int] = None,
    workers: Optional[int] = None,
    debug: bool = False,
) -> List[
    Tuple[
        Tuple[
            Union[datetime, int, None],
            Union[datetime, int, None],
        ], ...
    ]
]:
    """
    Return a list of tuples of chunk bounds of size `batchsize`.

    Parameters
    ----------
    chunk_bounds: List[Tuple[Union[datetime, int, None], Union[datetime, int, None]]]
        A list of chunk_bounds (see `Pipe.get_chunk_bounds()`).

    batchsize: Optional[int], default None
        How many chunks to include in a batch. Defaults to `Pipe.get_num_workers()`.

    workers: Optional[int], default None
        If `batchsize` is `None`, use this as the desired number of workers.
        Passed to `Pipe.get_num_workers()`.

    Returns
    -------
    A list of tuples of chunk bound tuples.
    """
    from meerschaum.utils.misc import iterate_chunks

    if batchsize is None:
        batchsize = self.get_num_workers(workers=workers)

    return [
        tuple(
            _batch_chunk_bounds
            for _batch_chunk_bounds in batch
            if _batch_chunk_bounds is not None
        )
        for batch in iterate_chunks(chunk_bounds, batchsize)
        if batch
    ]
Return a list of tuples of chunk bounds of size batchsize.
Parameters
- chunk_bounds (List[Tuple[Union[datetime, int, None], Union[datetime, int, None]]]):
A list of chunk_bounds (see Pipe.get_chunk_bounds()).
- batchsize (Optional[int], default None):
How many chunks to include in a batch. Defaults to Pipe.get_num_workers().
- workers (Optional[int], default None):
If batchsize is None, use this as the desired number of workers.
Passed to Pipe.get_num_workers().
Returns
- A list of tuples of chunk bound tuples.
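The shape of the return value is shown below with a plain slicing loop. This is a sketch only: batch_chunk_bounds is a hypothetical stand-in that uses list slicing where the source calls iterate_chunks, and the integer bounds are made up for the example.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar('T')


def batch_chunk_bounds(chunk_bounds: Sequence[T], batchsize: int) -> List[Tuple[T, ...]]:
    """Hypothetical stand-in: slice the bounds into tuples of at most `batchsize`."""
    return [
        tuple(chunk_bounds[i:i + batchsize])
        for i in range(0, len(chunk_bounds), batchsize)
    ]


bounds = [(None, 0), (0, 10), (10, 20), (20, 30), (30, None)]
batches = batch_chunk_bounds(bounds, 2)
print(batches)
# [((None, 0), (0, 10)), ((10, 20), (20, 30)), ((30, None),)]
```

The last batch may be shorter than batchsize when the number of chunks is not an exact multiple.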
def parse_date_bounds(self, *dt_vals: Union[datetime, int, None], debug: bool = False) -> Union[
    datetime,
    int,
    str,
    None,
    Tuple[Union[datetime, int, str, None]]
]:
    """
    Given a date bound (begin, end), coerce a timezone if necessary.
    """
    from meerschaum.utils.misc import is_int
    from meerschaum.utils.dtypes import coerce_timezone, MRSM_PD_DTYPES, are_dtypes_equal
    from meerschaum.utils.warnings import warn
    dateutil_parser = mrsm.attempt_import('dateutil.parser')

    _columns = None
    _dtypes = None

    def _get_coercion_info():
        nonlocal _columns, _dtypes
        if _columns is None:
            _columns = self.get_parameters(debug=debug).get('columns', {}) or {}
        if _dtypes is None:
            _dtypes = self.get_dtypes(debug=debug)

    def _parse_date_bound(dt_val):
        if dt_val is None:
            return None

        if isinstance(dt_val, int):
            return dt_val

        if dt_val == '':
            return ''

        if is_int(dt_val):
            return int(dt_val)

        if isinstance(dt_val, str):
            try:
                dt_val = dateutil_parser.parse(dt_val)
            except Exception as e:
                warn(f"Could not parse '{dt_val}' as datetime:\n{e}")
                return None

        _get_coercion_info()
        dt_col = _columns.get('datetime', None)
        dt_typ = str(_dtypes.get(dt_col, 'datetime'))
        if are_dtypes_equal(dt_typ, 'int'):
            if self.get_parameters(debug=debug).get('precision'):
                from meerschaum.utils.dtypes import datetime_to_int
                return datetime_to_int(dt_val, self.precision['unit'])
            from meerschaum.utils.warnings import error
            error(
                f"Cannot use datetime bound '{dt_val}' on the non-epoch integer axis "
                f"of {self}.\n Pass an integer instead, or set the `precision` parameter.",
                ValueError,
            )
        if dt_typ == 'datetime':
            dt_typ = MRSM_PD_DTYPES['datetime']
        return coerce_timezone(dt_val, strip_utc=('utc' not in dt_typ.lower()))

    bounds = tuple(_parse_date_bound(dt_val) for dt_val in dt_vals)
    if len(bounds) == 1:
        return bounds[0]
    return bounds
Given a date bound (begin, end), coerce a timezone if necessary.
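The coercion order in the listing above can be tried without Meerschaum installed. The following is a simplified, standalone sketch rather than the library's code: the helper names and the tz_aware flag are hypothetical, datetime.fromisoformat stands in for dateutil.parser.parse, naive datetimes are assumed to be UTC, and the integer-axis branch is omitted.

```python
from datetime import datetime, timezone
from typing import Union

Bound = Union[datetime, int, str, None]


def parse_date_bound(dt_val: Bound, tz_aware: bool = True) -> Bound:
    """Hypothetical, simplified stand-in for the per-value coercion."""
    # `None`, integers, and empty strings pass through unchanged.
    if dt_val is None or isinstance(dt_val, int) or dt_val == '':
        return dt_val

    if isinstance(dt_val, str):
        # Integer-like strings become integers.
        try:
            return int(dt_val)
        except ValueError:
            pass
        # Stand-in for `dateutil.parser.parse`; unparseable strings yield `None`.
        try:
            dt_val = datetime.fromisoformat(dt_val)
        except ValueError:
            return None

    # Assumption: naive datetimes are treated as UTC.
    if dt_val.tzinfo is None:
        dt_val = dt_val.replace(tzinfo=timezone.utc)
    dt_val = dt_val.astimezone(timezone.utc)
    # Strip the timezone when the axis is timezone-naive.
    return dt_val if tz_aware else dt_val.replace(tzinfo=None)


def parse_date_bounds(*dt_vals: Bound, tz_aware: bool = True):
    """Return a single value for one bound, otherwise a tuple of bounds."""
    bounds = tuple(parse_date_bound(val, tz_aware) for val in dt_vals)
    return bounds[0] if len(bounds) == 1 else bounds
```

Passing one bound returns a bare value and passing several returns a tuple, mirroring the return convention of the method above.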
12def register( 13 self, 14 debug: bool = False, 15 **kw: Any 16) -> SuccessTuple: 17 """ 18 Register a new Pipe along with its attributes. 19 20 Parameters 21 ---------- 22 debug: bool, default False 23 Verbosity toggle. 24 25 kw: Any 26 Keyword arguments to pass to `instance_connector.register_pipe()`. 27 28 Returns 29 ------- 30 A `SuccessTuple` of success, message. 31 """ 32 if self.temporary: 33 return False, "Cannot register pipes created with `temporary=True` (read-only)." 34 35 from meerschaum.utils.formatting import get_console 36 from meerschaum.utils.venv import Venv 37 from meerschaum.connectors import get_connector_plugin, custom_types 38 from meerschaum.config._patch import apply_patch_to_config 39 40 import warnings 41 with warnings.catch_warnings(): 42 warnings.simplefilter('ignore') 43 try: 44 _conn = self.connector 45 except Exception: 46 _conn = None 47 48 if isinstance(_conn, str): 49 _conn = None 50 51 if ( 52 _conn is not None 53 and 54 (_conn.type == 'plugin' or _conn.type in custom_types) 55 and 56 getattr(_conn, 'register', None) is not None 57 ): 58 try: 59 with Venv(get_connector_plugin(_conn), debug=debug): 60 params = self.connector.register(self) 61 except Exception: 62 get_console().print_exception() 63 params = None 64 params = {} if params is None else params 65 if not isinstance(params, dict): 66 from meerschaum.utils.warnings import warn 67 warn( 68 f"Invalid parameters returned from `register()` in connector {self.connector}:\n" 69 + f"{params}" 70 ) 71 else: 72 self.parameters = apply_patch_to_config(params, self.parameters) 73 74 if not self.parameters: 75 cols = self.columns if self.columns else {'datetime': None, 'id': None} 76 self.parameters = { 77 'columns': cols, 78 } 79 80 with Venv(get_connector_plugin(self.instance_connector)): 81 return self.instance_connector.register_pipe(self, debug=debug, **kw)
Register a new Pipe along with its attributes.
Parameters
- debug (bool, default False): Verbosity toggle.
- kw (Any): Keyword arguments to pass to instance_connector.register_pipe().
Returns
- A SuccessTuple of success, message.
20@property 21def attributes(self) -> Dict[str, Any]: 22 """ 23 Return a dictionary of a pipe's keys and parameters. 24 These values are reflected directly from the pipes table of the instance. 25 """ 26 from meerschaum.config import get_config 27 from meerschaum.config._patch import apply_patch_to_config 28 from meerschaum.utils.venv import Venv 29 from meerschaum.connectors import get_connector_plugin 30 from meerschaum.utils.dtypes import get_current_timestamp 31 32 timeout_seconds = get_config('pipes', 'attributes', 'local_cache_timeout_seconds') 33 34 now = get_current_timestamp('ms', as_int=True) / 1000 35 _attributes_sync_time = self._get_cached_value('_attributes_sync_time', debug=self.debug) 36 timed_out = ( 37 _attributes_sync_time is None 38 or 39 (timeout_seconds is not None and (now - _attributes_sync_time) >= timeout_seconds) 40 ) 41 if not self.temporary and timed_out: 42 self._cache_value('_attributes_sync_time', now, memory_only=True, debug=self.debug) 43 local_attributes = self._get_cached_value('attributes', debug=self.debug) or {} 44 with Venv(get_connector_plugin(self.instance_connector)): 45 instance_attributes = self.instance_connector.get_pipe_attributes(self) 46 47 self._cache_value( 48 'attributes', 49 apply_patch_to_config(instance_attributes, local_attributes), 50 memory_only=True, 51 debug=self.debug, 52 ) 53 54 return self._attributes
Return a dictionary of a pipe's keys and parameters. These values are reflected directly from the pipes table of the instance.
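The timeout check in the property above follows a common pattern: re-fetch from the backing store only when the last sync is older than a configured number of seconds. A minimal sketch of that pattern, assuming a hypothetical TTLAttributes class with an injectable clock and a shallow merge in place of apply_patch_to_config (local values are assumed to win over fetched ones):

```python
import time
from typing import Any, Callable, Dict, Optional


class TTLAttributes:
    """Hypothetical sketch of a timeout-guarded attributes cache."""

    def __init__(
        self,
        fetch: Callable[[], Dict[str, Any]],
        timeout_seconds: Optional[float],
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch
        self._timeout_seconds = timeout_seconds
        self._clock = clock
        self._synced_at: Optional[float] = None
        self._attributes: Dict[str, Any] = {}

    @property
    def attributes(self) -> Dict[str, Any]:
        now = self._clock()
        # A timeout of `None` means "sync once, then never again".
        timed_out = (
            self._synced_at is None
            or (
                self._timeout_seconds is not None
                and (now - self._synced_at) >= self._timeout_seconds
            )
        )
        if timed_out:
            self._synced_at = now
            # Shallow merge: fetched values first, local values on top.
            self._attributes = {**self._fetch(), **self._attributes}
        return self._attributes
```

Injecting the clock keeps the sketch testable without sleeping.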
@property
def parameters(self) -> Optional[Dict[str, Any]]:
    """
    Return the parameters dictionary of the pipe.
    """
    return self.get_parameters(debug=self.debug)
Return the parameters dictionary of the pipe.
@property
def columns(self) -> Union[Dict[str, str], None]:
    """
    Return the `columns` dictionary defined in `meerschaum.Pipe.parameters`.
    """
    cols = self.parameters.get('columns', {})
    if not isinstance(cols, dict):
        return {}
    return {col_ix: col for col_ix, col in cols.items() if col and col_ix}
Return the columns dictionary defined in meerschaum.Pipe.parameters.
@property
def indices(self) -> Union[Dict[str, Union[str, List[str]]], None]:
    """
    Return the `indices` dictionary defined in `meerschaum.Pipe.parameters`.
    """
    _parameters = self.get_parameters(debug=self.debug)
    indices_key = (
        'indexes'
        if 'indexes' in _parameters
        else 'indices'
    )

    _indices = _parameters.get(indices_key, {})
    _columns = self.columns
    dt_col = _columns.get('datetime', None)
    if not isinstance(_indices, dict):
        _indices = {}
    unique_cols = list(set((
        [dt_col]
        if dt_col
        else []
    ) + [
        col
        for col_ix, col in _columns.items()
        if col and col_ix != 'datetime'
    ]))
    return {
        **({'unique': unique_cols} if len(unique_cols) > 1 else {}),
        **{col_ix: col for col_ix, col in _columns.items() if col},
        **_indices
    }
Return the indices dictionary defined in meerschaum.Pipe.parameters.
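As the property's source shows, the result layers three things: an implicit unique index over all index columns (only when there is more than one), the columns themselves, and any explicitly configured indices, which take precedence. A standalone sketch of that layering, using a hypothetical build_indices helper and sorting the unique columns for a deterministic order (the original builds them from a set, so its order is unspecified):

```python
from typing import Dict, List, Optional, Union

IndexValue = Union[str, List[str]]


def build_indices(
    columns: Dict[str, Optional[str]],
    indices: Optional[Dict[str, IndexValue]] = None,
) -> Dict[str, IndexValue]:
    """Hypothetical helper mirroring how the `indices` mapping is layered."""
    # Drop unset entries, as the `columns` property does.
    columns = {ix: col for ix, col in columns.items() if ix and col}
    # Sorted here for a stable result; the original uses `list(set(...))`.
    unique_cols = sorted(set(columns.values()))
    return {
        # An implicit composite unique index only makes sense for 2+ columns.
        **({'unique': unique_cols} if len(unique_cols) > 1 else {}),
        **columns,
        # Explicit indices are applied last and so override the defaults.
        **(indices or {}),
    }


print(build_indices({'datetime': 'ts', 'id': 'id'}))
# {'unique': ['id', 'ts'], 'datetime': 'ts', 'id': 'id'}
```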
@property
def indexes(self) -> Union[Dict[str, Union[str, List[str]]], None]:
    """
    Alias for `meerschaum.Pipe.indices`.
    """
    return self.indices
Alias for meerschaum.Pipe.indices.
@property
def dtypes(self) -> Dict[str, Any]:
    """
    If defined, return the `dtypes` dictionary defined in `meerschaum.Pipe.parameters`.
    """
    return self.get_dtypes(refresh=False, debug=self.debug)
If defined, return the dtypes dictionary defined in meerschaum.Pipe.parameters.
@property
def autoincrement(self) -> bool:
    """
    Return the `autoincrement` parameter for the pipe.
    """
    return self.parameters.get('autoincrement', False)
Return the autoincrement parameter for the pipe.
@property
def autotime(self) -> bool:
    """
    Return the `autotime` parameter for the pipe.
    """
    return self.parameters.get('autotime', False)
Return the autotime parameter for the pipe.
@property
def upsert(self) -> bool:
    """
    Return whether `upsert` is set for the pipe.
    """
    return self.parameters.get('upsert', False)
Return whether upsert is set for the pipe.
@property
def static(self) -> bool:
    """
    Return whether `static` is set for the pipe.
    """
    return self.parameters.get('static', False)
Return whether static is set for the pipe.
@property
def tzinfo(self) -> Union[None, timezone]:
    """
    Return `timezone.utc` if the pipe is timezone-aware.
    """
    _tzinfo = self._get_cached_value('tzinfo', debug=self.debug)
    if _tzinfo is not None:
        return _tzinfo if _tzinfo != 'None' else None

    _tzinfo = None
    dt_col = self.columns.get('datetime', None)
    dt_typ = str(self.dtypes.get(dt_col, 'datetime')) if dt_col else None
    if self.autotime:
        ts_col = mrsm.get_config('pipes', 'autotime', 'column_name_if_datetime_missing')
        ts_typ = self.dtypes.get(ts_col, 'datetime')
        dt_typ = ts_typ

    if dt_typ and 'utc' in dt_typ.lower() or dt_typ == 'datetime':
        _tzinfo = timezone.utc

    self._cache_value('tzinfo', (_tzinfo if _tzinfo is not None else 'None'), debug=self.debug)
    return _tzinfo
Return timezone.utc if the pipe is timezone-aware.
@property
def enforce(self) -> bool:
    """
    Return the `enforce` parameter for the pipe.
    """
    return self.parameters.get('enforce', True)
Return the enforce parameter for the pipe.
@property
def null_indices(self) -> bool:
    """
    Return the `null_indices` parameter for the pipe.
    """
    return self.parameters.get('null_indices', True)
Return the null_indices parameter for the pipe.
@property
def mixed_numerics(self) -> bool:
    """
    Return the `mixed_numerics` parameter for the pipe.
    """
    return self.parameters.get('mixed_numerics', True)
Return the mixed_numerics parameter for the pipe.
522def get_columns(self, *args: str, error: bool = False) -> Union[str, Tuple[str]]: 523 """ 524 Check if the requested columns are defined. 525 526 Parameters 527 ---------- 528 *args: str 529 The column names to be retrieved. 530 531 error: bool, default False 532 If `True`, raise an `Exception` if the specified column is not defined. 533 534 Returns 535 ------- 536 A tuple of the same size of `args` or a `str` if `args` is a single argument. 537 538 Examples 539 -------- 540 >>> pipe = mrsm.Pipe('test', 'test') 541 >>> pipe.columns = {'datetime': 'dt', 'id': 'id'} 542 >>> pipe.get_columns('datetime', 'id') 543 ('dt', 'id') 544 >>> pipe.get_columns('value', error=True) 545 Exception: 🛑 Missing 'value' column for Pipe('test', 'test'). 546 """ 547 from meerschaum.utils.warnings import error as _error 548 if not args: 549 args = tuple(self.columns.keys()) 550 col_names = [] 551 for col in args: 552 col_name = None 553 try: 554 col_name = self.columns[col] 555 if col_name is None and error: 556 _error(f"Please define the name of the '{col}' column for {self}.") 557 except Exception: 558 col_name = None 559 if col_name is None and error: 560 _error(f"Missing '{col}'" + f" column for {self}.") 561 col_names.append(col_name) 562 if len(col_names) == 1: 563 return col_names[0] 564 return tuple(col_names)
Check if the requested columns are defined.
Parameters
- *args (str): The column names to be retrieved.
- error (bool, default False): If True, raise an Exception if the specified column is not defined.
Returns
- A tuple the same size as args, or a str if args is a single argument.
Examples
>>> pipe = mrsm.Pipe('test', 'test')
>>> pipe.columns = {'datetime': 'dt', 'id': 'id'}
>>> pipe.get_columns('datetime', 'id')
('dt', 'id')
>>> pipe.get_columns('value', error=True)
Exception: 🛑 Missing 'value' column for Pipe('test', 'test').
def get_columns_types(
    self,
    refresh: bool = False,
    debug: bool = False,
) -> Union[Dict[str, str], None]:
    """
    Get a dictionary of a pipe's column names and their types.

    Parameters
    ----------
    refresh: bool, default False
        If `True`, invalidate the cache and fetch directly from the instance connector.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A dictionary of column names (`str`) to column types (`str`).

    Examples
    --------
    >>> pipe.get_columns_types()
    {
      'dt': 'TIMESTAMP WITH TIMEZONE',
      'id': 'BIGINT',
      'val': 'DOUBLE PRECISION',
    }
    """
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.dtypes import get_current_timestamp

    now = get_current_timestamp('ms', as_int=True) / 1000
    cache_seconds = (
        mrsm.get_config('pipes', 'static', 'static_schema_cache_seconds')
        if self.static
        else mrsm.get_config('pipes', 'dtypes', 'columns_types_cache_seconds')
    )
    if refresh:
        self._clear_cache_key('_columns_types_timestamp', debug=debug)
        self._clear_cache_key('_columns_types', debug=debug)

    _columns_types = self._get_cached_value('_columns_types', debug=debug)
    if _columns_types:
        columns_types_timestamp = self._get_cached_value('_columns_types_timestamp', debug=debug)
        if columns_types_timestamp is not None:
            delta = now - columns_types_timestamp
            if delta < cache_seconds:
                if debug:
                    dprint(
                        f"Returning cached `columns_types` for {self} "
                        f"({round(delta, 2)} seconds old)."
                    )
                return _columns_types

    with mrsm.Venv(get_connector_plugin(self.instance_connector)):
        _columns_types = (
            self.instance_connector.get_pipe_columns_types(self, debug=debug)
            if hasattr(self.instance_connector, 'get_pipe_columns_types')
            else None
        )

    self._cache_value('_columns_types', _columns_types, debug=debug)
    self._cache_value('_columns_types_timestamp', now, debug=debug)
    return _columns_types or {}
Get a dictionary of a pipe's column names and their types.
Parameters
- refresh (bool, default False): If True, invalidate the cache and fetch directly from the instance connector.
- debug (bool, default False): Verbosity toggle.
Returns
- A dictionary of column names (str) to column types (str).
Examples
>>> pipe.get_columns_types()
{
'dt': 'TIMESTAMP WITH TIMEZONE',
'id': 'BIGINT',
'val': 'DOUBLE PRECISION',
}
def get_columns_indices(
    self,
    debug: bool = False,
    refresh: bool = False,
) -> Dict[str, List[Dict[str, str]]]:
    """
    Return a dictionary mapping columns to index information.
    """
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.dtypes import get_current_timestamp

    now = get_current_timestamp('ms', as_int=True) / 1000
    cache_seconds = (
        mrsm.get_config('pipes', 'static', 'static_schema_cache_seconds')
        if self.static
        else mrsm.get_config('pipes', 'dtypes', 'columns_types_cache_seconds')
    )
    if refresh:
        self._clear_cache_key('_columns_indices_timestamp', debug=debug)
        self._clear_cache_key('_columns_indices', debug=debug)

    _columns_indices = self._get_cached_value('_columns_indices', debug=debug)

    if _columns_indices:
        columns_indices_timestamp = self._get_cached_value('_columns_indices_timestamp', debug=debug)
        if columns_indices_timestamp is not None:
            delta = now - columns_indices_timestamp
            if delta < cache_seconds:
                if debug:
                    dprint(
                        f"Returning cached `columns_indices` for {self} "
                        f"({round(delta, 2)} seconds old)."
                    )
                return _columns_indices

    with mrsm.Venv(get_connector_plugin(self.instance_connector)):
        _columns_indices = (
            self.instance_connector.get_pipe_columns_indices(self, debug=debug)
            if hasattr(self.instance_connector, 'get_pipe_columns_indices')
            else None
        )

    self._cache_value('_columns_indices', _columns_indices, debug=debug)
    self._cache_value('_columns_indices_timestamp', now, debug=debug)
    # The connector may not implement `get_pipe_columns_indices`, leaving `None` here.
    return {k: v for k, v in (_columns_indices or {}).items() if k and v}
Return a dictionary mapping columns to index information.
def get_indices(self) -> Dict[str, str]:
    """
    Return a dictionary mapping index keys to their names in the database.

    Returns
    -------
    A dictionary of index keys to index names.
    """
    from meerschaum.connectors import get_connector_plugin
    with mrsm.Venv(get_connector_plugin(self.instance_connector)):
        if hasattr(self.instance_connector, 'get_pipe_index_names'):
            result = self.instance_connector.get_pipe_index_names(self)
        else:
            result = {}

    return result
Return a dictionary mapping index keys to their names in the database.
Returns
- A dictionary of index keys to index names.
59def get_parameters( 60 self, 61 apply_symlinks: bool = True, 62 refresh: bool = False, 63 debug: bool = False, 64 _visited: 'Optional[set[mrsm.Pipe]]' = None, 65) -> Dict[str, Any]: 66 """ 67 Return the `parameters` dictionary of the pipe. 68 69 Parameters 70 ---------- 71 apply_symlinks: bool, default True 72 If `True`, resolve references to parameters from other pipes. 73 74 refresh: bool, default False 75 If `True`, pull the latest attributes for the pipe. 76 77 Returns 78 ------- 79 The pipe's parameters dictionary. 80 """ 81 from copy import deepcopy 82 from meerschaum.config._patch import apply_patch_to_config 83 from meerschaum.config._read_config import search_and_substitute_config 84 85 is_top_level = _visited is None 86 if _visited is None: 87 _visited = {self} 88 89 if refresh: 90 _ = self._invalidate_cache(hard=True) 91 ### Drop any memoized resolution so a later non-refresh call recomputes from fresh state. 92 _ = self.__dict__.pop('_resolved_parameters_raw', None) 93 _ = self.__dict__.pop('_resolved_parameters', None) 94 _ = self.__dict__.pop('_resolved_parameters_symlinks', None) 95 96 raw_parameters = self.attributes.get('parameters', {}) 97 if not apply_symlinks: 98 return raw_parameters 99 100 ### Resolving references + `{{ Pipe() }}` / `MRSM{}` symlinks is pure-Python but expensive 101 ### (it walks reference pipes and may build connectors), and `get_parameters` is a hot path 102 ### hit by `.dtypes`, `.columns`, `.precision`, etc. Memoize the resolved result, keyed on the 103 ### identity of the raw parameters dict: every mutation path (`update_parameters`, the setter, 104 ### `edit`) reassigns `_attributes['parameters']` to a new object, so identity changing is a 105 ### reliable invalidation signal. Schema is *not* part of this — dynamic-schema freshness is 106 ### handled separately by `get_columns_types`' TTL cache, so this is safe for dynamic pipes. 
107 ### Only memoize the top-level entry (not nested reference resolution, which threads `_visited` 108 ### for cycle detection) and only the default symlink-resolving, non-refreshing call. 109 can_memoize = is_top_level and not refresh 110 if can_memoize and self.__dict__.get('_resolved_parameters_raw', None) is raw_parameters: 111 self._symlinks = self.__dict__.get('_resolved_parameters_symlinks', {}) 112 ### Return a copy so callers that mutate the result (e.g. `infer_dtypes(persist=True)`) 113 ### don't corrupt the memo. 114 return deepcopy(self.__dict__['_resolved_parameters']) 115 116 parameters = {} 117 for ref_pipe in self.references: 118 try: 119 if ref_pipe in _visited: 120 warn(f"Circular reference detected in {self}: chain involves {ref_pipe}.") 121 return search_and_substitute_config(raw_parameters) 122 123 _visited.add(ref_pipe) 124 if refresh: 125 _ = _cached_base_params.pop(ref_pipe, None) 126 base_params = _cached_base_params.get(ref_pipe, None) 127 if base_params is None: 128 base_params = ref_pipe.get_parameters( 129 apply_symlinks=apply_symlinks, 130 _visited=_visited, 131 debug=debug, 132 ) 133 _cached_base_params[ref_pipe] = base_params 134 if debug: 135 dprint(f"base_params from {ref_pipe} for {self}:") 136 mrsm.pprint(base_params) 137 else: 138 if debug: 139 dprint(f"Using cached base_params from {ref_pipe} for {self}") 140 except Exception as e: 141 warn(f"Failed to resolve reference pipe for {self}: {e}") 142 base_params = {} 143 144 parameters = apply_patch_to_config(parameters, base_params) 145 146 parameters = apply_patch_to_config(parameters, raw_parameters) 147 148 from meerschaum.utils.pipes import replace_pipes_syntax 149 self._symlinks = {} 150 151 def recursive_replace(obj: Any, path: tuple) -> Any: 152 if isinstance(obj, dict): 153 return {k: recursive_replace(v, path + (k,)) for k, v in obj.items()} 154 if isinstance(obj, list): 155 return [recursive_replace(elem, path + (i,)) for i, elem in enumerate(obj)] 156 if 
isinstance(obj, str): 157 substituted_val = replace_pipes_syntax(obj, _pipe=self) 158 if substituted_val != obj: 159 self._symlinks[path] = { 160 'original': obj, 161 'substituted': substituted_val, 162 } 163 return substituted_val 164 return obj 165 166 resolved_parameters = search_and_substitute_config(recursive_replace(parameters, tuple())) 167 168 if can_memoize: 169 ### Hold a reference to the raw dict so its identity can't be reused by a freed object, 170 ### and stash the symlinks captured above alongside the resolved result. 171 self.__dict__['_resolved_parameters_raw'] = raw_parameters 172 self.__dict__['_resolved_parameters'] = resolved_parameters 173 self.__dict__['_resolved_parameters_symlinks'] = self._symlinks 174 return deepcopy(resolved_parameters) 175 176 return resolved_parameters
Return the parameters dictionary of the pipe.
Parameters
- apply_symlinks (bool, default True): If True, resolve references to parameters from other pipes.
- refresh (bool, default False): If True, pull the latest attributes for the pipe.
Returns
- The pipe's parameters dictionary.
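The reference handling in the listing above amounts to layering each referenced pipe's parameters underneath the pipe's own, with a visited set guarding against cycles. A rough standalone sketch, assuming a hypothetical registry dictionary of raw parameters keyed by pipe name, and a deep_patch helper standing in for apply_patch_to_config (assumed here to merge nested dictionaries recursively):

```python
from typing import Any, Dict, Optional, Set


def deep_patch(base: Dict[str, Any], patch: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively apply `patch` on top of `base` without mutating either."""
    out = dict(base)
    for key, val in patch.items():
        if isinstance(val, dict) and isinstance(out.get(key), dict):
            out[key] = deep_patch(out[key], val)
        else:
            out[key] = val
    return out


def resolve_parameters(
    name: str,
    registry: Dict[str, Dict[str, Any]],
    _visited: Optional[Set[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical resolver: `registry` maps pipe names to raw parameters."""
    if _visited is None:
        _visited = {name}
    raw = registry[name]
    refs = raw.get('references', None) or []
    if isinstance(refs, str):
        refs = [refs]

    resolved: Dict[str, Any] = {}
    for ref in refs:
        if ref in _visited:
            # Circular reference: fall back to this pipe's raw parameters.
            return dict(raw)
        _visited.add(ref)
        resolved = deep_patch(resolved, resolve_parameters(ref, registry, _visited))

    # The pipe's own parameters are applied last, so they win over inherited ones.
    return deep_patch(resolved, raw)


registry = {
    'base': {'columns': {'datetime': 'ts'}, 'upsert': True},
    'child': {'references': ['base'], 'columns': {'id': 'id'}},
}
print(resolve_parameters('child', registry)['columns'])
# {'datetime': 'ts', 'id': 'id'}
```

The sketch leaves out the symlink substitution and memoization steps of the real method.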
329def get_dtypes( 330 self, 331 infer: bool = True, 332 refresh: bool = False, 333 debug: bool = False, 334) -> Dict[str, Any]: 335 """ 336 If defined, return the `dtypes` dictionary defined in `meerschaum.Pipe.parameters`. 337 338 Parameters 339 ---------- 340 infer: bool, default True 341 If `True`, include the implicit existing dtypes. 342 Else only return the explicitly configured dtypes (e.g. `Pipe.parameters['dtypes']`). 343 344 refresh: bool, default False 345 If `True`, invalidate any cache and return the latest known dtypes. 346 347 Returns 348 ------- 349 A dictionary mapping column names to dtypes. 350 """ 351 from meerschaum.config._patch import apply_patch_to_config 352 from meerschaum.utils.dtypes import MRSM_ALIAS_DTYPES 353 parameters = self.get_parameters(refresh=refresh, debug=debug) 354 configured_dtypes = parameters.get('dtypes', {}) 355 if debug: 356 dprint(f"Configured dtypes for {self}:") 357 mrsm.pprint(configured_dtypes) 358 359 remote_dtypes = ( 360 self.infer_dtypes(persist=False, refresh=refresh, debug=debug) 361 if infer 362 else {} 363 ) 364 if debug and infer: 365 dprint(f"Remote dtypes for {self}:") 366 mrsm.pprint(remote_dtypes) 367 368 patched_dtypes = apply_patch_to_config((remote_dtypes or {}), (configured_dtypes or {})) 369 370 dt_col = parameters.get('columns', {}).get('datetime', None) 371 primary_col = parameters.get('columns', {}).get('primary', None) 372 _dtypes = { 373 col: MRSM_ALIAS_DTYPES.get(typ, typ) 374 for col, typ in patched_dtypes.items() 375 if col and typ 376 } 377 if dt_col and dt_col not in configured_dtypes: 378 _dtypes[dt_col] = 'datetime' 379 if primary_col and parameters.get('autoincrement', False) and primary_col not in _dtypes: 380 _dtypes[primary_col] = 'int' 381 382 return _dtypes
If defined, return the dtypes dictionary defined in meerschaum.Pipe.parameters.
Parameters
- infer (bool, default True): If True, include the implicit existing dtypes. Else only return the explicitly configured dtypes (e.g. Pipe.parameters['dtypes']).
- refresh (bool, default False): If True, invalidate any cache and return the latest known dtypes.
Returns
- A dictionary mapping column names to dtypes.
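The merge order in the source above has one consequence worth noting: unless the datetime column's dtype is explicitly configured, it is reported as 'datetime' even if a different type was inferred. A simplified sketch of that precedence, using a hypothetical merge_dtypes function and an optional aliases table in place of MRSM_ALIAS_DTYPES (whose contents are not shown here):

```python
from typing import Any, Dict, Optional


def merge_dtypes(
    remote: Dict[str, Any],
    configured: Dict[str, Any],
    dt_col: Optional[str] = None,
    primary_col: Optional[str] = None,
    autoincrement: bool = False,
    aliases: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Hypothetical sketch of the precedence between inferred and configured dtypes."""
    aliases = aliases or {}
    # Configured dtypes are patched over the inferred (remote) ones.
    patched = {**remote, **configured}
    dtypes = {
        col: aliases.get(typ, typ)
        for col, typ in patched.items()
        if col and typ
    }
    # The datetime axis defaults to 'datetime' unless explicitly configured.
    if dt_col and dt_col not in configured:
        dtypes[dt_col] = 'datetime'
    # An autoincrementing primary key defaults to 'int' if nothing else is known.
    if primary_col and autoincrement and primary_col not in dtypes:
        dtypes[primary_col] = 'int'
    return dtypes


print(merge_dtypes({'ts': 'int64', 'id': 'int64'}, {}, dt_col='ts'))
# {'ts': 'datetime', 'id': 'int64'}
print(merge_dtypes({'ts': 'int64', 'id': 'int64'}, {'ts': 'int'}, dt_col='ts'))
# {'ts': 'int', 'id': 'int64'}
```

This is why an integer datetime axis generally needs its dtype set explicitly in the pipe's parameters.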
1104def update_parameters( 1105 self, 1106 parameters_patch: Dict[str, Any], 1107 persist: bool = True, 1108 debug: bool = False, 1109) -> mrsm.SuccessTuple: 1110 """ 1111 Apply a patch to a pipe's `parameters` dictionary. 1112 1113 Parameters 1114 ---------- 1115 parameters_patch: Dict[str, Any] 1116 The patch to be applied to `Pipe.parameters`. 1117 1118 persist: bool, default True 1119 If `True`, call `Pipe.edit()` to persist the new parameters. 1120 """ 1121 from meerschaum.config import apply_patch_to_config 1122 if 'parameters' not in self._attributes: 1123 self._attributes['parameters'] = {} 1124 1125 self._attributes['parameters'] = apply_patch_to_config( 1126 self._attributes['parameters'], 1127 parameters_patch, 1128 ) 1129 1130 if self.temporary: 1131 persist = False 1132 1133 if not persist: 1134 return True, "Success" 1135 1136 return self.edit(debug=debug)
Apply a patch to a pipe's parameters dictionary.
Parameters
- parameters_patch (Dict[str, Any]): The patch to be applied to Pipe.parameters.
- persist (bool, default True): If True, call Pipe.edit() to persist the new parameters.
def get_id(self, **kw: Any) -> Union[int, str, None]:
    """
    Fetch a pipe's ID from its instance connector.
    If the pipe is not registered, return `None`.
    """
    if self.temporary:
        return None

    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    with Venv(get_connector_plugin(self.instance_connector)):
        if hasattr(self.instance_connector, 'get_pipe_id'):
            return self.instance_connector.get_pipe_id(self, **kw)

    return None
Fetch a pipe's ID from its instance connector.
If the pipe is not registered, return None.
@property
def id(self) -> Union[int, str, uuid.UUID, None]:
    """
    Fetch and cache a pipe's ID.
    """
    _id = self._get_cached_value('_id', debug=self.debug)
    if _id is None:
        _id = self.get_id(debug=self.debug)
        if _id is not None:
            self._cache_value('_id', _id, debug=self.debug)
    return _id
Fetch and cache a pipe's ID.
def get_val_column(self, debug: bool = False) -> Union[str, None]:
    """
    Return the name of the value column if it's defined, otherwise make an educated guess.
    If not set in the `columns` dictionary, return the first numeric column that is not
    an ID or datetime column.
    If none may be found, return `None`.

    Parameters
    ----------
    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    Either a string or `None`.
    """
    if debug:
        dprint('Attempting to determine the value column...')
    try:
        val_name = self.get_columns('value')
    except Exception:
        val_name = None
    if val_name is not None:
        if debug:
            dprint(f"Value column: {val_name}")
        return val_name

    cols = self.columns
    if cols is None:
        if debug:
            dprint('No columns could be determined. Returning...')
        return None
    try:
        dt_name = self.get_columns('datetime', error=False)
    except Exception:
        dt_name = None
    try:
        # Look up the ID column without raising if it is undefined.
        id_name = self.get_columns('id', error=False)
    except Exception:
        id_name = None

    if debug:
        dprint(f"dt_name: {dt_name}")
        dprint(f"id_name: {id_name}")

    cols_types = self.get_columns_types(debug=debug)
    if cols_types is None:
        return None
    if debug:
        dprint(f"cols_types: {cols_types}")
    if dt_name is not None:
        cols_types.pop(dt_name, None)
    if id_name is not None:
        cols_types.pop(id_name, None)

    candidates = []
    candidate_keywords = {'float', 'double', 'precision', 'int', 'numeric',}
    for search_term in candidate_keywords:
        for col, typ in cols_types.items():
            if search_term in typ.lower():
                candidates.append(col)
                break
    if not candidates:
        if debug:
            dprint("No value column could be determined.")
        return None

    return candidates[0]
Return the name of the value column if it's defined; otherwise make an educated guess. If it is not set in the columns dictionary, return the first numeric column that is not an ID or datetime column. If none can be found, return None.
Parameters
- debug (bool, default False): Verbosity toggle.
Returns
- Either a string or None.
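The guessing step can be illustrated with a small standalone function. This is a sketch under stated assumptions, not the method's implementation: guess_value_column is a hypothetical name, and it deliberately scans columns in their given order (rather than iterating over the keyword set first, as the source does) so that the result is deterministic.

```python
from typing import Dict, Optional

# Substrings that suggest a numeric database type.
NUMERIC_KEYWORDS = ('float', 'double', 'precision', 'int', 'numeric')


def guess_value_column(
    cols_types: Dict[str, str],
    dt_name: Optional[str] = None,
    id_name: Optional[str] = None,
) -> Optional[str]:
    """Return the first numeric-looking column that is not the datetime or ID column."""
    for col, typ in cols_types.items():
        if col in (dt_name, id_name):
            continue
        if any(keyword in typ.lower() for keyword in NUMERIC_KEYWORDS):
            return col
    return None


cols_types = {
    'dt': 'TIMESTAMP WITH TIMEZONE',
    'id': 'BIGINT',
    'val': 'DOUBLE PRECISION',
}
print(guess_value_column(cols_types, dt_name='dt', id_name='id'))
# val
```

Excluding the ID column matters here: 'BIGINT' contains 'int' and would otherwise be picked before 'val'.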
@property
def parents(self) -> List[mrsm.Pipe]:
    """
    Return a list of `meerschaum.Pipe` objects to be designated as parents.
    """
    _cached_parents = self.__dict__.get('_parents', None)
    if _cached_parents is not None:
        return _cached_parents

    from meerschaum.utils.pipes import get_pipe_from_value
    base_params = self.get_parameters()
    key = 'parents' if 'parents' in base_params else 'parent'
    parents_refs = base_params.get(key, None) or []
    if isinstance(parents_refs, str) or isinstance(parents_refs, dict):
        parents_refs = [parents_refs]

    if not parents_refs:
        return []

    self._parents = [get_pipe_from_value(val, _pipe=self) for val in parents_refs]
    return self._parents
Return a list of meerschaum.Pipe objects to be designated as parents.
@property
def parent(self) -> Union[mrsm.Pipe, None]:
    """
    Return the first pipe in `self.parents` or `None`.
    """
    _parents = self.parents
    if not _parents:
        return None

    return _parents[0]
Return the first pipe in self.parents or None.
@property
def children(self) -> List[mrsm.Pipe]:
    """
    Return a list of `meerschaum.Pipe` objects to be designated as children.
    """
    _cached_children = self.__dict__.get('_children', None)
    if _cached_children is not None:
        return _cached_children

    from meerschaum.utils.pipes import get_pipe_from_value
    base_params = self.get_parameters()
    key = 'children' if 'children' in base_params else 'child'
    children_refs = base_params.get(key, None) or []
    if isinstance(children_refs, str) or isinstance(children_refs, dict):
        children_refs = [children_refs]

    if not children_refs:
        return []

    self._children = [get_pipe_from_value(val, _pipe=self) for val in children_refs]
    return self._children
Return a list of meerschaum.Pipe objects to be designated as children.
@property
def child(self) -> mrsm.Pipe | None:
    """
    Return the first pipe in `self.children` or None.
    """
    _children = self.children
    if not _children:
        return None

    return _children[0]
Return the first pipe in self.children or None.
@property
def reference(self) -> mrsm.Pipe | None:
    """
    Return the first pipe in `self.references` or None.
    """
    _references = self.references
    if not _references:
        return None

    return _references[0]
Return the first pipe in self.references or None.
@property
def references(self) -> List[mrsm.Pipe]:
    """
    Return a list of `meerschaum.Pipe` objects to be designated as references.
    """
    _cached_references = self.__dict__.get('_references', None)
    if _cached_references is not None:
        return _cached_references

    from meerschaum.utils.pipes import get_pipe_from_value
    base_params = self.get_parameters(apply_symlinks=False)
    key = 'references' if 'references' in base_params else 'reference'
    refs = base_params.get(key, None) or []
    if isinstance(refs, str) or isinstance(refs, dict):
        refs = [refs]

    if not refs:
        return []

    self._references = [get_pipe_from_value(val, _pipe=self) for val in refs]
    return self._references
Return a list of meerschaum.Pipe objects to be designated as references.
@property
def target(self) -> str:
    """
    The target table name.
    You can set the target name under one of the following keys
    (checked in this order):
      - `target`
      - `target_name`
      - `target_table`
      - `target_table_name`
    """
    cached_target = self.__dict__.get('_target', None)
    if cached_target:
        return cached_target

    params = self.parameters
    target_val = params.get('target', None)
    if target_val:
        self.__dict__['_target'] = target_val
        return target_val

    default_target = self._target_legacy()
    default_targets = {default_target}
    potential_keys = ('target_name', 'target_table', 'target_table_name')
    _target = None
    for k in potential_keys:
        if k in params:
            _target = params[k]
            break

    _target = _target or default_target

    if self.instance_connector.type == 'sql':
        from meerschaum.utils.sql import truncate_item_name
        truncated_target = truncate_item_name(_target, self.instance_connector.flavor)
        default_targets.add(truncated_target)
        warned_target = self.__dict__.get('_warned_target', False)
        if truncated_target != _target and not warned_target:
            if self.instance_connector.flavor not in ('oracle', 'mysql', 'mariadb'):
                warn(
                    f"The target '{_target}' is too long for '{self.instance_connector.flavor}', "
                    + f"will use {truncated_target} instead."
                )
                self.__dict__['_warned_target'] = True
            _target = truncated_target

    if _target not in default_targets:
        self.target = _target

    self.__dict__['_target'] = _target
    return _target
The target table name. You can set the target name under one of the following keys (checked in this order):
- target
- target_name
- target_table
- target_table_name
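The lookup order described above can be sketched as a small pure function. The name resolve_target and its default argument are hypothetical; the sketch covers only the key precedence, leaving out the caching and SQL name truncation done by the property.

```python
from typing import Any, Dict

FALLBACK_KEYS = ('target_name', 'target_table', 'target_table_name')


def resolve_target(params: Dict[str, Any], default: str) -> str:
    """Pick the target table name from parameters, else fall back to `default`."""
    # `target` is checked first and must be truthy to be used.
    target = params.get('target', None)
    if target:
        return target
    # Otherwise the first fallback key that is present decides the result,
    # even if its value is empty (in which case the default is used).
    for key in FALLBACK_KEYS:
        if key in params:
            return params[key] or default
    return default


print(resolve_target({'target_table': 'weather'}, default='sql_temp_demo'))
# weather
```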
def guess_datetime(self) -> Union[str, None]:
    """
    Try to determine a pipe's datetime column.
    """
    _dtypes = self.dtypes

    ### Abort if the user explicitly disallows a datetime index.
    if 'datetime' in _dtypes:
        if _dtypes['datetime'] is None:
            return None

    from meerschaum.utils.dtypes import are_dtypes_equal
    dt_cols = [
        col
        for col, typ in _dtypes.items()
        if are_dtypes_equal(typ, 'datetime')
    ]
    if not dt_cols:
        return None
    return dt_cols[0]
Try to determine a pipe's datetime column.
@property
def precision(self) -> Dict[str, Union[str, int]]:
    """
    Return the configured or detected precision.
    """
    return self.get_precision(debug=self.debug)
Return the configured or detected precision.
1139def get_precision(self, debug: bool = False) -> Dict[str, Union[str, int]]: 1140 """ 1141 Return the timestamp precision unit and interval for the `datetime` axis. 1142 """ 1143 from meerschaum.utils.dtypes import ( 1144 MRSM_PRECISION_UNITS_SCALARS, 1145 MRSM_PRECISION_UNITS_ALIASES, 1146 MRSM_PD_DTYPES, 1147 are_dtypes_equal, 1148 ) 1149 from meerschaum._internal.static import STATIC_CONFIG 1150 1151 _precision = self._get_cached_value('precision', debug=debug) 1152 if _precision: 1153 if debug: 1154 dprint(f"Returning cached precision: {_precision}") 1155 return _precision 1156 1157 parameters = self.parameters 1158 _precision = parameters.get('precision', {}) 1159 if isinstance(_precision, str): 1160 _precision = {'unit': _precision} 1161 default_precision_unit = STATIC_CONFIG['dtypes']['datetime']['default_precision_unit'] 1162 1163 if not _precision: 1164 1165 dt_col = parameters.get('columns', {}).get('datetime', None) 1166 if not dt_col and self.autotime: 1167 dt_col = mrsm.get_config('pipes', 'autotime', 'column_name_if_datetime_missing') 1168 if not dt_col: 1169 if debug: 1170 dprint(f"No datetime axis, returning default precision '{default_precision_unit}'.") 1171 return {'unit': default_precision_unit} 1172 1173 dt_typ = self.dtypes.get(dt_col, 'datetime') 1174 if are_dtypes_equal(dt_typ, 'datetime'): 1175 if dt_typ == 'datetime': 1176 dt_typ = MRSM_PD_DTYPES['datetime'] 1177 if debug: 1178 dprint(f"Datetime type is `datetime`, assuming {dt_typ} precision.") 1179 1180 _precision = { 1181 'unit': ( 1182 dt_typ 1183 .split('[', maxsplit=1)[-1] 1184 .split(',', maxsplit=1)[0] 1185 .split(' ', maxsplit=1)[0] 1186 ).rstrip(']') 1187 } 1188 1189 if debug: 1190 dprint(f"Extracted precision '{_precision['unit']}' from type '{dt_typ}'.") 1191 1192 elif are_dtypes_equal(dt_typ, 'int'): 1193 _precision = { 1194 'unit': ( 1195 'second' 1196 if '32' in dt_typ 1197 else default_precision_unit 1198 ) 1199 } 1200 elif are_dtypes_equal(dt_typ, 'date'): 1201 if 
debug: 1202 dprint("Datetime axis is 'date', falling back to 'day' precision.") 1203 _precision = {'unit': 'day'} 1204 1205 precision_unit = _precision.get('unit', default_precision_unit) 1206 precision_interval = _precision.get('interval', None) 1207 true_precision_unit = MRSM_PRECISION_UNITS_ALIASES.get(precision_unit, precision_unit) 1208 if true_precision_unit is None: 1209 if debug: 1210 dprint(f"No precision could be determined, falling back to '{default_precision_unit}'.") 1211 true_precision_unit = default_precision_unit 1212 1213 if true_precision_unit not in MRSM_PRECISION_UNITS_SCALARS: 1214 from meerschaum.utils.misc import items_str 1215 raise ValueError( 1216 f"Invalid precision unit '{true_precision_unit}'.\n" 1217 "Accepted values are " 1218 f"{items_str(list(MRSM_PRECISION_UNITS_SCALARS) + list(MRSM_PRECISION_UNITS_ALIASES))}." 1219 ) 1220 1221 _precision = {'unit': true_precision_unit} 1222 if precision_interval: 1223 _precision['interval'] = precision_interval 1224 self._cache_value('precision', _precision, debug=debug) 1225 return self._precision
Return the timestamp precision unit and interval for the datetime axis.
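When no precision is configured and the datetime axis has a datetime dtype, the source above derives the unit from the dtype string itself. The split chain can be tried on its own; `extract_precision_unit` and `normalize_precision` are hypothetical names used only for this sketch:

```python
from typing import Any, Dict, Union


def extract_precision_unit(dt_typ: str) -> str:
    """Pull the unit out of a dtype string such as 'datetime64[ns, UTC]'."""
    return (
        dt_typ
        .split('[', maxsplit=1)[-1]   # drop everything up to the bracket
        .split(',', maxsplit=1)[0]    # drop a trailing timezone, if any
        .split(' ', maxsplit=1)[0]    # drop anything after a space
    ).rstrip(']')


def normalize_precision(precision: Union[str, Dict[str, Any]]) -> Dict[str, Any]:
    """A bare string is shorthand for `{'unit': <string>}`."""
    if isinstance(precision, str):
        return {'unit': precision}
    return dict(precision)


print(extract_precision_unit('datetime64[ns, UTC]'))
print(extract_precision_unit('datetime64[us]'))
print(normalize_precision('second'))
```

The sketch omits alias resolution and validation against the accepted units, both of which the real method performs before caching the result.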
12def show( 13 self, 14 nopretty: bool = False, 15 debug: bool = False, 16 **kw 17) -> SuccessTuple: 18 """ 19 Show attributes of a Pipe. 20 21 Parameters 22 ---------- 23 nopretty: bool, default False 24 If `True`, simply print the JSON of the pipe's attributes. 25 26 debug: bool, default False 27 Verbosity toggle. 28 29 Returns 30 ------- 31 A `SuccessTuple` of success, message. 32 33 """ 34 import json 35 from meerschaum.utils.formatting import ( 36 pprint, make_header, ANSI, highlight_pipes, fill_ansi, get_console, 37 ) 38 from meerschaum.utils.packages import import_rich, attempt_import 39 from meerschaum.utils.warnings import info 40 attributes_json = json.dumps(self.attributes) 41 if not nopretty: 42 _to_print = f"Attributes for {self}:" 43 if ANSI: 44 _to_print = fill_ansi(highlight_pipes(make_header(_to_print)), 'magenta') 45 print(_to_print) 46 rich = import_rich() 47 rich_json = attempt_import('rich.json') 48 get_console().print(rich_json.JSON(attributes_json)) 49 else: 50 print(_to_print) 51 else: 52 print(attributes_json) 53 54 return True, "Success"
Show attributes of a Pipe.
Parameters
- nopretty (bool, default False): If True, simply print the JSON of the pipe's attributes.
- debug (bool, default False): Verbosity toggle.
Returns
- A SuccessTuple of success, message.
21def edit( 22 self, 23 patch: bool = False, 24 interactive: bool = False, 25 debug: bool = False, 26 **kw: Any 27) -> SuccessTuple: 28 """ 29 Edit a Pipe's configuration. 30 31 Parameters 32 ---------- 33 patch: bool, default False 34 If `patch` is True, update parameters by cascading rather than overwriting. 35 interactive: bool, default False 36 If `True`, open an editor for the user to make changes to the pipe's YAML file. 37 debug: bool, default False 38 Verbosity toggle. 39 40 Returns 41 ------- 42 A `SuccessTuple` of success, message. 43 44 """ 45 from meerschaum.utils.venv import Venv 46 from meerschaum.connectors import get_connector_plugin 47 48 if self.temporary: 49 return False, "Cannot edit pipes created with `temporary=True` (read-only)." 50 51 self._invalidate_cache(hard=True, debug=debug) 52 53 if hasattr(self, '_symlinks'): 54 from meerschaum.utils.misc import get_val_from_dict_path, set_val_in_dict_path 55 for path, vals in self._symlinks.items(): 56 current_val = get_val_from_dict_path(self.parameters, path) 57 if current_val == vals['substituted']: 58 set_val_in_dict_path(self.parameters, path, vals['original']) 59 60 if not interactive: 61 with Venv(get_connector_plugin(self.instance_connector)): 62 return self.instance_connector.edit_pipe(self, patch=patch, debug=debug, **kw) 63 64 import meerschaum.config.paths as paths 65 from meerschaum.utils.misc import edit_file 66 parameters_filename = str(self) + '.yaml' 67 parameters_path = paths.PIPES_CACHE_RESOURCES_PATH / parameters_filename 68 69 from meerschaum.utils.yaml import yaml 70 71 edit_text = f"Edit the parameters for {self}" 72 edit_top = '#' * (len(edit_text) + 4) 73 edit_header = edit_top + f'\n# {edit_text} #\n' + edit_top + '\n\n' 74 75 from meerschaum.config import get_config 76 parameters = dict(get_config('pipes', 'parameters', patch=True)) 77 from meerschaum.config._patch import apply_patch_to_config 78 raw_parameters = self.attributes.get('parameters', {}) 79 parameters = 
apply_patch_to_config(parameters, raw_parameters) 80 81 ### write parameters to yaml file 82 with open(parameters_path, 'w+') as f: 83 f.write(edit_header) 84 yaml.dump(parameters, stream=f, sort_keys=False) 85 86 ### only quit editing if yaml is valid 87 editing = True 88 while editing: 89 edit_file(parameters_path) 90 try: 91 with open(parameters_path, 'r') as f: 92 file_parameters = yaml.load(f.read()) 93 except Exception as e: 94 from meerschaum.utils.warnings import warn 95 warn(f"Invalid format defined for '{self}':\n\n{e}") 96 input(f"Press [Enter] to correct the configuration for '{self}': ") 97 else: 98 editing = False 99 100 self.parameters = file_parameters 101 102 if debug: 103 from meerschaum.utils.formatting import pprint 104 pprint(self.parameters) 105 106 with Venv(get_connector_plugin(self.instance_connector)): 107 return self.instance_connector.edit_pipe(self, patch=patch, debug=debug, **kw)
Edit a Pipe's configuration.
Parameters
- patch (bool, default False): If patch is True, update parameters by cascading rather than overwriting.
- interactive (bool, default False): If True, open an editor for the user to make changes to the pipe's YAML file.
- debug (bool, default False): Verbosity toggle.
Returns
- A SuccessTuple of success, message.
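The difference between cascading and overwriting can be illustrated with plain dictionaries. This is a sketch of one common reading of "cascading" (a recursive merge); how the instance connector actually applies the patch is not shown in this section, and `cascade_patch` is a hypothetical helper:

```python
from copy import deepcopy
from typing import Any, Dict


def cascade_patch(base: Dict[str, Any], patch: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge `patch` into a copy of `base` (illustrative only)."""
    merged = deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested dictionaries are merged key by key.
            merged[key] = cascade_patch(merged[key], value)
        else:
            # Scalars and lists replace the existing value.
            merged[key] = deepcopy(value)
    return merged


existing = {'columns': {'datetime': 'ts', 'id': 'id'}, 'tags': ['production']}
changes = {'columns': {'value': 'vl'}}

patched = cascade_patch(existing, changes)   # keeps untouched keys
overwritten = dict(changes)                  # discards everything else

print(patched)
print(overwritten)
```

With a cascading update the existing `datetime` and `id` columns and the tags survive; with an overwrite only the keys supplied in the new parameters remain.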
110def edit_definition( 111 self, 112 yes: bool = False, 113 noask: bool = False, 114 force: bool = False, 115 debug : bool = False, 116 **kw : Any 117) -> SuccessTuple: 118 """ 119 Edit a pipe's definition file and update its configuration. 120 **NOTE:** This function is interactive and should not be used in automated scripts! 121 122 Returns 123 ------- 124 A `SuccessTuple` of success, message. 125 126 """ 127 if self.temporary: 128 return False, "Cannot edit pipes created with `temporary=True` (read-only)." 129 130 from meerschaum.connectors import instance_types 131 if (self.connector is None or isinstance(self.connector, str)) or self.connector.type not in instance_types: 132 return self.edit(interactive=True, debug=debug, **kw) 133 134 import json 135 from meerschaum.utils.warnings import info, warn 136 from meerschaum.utils.debug import dprint 137 from meerschaum.config._patch import apply_patch_to_config 138 from meerschaum.utils.misc import edit_file 139 140 _parameters = self.parameters 141 if 'fetch' not in _parameters: 142 _parameters['fetch'] = {} 143 144 def _edit_api(): 145 from meerschaum.utils.prompt import prompt, yes_no 146 info( 147 f"Please enter the keys of the source pipe from '{self.connector}'.\n" + 148 "Type 'None' for None, or empty when there is no default. Press [CTRL+C] to skip." 
149 ) 150 151 _keys = { 'connector_keys' : None, 'metric_key' : None, 'location_key' : None } 152 for k in _keys: 153 _keys[k] = _parameters['fetch'].get(k, None) 154 155 for k, v in _keys.items(): 156 try: 157 _keys[k] = prompt(k.capitalize().replace('_', ' ') + ':', icon=True, default=v) 158 except KeyboardInterrupt: 159 continue 160 if _keys[k] in ('', 'None', '\'None\'', '[None]'): 161 _keys[k] = None 162 163 _parameters['fetch'] = apply_patch_to_config(_parameters['fetch'], _keys) 164 165 info("You may optionally specify additional filter parameters as JSON.") 166 print(" Parameters are translated into a 'WHERE x AND y' clause, and lists are IN clauses.") 167 print(" For example, the following JSON would correspond to 'WHERE x = 1 AND y IN (2, 3)':") 168 print(json.dumps({'x': 1, 'y': [2, 3]}, indent=2, separators=(',', ': '))) 169 if force or yes_no( 170 "Would you like to add additional filter parameters?", 171 yes=yes, noask=noask 172 ): 173 import meerschaum.config.paths as paths 174 definition_filename = str(self) + '.json' 175 definition_path = paths.PIPES_CACHE_RESOURCES_PATH / definition_filename 176 try: 177 definition_path.touch() 178 with open(definition_path, 'w+') as f: 179 json.dump(_parameters.get('fetch', {}).get('params', {}), f, indent=2) 180 except Exception as e: 181 return False, f"Failed writing file '{definition_path}':\n" + str(e) 182 183 _params = None 184 while True: 185 edit_file(definition_path) 186 try: 187 with open(definition_path, 'r') as f: 188 _params = json.load(f) 189 except Exception as e: 190 warn(f'Failed to read parameters JSON:\n{e}', stack=False) 191 if force or yes_no( 192 "Would you like to try again?\n " 193 + "If not, the parameters JSON file will be ignored.", 194 noask=noask, yes=yes 195 ): 196 continue 197 _params = None 198 break 199 if _params is not None: 200 if 'fetch' not in _parameters: 201 _parameters['fetch'] = {} 202 _parameters['fetch']['params'] = _params 203 204 self.parameters = _parameters 205 
return True, "Success" 206 207 def _edit_sql(): 208 import textwrap 209 import meerschaum.config.paths as paths 210 from meerschaum.utils.misc import edit_file 211 definition_filename = str(self) + '.sql' 212 definition_path = paths.PIPES_CACHE_RESOURCES_PATH / definition_filename 213 214 sql_definition = _parameters['fetch'].get('definition', None) 215 if sql_definition is None: 216 sql_definition = '' 217 sql_definition = textwrap.dedent(sql_definition).lstrip() 218 219 try: 220 definition_path.touch() 221 with open(definition_path, 'w+') as f: 222 f.write(sql_definition) 223 except Exception as e: 224 return False, f"Failed writing file '{definition_path}':\n" + str(e) 225 226 edit_file(definition_path) 227 try: 228 with open(definition_path, 'r', encoding='utf-8') as f: 229 file_definition = f.read() 230 except Exception as e: 231 return False, f"Failed reading file '{definition_path}':\n" + str(e) 232 233 if sql_definition == file_definition: 234 return False, f"No changes made to definition for {self}." 235 236 if ' ' not in file_definition: 237 return False, f"Invalid SQL definition for {self}." 238 239 if debug: 240 dprint("Read SQL definition:\n\n" + file_definition) 241 _parameters['fetch']['definition'] = file_definition 242 self.parameters = _parameters 243 return True, "Success" 244 245 locals()['_edit_' + str(self.connector.type)]() 246 return self.edit(interactive=False, debug=debug, **kw)
Edit a pipe's definition file and update its configuration. NOTE: This function is interactive and should not be used in automated scripts!
Returns
- A SuccessTuple of success, message.
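The interactive prompt in the source above explains that filter parameters become a 'WHERE x AND y' clause, with lists turned into IN clauses. The mapping can be sketched as follows; `sketch_where` is a hypothetical, deliberately naive helper (no quoting or escaping of identifiers) and is not the library's clause builder:

```python
from typing import Any, Dict


def sketch_where(params: Dict[str, Any]) -> str:
    """Render a params dictionary as a WHERE clause (illustration only, unsafe for real SQL)."""
    clauses = []
    for col, val in params.items():
        if isinstance(val, (list, tuple)):
            # Lists become IN clauses.
            clauses.append(f"{col} IN ({', '.join(repr(v) for v in val)})")
        else:
            # Scalars become equality checks.
            clauses.append(f"{col} = {val!r}")
    return ('WHERE ' + ' AND '.join(clauses)) if clauses else ''


print(sketch_where({'x': 1, 'y': [2, 3]}))
```

This reproduces the example given by the prompt: the JSON `{"x": 1, "y": [2, 3]}` corresponds to `WHERE x = 1 AND y IN (2, 3)`.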
def update(self, *args, **kw) -> SuccessTuple:
    """
    Update a pipe's parameters in its instance.
    """
    kw['interactive'] = False
    return self.edit(*args, **kw)
Update a pipe's parameters in its instance.
41def sync( 42 self, 43 df: Union[ 44 pd.DataFrame, 45 Dict[str, List[Any]], 46 List[Dict[str, Any]], 47 str, 48 InferFetch 49 ] = InferFetch, 50 begin: Union[datetime, int, str, None] = '', 51 end: Union[datetime, int, None] = None, 52 force: bool = False, 53 retries: int = 10, 54 min_seconds: int = 1, 55 check_existing: bool = True, 56 enforce_dtypes: bool = True, 57 blocking: bool = True, 58 workers: Optional[int] = None, 59 callback: Optional[Callable[[Tuple[bool, str]], Any]] = None, 60 error_callback: Optional[Callable[[Exception], Any]] = None, 61 chunksize: Optional[int] = -1, 62 sync_chunks: bool = True, 63 debug: bool = False, 64 _inplace: bool = True, 65 **kw: Any 66) -> SuccessTuple: 67 """ 68 Fetch new data from the source and update the pipe's table with new data. 69 70 Get new remote data via fetch, get existing data in the same time period, 71 and merge the two, only keeping the unseen data. 72 73 Parameters 74 ---------- 75 df: Union[None, pd.DataFrame, Dict[str, List[Any]]], default None 76 An optional DataFrame to sync into the pipe. Defaults to `None`. 77 If `df` is a string, it will be parsed via `meerschaum.utils.dataframe.parse_simple_lines()`. 78 79 begin: Union[datetime, int, str, None], default '' 80 Optionally specify the earliest datetime to search for data. 81 82 end: Union[datetime, int, str, None], default None 83 Optionally specify the latest datetime to search for data. 84 85 force: bool, default False 86 If `True`, keep trying to sync untul `retries` attempts. 87 88 retries: int, default 10 89 If `force`, how many attempts to try syncing before declaring failure. 90 91 min_seconds: Union[int, float], default 1 92 If `force`, how many seconds to sleep between retries. Defaults to `1`. 93 94 check_existing: bool, default True 95 If `True`, pull and diff with existing data from the pipe. 96 97 enforce_dtypes: bool, default True 98 If `True`, enforce dtypes on incoming data. 
99 Set this to `False` if the incoming rows are expected to be of the correct dtypes. 100 101 blocking: bool, default True 102 If `True`, wait for sync to finish and return its result, otherwise 103 asyncronously sync (oxymoron?) and return success. Defaults to `True`. 104 Only intended for specific scenarios. 105 106 workers: Optional[int], default None 107 If provided and the instance connector is thread-safe 108 (`pipe.instance_connector.IS_THREAD_SAFE is True`), 109 limit concurrent sync to this many threads. 110 111 callback: Optional[Callable[[Tuple[bool, str]], Any]], default None 112 Callback function which expects a SuccessTuple as input. 113 Only applies when `blocking=False`. 114 115 error_callback: Optional[Callable[[Exception], Any]], default None 116 Callback function which expects an Exception as input. 117 Only applies when `blocking=False`. 118 119 chunksize: int, default -1 120 Specify the number of rows to sync per chunk. 121 If `-1`, resort to system configuration (default is `900`). 122 A `chunksize` of `None` will sync all rows in one transaction. 123 124 sync_chunks: bool, default True 125 If possible, sync chunks while fetching them into memory. 126 127 debug: bool, default False 128 Verbosity toggle. Defaults to False. 129 130 Returns 131 ------- 132 A `SuccessTuple` of success (`bool`) and message (`str`). 133 """ 134 from meerschaum.utils.debug import dprint, _checkpoint 135 from meerschaum.utils.formatting import get_console 136 from meerschaum.utils.venv import Venv 137 from meerschaum.connectors import get_connector_plugin 138 from meerschaum.utils.misc import df_is_chunk_generator, filter_keywords, filter_arguments 139 from meerschaum.utils.pool import get_pool 140 from meerschaum.config import get_config 141 from meerschaum.utils.dtypes import are_dtypes_equal, get_current_timestamp 142 143 if (callback is not None or error_callback is not None) and blocking: 144 warn("Callback functions are only executed when blocking = False. 
Ignoring...") 145 146 _checkpoint(_total=2, **kw) 147 148 if chunksize == 0: 149 chunksize = None 150 sync_chunks = False 151 152 begin, end = self.parse_date_bounds(begin, end) 153 kw.update({ 154 'begin': begin, 155 'end': end, 156 'force': force, 157 'retries': retries, 158 'min_seconds': min_seconds, 159 'check_existing': check_existing, 160 'blocking': blocking, 161 'workers': workers, 162 'callback': callback, 163 'error_callback': error_callback, 164 'sync_chunks': sync_chunks, 165 'chunksize': chunksize, 166 'safe_copy': True, 167 }) 168 169 self._invalidate_cache(debug=debug) 170 self._cache_value('sync_ts', get_current_timestamp('ms'), debug=debug) 171 172 def _sync( 173 p: mrsm.Pipe, 174 df: Union[ 175 'pd.DataFrame', 176 Dict[str, List[Any]], 177 List[Dict[str, Any]], 178 str, 179 InferFetch 180 ] = InferFetch, 181 ) -> SuccessTuple: 182 if df is None: 183 p._invalidate_cache(debug=debug) 184 return ( 185 False, 186 f"You passed `None` instead of data into `sync()` for {p}.\n" 187 + "Omit the DataFrame to infer fetching.", 188 ) 189 ### Ensure that Pipe is registered. 190 if not p.temporary and p.id is None: 191 ### NOTE: This may trigger an interactive session for plugins! 192 register_success, register_msg = p.register(debug=debug) 193 if not register_success: 194 if 'already' not in register_msg: 195 p._invalidate_cache(debug=debug) 196 return register_success, register_msg 197 198 if isinstance(df, str): 199 from meerschaum.utils.dataframe import parse_simple_lines 200 df = parse_simple_lines(df) 201 202 ### If connector is a plugin with a `sync()` method, return that instead. 203 ### If the plugin does not have a `sync()` method but does have a `fetch()` method, 204 ### use that instead. 205 ### NOTE: The DataFrame must be omitted for the plugin sync method to apply. 206 ### If a DataFrame is provided, continue as expected. 
207 if hasattr(df, 'MRSM_INFER_FETCH'): 208 try: 209 if isinstance(p.connector, str): 210 if ':' not in p.connector_keys: 211 return True, f"{p} does not support fetching; nothing to do." 212 213 msg = f"{p} does not have a valid connector." 214 if p.connector_keys.startswith('plugin:'): 215 msg += f"\n Perhaps {p.connector_keys} has a syntax error?" 216 p._invalidate_cache(debug=debug) 217 return False, msg 218 except Exception: 219 p._invalidate_cache(debug=debug) 220 return False, f"Unable to create the connector for {p}." 221 222 ### Sync in place if possible. 223 if ( 224 str(self.connector) == str(self.instance_connector) 225 and 226 hasattr(self.instance_connector, 'sync_pipe_inplace') 227 and 228 _inplace 229 and 230 get_config('system', 'experimental', 'inplace_sync') 231 ): 232 with Venv(get_connector_plugin(self.instance_connector)): 233 p._invalidate_cache(debug=debug) 234 _args, _kwargs = filter_arguments( 235 p.instance_connector.sync_pipe_inplace, 236 p, 237 debug=debug, 238 **kw 239 ) 240 return self.instance_connector.sync_pipe_inplace( 241 *_args, 242 **_kwargs 243 ) 244 245 ### Activate and invoke `sync(pipe)` for plugin connectors with `sync` methods. 246 try: 247 if getattr(p.connector, 'sync', None) is not None: 248 with Venv(get_connector_plugin(p.connector), debug=debug): 249 _args, _kwargs = filter_arguments( 250 p.connector.sync, 251 p, 252 debug=debug, 253 **kw 254 ) 255 return_tuple = p.connector.sync(*_args, **_kwargs) 256 p._invalidate_cache(debug=debug) 257 if not isinstance(return_tuple, tuple): 258 return_tuple = ( 259 False, 260 f"Plugin '{p.connector.label}' returned non-tuple value: {return_tuple}" 261 ) 262 return return_tuple 263 264 except Exception as e: 265 get_console().print_exception() 266 msg = f"Failed to sync {p} with exception: '" + str(e) + "'" 267 if debug: 268 error(msg, silent=False) 269 p._invalidate_cache(debug=debug) 270 return False, msg 271 272 ### Fetch the dataframe from the connector's `fetch()` method. 
273 try: 274 with Venv(get_connector_plugin(p.connector), debug=debug): 275 df = p.fetch( 276 **filter_keywords( 277 p.fetch, 278 debug=debug, 279 **kw 280 ) 281 ) 282 kw['safe_copy'] = False 283 except Exception as e: 284 get_console().print_exception( 285 suppress=[ 286 'meerschaum/core/Pipe/_sync.py', 287 'meerschaum/core/Pipe/_fetch.py', 288 ] 289 ) 290 msg = f"Failed to fetch data from {p.connector}:\n {e}" 291 df = None 292 293 if df is None: 294 p._invalidate_cache(debug=debug) 295 return False, f"No data were fetched for {p}." 296 297 if isinstance(df, list): 298 if len(df) == 0: 299 return True, f"No new rows were returned for {p}." 300 301 ### May be a chunk hook results list. 302 if isinstance(df[0], tuple): 303 success = all([_success for _success, _ in df]) 304 message = '\n'.join([_message for _, _message in df]) 305 return success, message 306 307 if df is True: 308 p._invalidate_cache(debug=debug) 309 return True, f"{p} is being synced in parallel." 310 311 ### CHECKPOINT: Retrieved the DataFrame. 312 _checkpoint(**kw) 313 314 ### Allow for dataframe generators or iterables. 315 if df_is_chunk_generator(df): 316 kw['workers'] = p.get_num_workers(kw.get('workers', None)) 317 dt_col = p.columns.get('datetime', None) 318 pool = get_pool(workers=kw.get('workers', 1)) 319 if debug: 320 dprint(f"Received {type(df)}. Attempting to sync first chunk...") 321 322 try: 323 chunk = next(df) 324 except StopIteration: 325 return True, "Received an empty generator; nothing to do." 
326 327 chunk_success, chunk_msg = _sync(p, chunk) 328 chunk_msg = '\n' + self._get_chunk_label(chunk, dt_col) + '\n' + chunk_msg 329 if not chunk_success: 330 return chunk_success, f"Unable to sync initial chunk for {p}:\n{chunk_msg}" 331 if debug: 332 dprint("Successfully synced the first chunk, attemping the rest...") 333 334 def _process_chunk(_chunk): 335 _chunk_attempts = 0 336 _max_chunk_attempts = 3 337 while _chunk_attempts < _max_chunk_attempts: 338 try: 339 _chunk_success, _chunk_msg = _sync(p, _chunk) 340 except Exception as e: 341 _chunk_success, _chunk_msg = False, str(e) 342 if _chunk_success: 343 break 344 _chunk_attempts += 1 345 _sleep_seconds = _chunk_attempts ** 2 346 warn( 347 ( 348 f"Failed to sync chunk to {self} " 349 + f"(attempt {_chunk_attempts} / {_max_chunk_attempts}).\n" 350 + f"Sleeping for {_sleep_seconds} second" 351 + ('s' if _sleep_seconds != 1 else '') 352 + f":\n{_chunk_msg}" 353 ), 354 stack=False, 355 ) 356 time.sleep(_sleep_seconds) 357 358 num_rows_str = ( 359 f"{num_rows:,} rows" 360 if (num_rows := len(_chunk)) != 1 361 else f"{num_rows} row" 362 ) 363 _chunk_msg = ( 364 ( 365 "Synced" 366 if _chunk_success 367 else "Failed to sync" 368 ) + f" a chunk ({num_rows_str}) to {p}:\n" 369 + self._get_chunk_label(_chunk, dt_col) 370 + '\n' 371 + _chunk_msg 372 ) 373 374 mrsm.pprint((_chunk_success, _chunk_msg), calm=True) 375 return _chunk_success, _chunk_msg 376 377 results = sorted( 378 [(chunk_success, chunk_msg)] + ( 379 list(pool.imap(_process_chunk, df)) 380 if ( 381 not df_is_chunk_generator(chunk) # Handle nested generators. 
382 and kw.get('workers', 1) != 1 383 ) 384 else list( 385 _process_chunk(_child_chunks) 386 for _child_chunks in df 387 ) 388 ) 389 ) 390 chunk_messages = [chunk_msg for _, chunk_msg in results] 391 success_bools = [chunk_success for chunk_success, _ in results] 392 num_successes = len([chunk_success for chunk_success, _ in results if chunk_success]) 393 num_failures = len([chunk_success for chunk_success, _ in results if not chunk_success]) 394 success = all(success_bools) 395 msg = ( 396 'Synced ' 397 + f'{len(chunk_messages):,} chunk' 398 + ('s' if len(chunk_messages) != 1 else '') 399 + f' to {p}\n({num_successes} succeeded, {num_failures} failed):\n\n' 400 + '\n\n'.join(chunk_messages).lstrip().rstrip() 401 ).lstrip().rstrip() 402 return success, msg 403 404 ### Cast to a dataframe and ensure datatypes are what we expect. 405 dtypes = p.get_dtypes(debug=debug) 406 df = p.enforce_dtypes( 407 df, 408 chunksize=chunksize, 409 enforce=enforce_dtypes, 410 dtypes=dtypes, 411 debug=debug, 412 ) 413 if p.autotime: 414 dt_col = p.columns.get('datetime', None) 415 ts_col = dt_col or mrsm.get_config( 416 'pipes', 'autotime', 'column_name_if_datetime_missing' 417 ) 418 ts_typ = dtypes.get(ts_col, 'datetime') if ts_col else 'datetime' 419 if ts_col and hasattr(df, 'columns') and ts_col not in df.columns: 420 precision = p.get_precision(debug=debug) 421 now = get_current_timestamp( 422 precision_unit=precision.get( 423 'unit', 424 STATIC_CONFIG['dtypes']['datetime']['default_precision_unit'] 425 ), 426 precision_interval=precision.get('interval', 1), 427 round_to=(precision.get('round_to', 'down')), 428 as_int=(are_dtypes_equal(ts_typ, 'int')), 429 ) 430 if debug: 431 dprint(f"Adding current timestamp to dataframe synced to {p}: {now}") 432 433 df[ts_col] = now 434 kw['check_existing'] = dt_col is not None 435 436 ### Capture special columns. 
437 capture_success, capture_msg = self._persist_new_special_columns( 438 df, 439 dtypes=dtypes, 440 debug=debug, 441 ) 442 if not capture_success: 443 warn(f"Failed to capture new special columns for {self}:\n{capture_msg}") 444 445 if debug: 446 dprint( 447 "DataFrame to sync:\n" 448 + ( 449 str(df)[:255] 450 + '...' 451 if len(str(df)) >= 256 452 else str(df) 453 ), 454 **kw 455 ) 456 457 ### if force, continue to sync until success 458 return_tuple = False, f"Did not sync {p}." 459 run = True 460 _retries = 1 461 while run: 462 with Venv(get_connector_plugin(self.instance_connector)): 463 return_tuple = p.instance_connector.sync_pipe( 464 pipe=p, 465 df=df, 466 debug=debug, 467 **kw 468 ) 469 _retries += 1 470 run = (not return_tuple[0]) and force and _retries <= retries 471 if run and debug: 472 dprint(f"Syncing failed for {p}. Attempt ( {_retries} / {retries} )", **kw) 473 dprint(f"Sleeping for {min_seconds} seconds...", **kw) 474 time.sleep(min_seconds) 475 if _retries > retries: 476 warn( 477 f"Unable to sync {p} within {retries} attempt" + 478 ("s" if retries != 1 else "") + "!" 479 ) 480 481 ### CHECKPOINT: Finished syncing. 482 _checkpoint(**kw) 483 p._invalidate_cache(debug=debug) 484 485 ### Automatically apply a compression policy if the pipe is configured for compression. 
486 if return_tuple[0] and p.parameters.get('compress', False): 487 if hasattr(p.instance_connector, 'apply_compression_policy'): 488 try: 489 with Venv(get_connector_plugin(p.instance_connector)): 490 compress_success, compress_msg = ( 491 p.instance_connector.apply_compression_policy(p, debug=debug) 492 ) 493 if not compress_success and debug: 494 dprint(f"Could not apply compression policy to {p}:\n{compress_msg}") 495 except Exception as compress_e: 496 warn( 497 f"Failed to apply compression policy to {p}:\n{compress_e}", 498 stack=False, 499 ) 500 501 return return_tuple 502 503 if blocking: 504 return _sync(self, df=df) 505 506 from meerschaum.utils.threading import Thread 507 def default_callback(result_tuple: SuccessTuple): 508 dprint(f"Asynchronous result from {self}: {result_tuple}", **kw) 509 510 def default_error_callback(x: Exception): 511 dprint(f"Error received for {self}: {x}", **kw) 512 513 if callback is None and debug: 514 callback = default_callback 515 if error_callback is None and debug: 516 error_callback = default_error_callback 517 try: 518 thread = Thread( 519 target=_sync, 520 args=(self,), 521 kwargs={'df': df}, 522 daemon=False, 523 callback=callback, 524 error_callback=error_callback, 525 ) 526 thread.start() 527 except Exception as e: 528 self._invalidate_cache(debug=debug) 529 return False, str(e) 530 531 self._invalidate_cache(debug=debug) 532 return True, f"Spawned asyncronous sync for {self}."
Fetch new data from the source and update the pipe's table with new data.
Get new remote data via fetch, get existing data in the same time period, and merge the two, only keeping the unseen data.
Parameters
- df (Union[None, pd.DataFrame, Dict[str, List[Any]]], default None): An optional DataFrame to sync into the pipe. Defaults to None. If df is a string, it will be parsed via meerschaum.utils.dataframe.parse_simple_lines().
- begin (Union[datetime, int, str, None], default ''): Optionally specify the earliest datetime to search for data.
- end (Union[datetime, int, str, None], default None): Optionally specify the latest datetime to search for data.
- force (bool, default False): If True, keep trying to sync until retries attempts have been made.
- retries (int, default 10): If force, how many attempts to try syncing before declaring failure.
- min_seconds (Union[int, float], default 1): If force, how many seconds to sleep between retries. Defaults to 1.
- check_existing (bool, default True): If True, pull and diff with existing data from the pipe.
- enforce_dtypes (bool, default True): If True, enforce dtypes on incoming data. Set this to False if the incoming rows are expected to be of the correct dtypes.
- blocking (bool, default True): If True, wait for sync to finish and return its result, otherwise asynchronously sync (oxymoron?) and return success. Defaults to True. Only intended for specific scenarios.
- workers (Optional[int], default None): If provided and the instance connector is thread-safe (pipe.instance_connector.IS_THREAD_SAFE is True), limit concurrent sync to this many threads.
- callback (Optional[Callable[[Tuple[bool, str]], Any]], default None): Callback function which expects a SuccessTuple as input. Only applies when blocking=False.
- error_callback (Optional[Callable[[Exception], Any]], default None): Callback function which expects an Exception as input. Only applies when blocking=False.
- chunksize (int, default -1): Specify the number of rows to sync per chunk. If -1, resort to system configuration (default is 900). A chunksize of None will sync all rows in one transaction.
- sync_chunks (bool, default True): If possible, sync chunks while fetching them into memory.
- debug (bool, default False): Verbosity toggle. Defaults to False.
Returns
- A SuccessTuple of success (bool) and message (str).
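The interplay of force, retries, and min_seconds can be sketched as a small retry loop. This is a standalone approximation of the behavior described above, not the library's implementation; `sync_with_retries` and `flaky` are hypothetical names, and `attempt` stands in for the call into the instance connector:

```python
import time
from typing import Callable, Tuple

SuccessTuple = Tuple[bool, str]


def sync_with_retries(
    attempt: Callable[[], SuccessTuple],
    force: bool = False,
    retries: int = 10,
    min_seconds: float = 1,
) -> SuccessTuple:
    """Call `attempt` once, retrying on failure only when `force` is set."""
    attempts = 0
    while True:
        result = attempt()
        attempts += 1
        # Stop on success, when retries are disabled, or when attempts run out.
        if result[0] or not force or attempts >= retries:
            return result
        time.sleep(min_seconds)


calls = {'count': 0}


def flaky() -> SuccessTuple:
    """Fail twice, then succeed (stand-in for an unreliable sync)."""
    calls['count'] += 1
    if calls['count'] < 3:
        return False, f"attempt {calls['count']} failed"
    return True, "Success"


result = sync_with_retries(flaky, force=True, retries=5, min_seconds=0)
print(result, calls['count'])
```

Without force the first failure is returned immediately, which is why retries and min_seconds are documented as applying only when force is set.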
def get_sync_time(
    self,
    params: Optional[Dict[str, Any]] = None,
    newest: bool = True,
    apply_backtrack_interval: bool = False,
    remote: bool = False,
    round_down: bool = False,
    debug: bool = False
) -> Union['datetime', int, None]:
    """
    Get the most recent datetime value for a Pipe.

    Parameters
    ----------
    params: Optional[Dict[str, Any]], default None
        Dictionary to build a WHERE clause for a specific column.
        See `meerschaum.utils.sql.build_where`.

    newest: bool, default True
        If `True`, get the most recent datetime (honoring `params`).
        If `False`, get the oldest datetime (`ASC` instead of `DESC`).

    apply_backtrack_interval: bool, default False
        If `True`, subtract the backtrack interval from the sync time.

    remote: bool, default False
        If `True` and the instance connector supports it, return the sync time
        for the remote table definition.

    round_down: bool, default False
        If `True`, round down the datetime value to the nearest minute.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `datetime` or int, if the pipe exists, otherwise `None`.

    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.misc import filter_keywords
    from meerschaum.utils.dtypes import round_time
    from meerschaum.utils.warnings import warn

    if not self.columns.get('datetime', None):
        return None

    connector = self.instance_connector if not remote else self.connector
    if isinstance(connector, str) or connector is None:
        return None

    with Venv(get_connector_plugin(connector)):
        if not hasattr(connector, 'get_sync_time'):
            warn(
                f"Connectors of type '{connector.type}' "
                "do not implement `get_sync_time().",
                stack=False,
            )
            return None
        sync_time = connector.get_sync_time(
            self,
            **filter_keywords(
                connector.get_sync_time,
                params=params,
                newest=newest,
                remote=remote,
                debug=debug,
            )
        )

    if round_down and isinstance(sync_time, datetime):
        sync_time = round_time(sync_time, timedelta(minutes=1))

    if apply_backtrack_interval and sync_time is not None:
        backtrack_interval = self.get_backtrack_interval(debug=debug)
        try:
            sync_time -= backtrack_interval
        except Exception as e:
            warn(f"Failed to apply backtrack interval:\n{e}")

    return self.parse_date_bounds(sync_time)
Get the most recent datetime value for a Pipe.
Parameters
- params (Optional[Dict[str, Any]], default None): Dictionary to build a WHERE clause for a specific column. See meerschaum.utils.sql.build_where.
- newest (bool, default True): If True, get the most recent datetime (honoring params). If False, get the oldest datetime (ASC instead of DESC).
- apply_backtrack_interval (bool, default False): If True, subtract the backtrack interval from the sync time.
- remote (bool, default False): If True and the instance connector supports it, return the sync time for the remote table definition.
- round_down (bool, default False): If True, round down the datetime value to the nearest minute.
- debug (bool, default False): Verbosity toggle.
Returns
- A datetime or int, if the pipe exists, otherwise None.
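As a rough illustration of how round_down and apply_backtrack_interval post-process a sync time, here is a standalone sketch that uses only the standard library. The function name adjust_sync_time is hypothetical, and truncating with replace() is an assumed simplification of the library's own rounding helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Union

SyncTime = Union[datetime, int, None]


def adjust_sync_time(
    sync_time: SyncTime,
    round_down: bool = False,
    backtrack_interval: Union[timedelta, int, None] = None,
) -> SyncTime:
    """Round a sync time down to the minute, then subtract a backtrack interval."""
    if sync_time is None:
        return None
    # Rounding only applies to datetimes; integer axes pass through unchanged.
    if round_down and isinstance(sync_time, datetime):
        sync_time = sync_time.replace(second=0, microsecond=0)
    if backtrack_interval is not None:
        sync_time = sync_time - backtrack_interval
    return sync_time


newest = datetime(2024, 1, 1, 12, 34, 56, tzinfo=timezone.utc)
print(adjust_sync_time(newest, round_down=True))
# 2024-01-01 12:34:00+00:00
print(adjust_sync_time(newest, round_down=True, backtrack_interval=timedelta(minutes=10)))
# 2024-01-01 12:24:00+00:00
```

For an integer datetime axis, the same subtraction applies with an integer interval, e.g. adjust_sync_time(100, backtrack_interval=10) gives 90.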
def exists(
    self,
    debug: bool = False
) -> bool:
    """
    See if a Pipe's table exists.

    Parameters
    ----------
    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `bool` corresponding to whether a pipe's underlying table exists.

    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.debug import dprint
    from meerschaum.utils.dtypes import get_current_timestamp
    now = get_current_timestamp('ms', as_int=True) / 1000
    cache_seconds = mrsm.get_config('pipes', 'sync', 'exists_cache_seconds')

    _exists = self._get_cached_value('_exists', debug=debug)
    if _exists:
        exists_timestamp = self._get_cached_value('_exists_timestamp', debug=debug)
        if exists_timestamp is not None:
            delta = now - exists_timestamp
            if delta < cache_seconds:
                if debug:
                    dprint(f"Returning cached `exists` for {self} ({round(delta, 2)} seconds old).")
                return _exists

    with Venv(get_connector_plugin(self.instance_connector)):
        _exists = (
            self.instance_connector.pipe_exists(pipe=self, debug=debug)
            if hasattr(self.instance_connector, 'pipe_exists')
            else False
        )

    self._cache_value('_exists', _exists, debug=debug)
    self._cache_value('_exists_timestamp', now, debug=debug)
    return _exists
See if a Pipe's table exists.
Parameters
- debug (bool, default False): Verbosity toggle.
Returns
- A bool corresponding to whether a pipe's underlying table exists.
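The existence check is cached for a short window, and only a positive result is served from the cache. The following standalone sketch shows that pattern with an injectable clock so it can be exercised deterministically; the class name CachedExistence and its arguments are hypothetical and are not part of the Meerschaum API.

```python
import time
from typing import Callable, Optional


class CachedExistence:
    """Cache a truthy existence check for `cache_seconds`."""

    def __init__(
        self,
        check: Callable[[], bool],
        cache_seconds: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._check = check
        self._cache_seconds = cache_seconds
        self._clock = clock
        self._cached: bool = False
        self._cached_at: Optional[float] = None

    def exists(self) -> bool:
        now = self._clock()
        # Only a cached `True` is trusted; a `False` is always re-checked.
        if self._cached and self._cached_at is not None:
            if now - self._cached_at < self._cache_seconds:
                return self._cached
        self._cached = bool(self._check())
        self._cached_at = now
        return self._cached


calls = []
fake_now = [0.0]


def check_table() -> bool:
    calls.append(1)
    return True


cache = CachedExistence(check_table, cache_seconds=5.0, clock=lambda: fake_now[0])
cache.exists()      # runs the check
cache.exists()      # served from the cache
fake_now[0] = 10.0
cache.exists()      # cache expired, so the check runs again
print(len(calls))
# 2
```

Re-checking negative results means a table created moments ago is picked up immediately, at the cost of repeated lookups while the table is missing.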
666def filter_existing( 667 self, 668 df: 'pd.DataFrame', 669 safe_copy: bool = True, 670 date_bound_only: bool = False, 671 include_unchanged_columns: bool = False, 672 enforce_dtypes: bool = False, 673 chunksize: Optional[int] = -1, 674 debug: bool = False, 675 **kw 676) -> Tuple['pd.DataFrame', 'pd.DataFrame', 'pd.DataFrame']: 677 """ 678 Inspect a dataframe and filter out rows which already exist in the pipe. 679 680 Parameters 681 ---------- 682 df: 'pd.DataFrame' 683 The dataframe to inspect and filter. 684 685 safe_copy: bool, default True 686 If `True`, create a copy before comparing and modifying the dataframes. 687 Setting to `False` may mutate the DataFrames. 688 See `meerschaum.utils.dataframe.filter_unseen_df`. 689 690 date_bound_only: bool, default False 691 If `True`, only use the datetime index to fetch the sample dataframe. 692 693 include_unchanged_columns: bool, default False 694 If `True`, include the backtrack columns which haven't changed in the update dataframe. 695 This is useful if you can't update individual keys. 696 697 enforce_dtypes: bool, default False 698 If `True`, ensure the given and intermediate dataframes are enforced to the correct dtypes. 699 Setting `enforce_dtypes=True` may impact performance. 700 701 chunksize: Optional[int], default -1 702 The `chunksize` used when fetching existing data. 703 704 debug: bool, default False 705 Verbosity toggle. 706 707 Returns 708 ------- 709 A tuple of three pandas DataFrames: unseen, update, and delta. 
710 """ 711 from meerschaum.utils.warnings import warn 712 from meerschaum.utils.debug import dprint 713 from meerschaum.utils.packages import attempt_import, import_pandas 714 from meerschaum.utils.dataframe import ( 715 filter_unseen_df, 716 add_missing_cols_to_df, 717 get_unhashable_cols, 718 ) 719 from meerschaum.utils.dtypes import ( 720 to_pandas_dtype, 721 none_if_null, 722 to_datetime, 723 are_dtypes_equal, 724 value_is_null, 725 round_time, 726 ) 727 from meerschaum.config import get_config 728 pd = import_pandas() 729 pandas = attempt_import('pandas') 730 if enforce_dtypes or 'dataframe' not in str(type(df)).lower(): 731 df = self.enforce_dtypes(df, chunksize=chunksize, debug=debug) 732 is_dask = hasattr('df', '__module__') and 'dask' in df.__module__ 733 if is_dask: 734 dd = attempt_import('dask.dataframe') 735 merge = dd.merge 736 NA = pandas.NA 737 else: 738 merge = pd.merge 739 NA = pd.NA 740 741 parameters = self.parameters 742 pipe_columns = self.columns 743 primary_key = pipe_columns.get('primary', None) 744 dt_col = pipe_columns.get('datetime', None) 745 dt_type = parameters.get('dtypes', {}).get(dt_col, 'datetime') if dt_col else None 746 autoincrement = parameters.get('autoincrement', False) 747 autotime = parameters.get('autotime', False) 748 749 if primary_key and autoincrement and df is not None and primary_key in df.columns: 750 if safe_copy: 751 df = df.copy() 752 safe_copy = False 753 if df[primary_key].isnull().all(): 754 del df[primary_key] 755 _ = self.columns.pop(primary_key, None) 756 757 if dt_col and autotime and df is not None and dt_col in df.columns: 758 if safe_copy: 759 df = df.copy() 760 safe_copy = False 761 if df[dt_col].isnull().all(): 762 del df[dt_col] 763 _ = self.columns.pop(dt_col, None) 764 765 def get_empty_df(): 766 empty_df = pd.DataFrame([]) 767 dtypes = dict(df.dtypes) if df is not None else {} 768 dtypes.update(self.dtypes) if self.enforce else {} 769 pd_dtypes = { 770 col: to_pandas_dtype(str(typ)) 771 for col, 
typ in dtypes.items() 772 } 773 return add_missing_cols_to_df(empty_df, pd_dtypes) 774 775 if df is None: 776 empty_df = get_empty_df() 777 return empty_df, empty_df, empty_df 778 779 if (df.empty if not is_dask else len(df) == 0): 780 return df, df, df 781 782 ### begin is the oldest data in the new dataframe 783 begin, end = None, None 784 785 if autoincrement and primary_key == dt_col and dt_col not in df.columns: 786 if enforce_dtypes: 787 df = self.enforce_dtypes(df, chunksize=chunksize, debug=debug) 788 return df, get_empty_df(), df 789 790 if autotime and dt_col and dt_col not in df.columns: 791 if enforce_dtypes: 792 df = self.enforce_dtypes(df, chunksize=chunksize, debug=debug) 793 return df, get_empty_df(), df 794 795 try: 796 min_dt_val = df[dt_col].min(skipna=True) if dt_col and dt_col in df.columns else None 797 if is_dask and min_dt_val is not None: 798 min_dt_val = min_dt_val.compute() 799 min_dt = ( 800 to_datetime(min_dt_val, as_pydatetime=True) 801 if min_dt_val is not None and are_dtypes_equal(dt_type, 'datetime') 802 else min_dt_val 803 ) 804 except Exception: 805 min_dt = None 806 807 if not are_dtypes_equal('datetime', str(type(min_dt))) or value_is_null(min_dt): 808 if not are_dtypes_equal('int', str(type(min_dt))): 809 min_dt = None 810 811 if isinstance(min_dt, datetime): 812 rounded_min_dt = round_time(min_dt, to='down') 813 try: 814 begin = rounded_min_dt - timedelta(minutes=1) 815 except OverflowError: 816 begin = rounded_min_dt 817 elif dt_type and 'int' in dt_type.lower(): 818 begin = min_dt 819 elif dt_col is None: 820 begin = None 821 822 ### end is the newest data in the new dataframe 823 try: 824 max_dt_val = df[dt_col].max(skipna=True) if dt_col and dt_col in df.columns else None 825 if is_dask and max_dt_val is not None: 826 max_dt_val = max_dt_val.compute() 827 max_dt = ( 828 to_datetime(max_dt_val, as_pydatetime=True) 829 if max_dt_val is not None and 'datetime' in str(dt_type) 830 else max_dt_val 831 ) 832 except Exception: 
833 import traceback 834 traceback.print_exc() 835 max_dt = None 836 837 if not are_dtypes_equal('datetime', str(type(max_dt))) or value_is_null(max_dt): 838 if not are_dtypes_equal('int', str(type(max_dt))): 839 max_dt = None 840 841 if isinstance(max_dt, datetime): 842 end = ( 843 round_time( 844 max_dt, 845 to='down' 846 ) + timedelta(minutes=1) 847 ) 848 elif dt_type and 'int' in dt_type.lower() and max_dt is not None: 849 end = max_dt + 1 850 851 if max_dt is not None and min_dt is not None and min_dt > max_dt: 852 warn("Detected minimum datetime greater than maximum datetime.") 853 854 if begin is not None and end is not None and begin > end: 855 if isinstance(begin, datetime): 856 begin = end - timedelta(minutes=1) 857 ### We might be using integers for the datetime axis. 858 else: 859 begin = end - 1 860 861 unique_index_vals = { 862 col: df[col].unique() 863 for col in (pipe_columns.values() if not primary_key else [primary_key]) 864 if col in df.columns and col != dt_col 865 } if not date_bound_only else {} 866 unique_index_lens = { 867 col: len(unique_vals) 868 for col, unique_vals in unique_index_vals.items() 869 } if not date_bound_only else {} 870 filter_params_index_limit = get_config('pipes', 'sync', 'filter_params_index_limit') 871 _ = kw.pop('params', None) 872 params = { 873 col: [ 874 none_if_null(val) 875 for val in unique_vals 876 ] 877 for col, unique_vals in unique_index_vals.items() 878 if unique_index_lens[col] <= filter_params_index_limit 879 } if not date_bound_only else {} 880 881 if debug: 882 dprint( 883 ( 884 f"Looking at data between '{begin}' and '{end}' with index value lengths:\n" 885 f"{json.dumps(unique_index_lens, indent=4)}\n" 886 ), 887 **kw 888 ) 889 890 backtrack_df = self.get_data( 891 begin=begin, 892 end=end, 893 chunksize=chunksize, 894 params=params, 895 debug=debug, 896 **kw 897 ) 898 if backtrack_df is None: 899 if debug: 900 dprint(f"No backtrack data was found for {self}.") 901 return df, get_empty_df(), df 902 
903 if enforce_dtypes: 904 backtrack_df = self.enforce_dtypes(backtrack_df, chunksize=chunksize, debug=debug) 905 906 if debug: 907 dprint(f"Existing data for {self}:\n" + str(backtrack_df), **kw) 908 dprint(f"Existing dtypes for {self}:\n" + str(backtrack_df.dtypes)) 909 910 ### Separate new rows from changed ones. 911 on_cols = [ 912 col 913 for col_key, col in pipe_columns.items() 914 if ( 915 col 916 and 917 col_key != 'value' 918 and col in backtrack_df.columns 919 ) 920 ] if not primary_key else [primary_key] 921 922 self_dtypes = self.get_dtypes(debug=debug) if self.enforce else {} 923 on_cols_dtypes = { 924 col: to_pandas_dtype(typ) 925 for col, typ in self_dtypes.items() 926 if col in on_cols 927 } 928 929 ### Detect changes between the old target and new source dataframes. 930 delta_df = add_missing_cols_to_df( 931 filter_unseen_df( 932 backtrack_df, 933 df, 934 dtypes={ 935 col: to_pandas_dtype(typ) 936 for col, typ in self_dtypes.items() 937 }, 938 safe_copy=safe_copy, 939 coerce_mixed_numerics=(not self.static), 940 debug=debug 941 ), 942 on_cols_dtypes, 943 ) 944 if enforce_dtypes: 945 delta_df = self.enforce_dtypes(delta_df, chunksize=chunksize, debug=debug) 946 947 ### Cast dicts or lists to strings so we can merge. 
948 serializer = functools.partial(json.dumps, sort_keys=True, separators=(',', ':'), default=str) 949 950 def deserializer(x): 951 return json.loads(x) if isinstance(x, str) else x 952 953 unhashable_delta_cols = get_unhashable_cols(delta_df) 954 unhashable_backtrack_cols = get_unhashable_cols(backtrack_df) 955 for col in unhashable_delta_cols: 956 delta_df[col] = delta_df[col].apply(serializer) 957 for col in unhashable_backtrack_cols: 958 backtrack_df[col] = backtrack_df[col].apply(serializer) 959 casted_cols = set(unhashable_delta_cols + unhashable_backtrack_cols) 960 961 joined_df = merge( 962 delta_df.infer_objects().fillna(NA), 963 backtrack_df.infer_objects().fillna(NA), 964 how='left', 965 on=on_cols, 966 indicator=True, 967 suffixes=('', '_old'), 968 ) if on_cols else delta_df 969 for col in casted_cols: 970 if col in joined_df.columns: 971 joined_df[col] = joined_df[col].apply(deserializer) 972 if col in delta_df.columns: 973 delta_df[col] = delta_df[col].apply(deserializer) 974 975 ### Determine which rows are completely new. 976 new_rows_mask = (joined_df['_merge'] == 'left_only') if on_cols else None 977 cols = list(delta_df.columns) 978 979 unseen_df = ( 980 joined_df 981 .where(new_rows_mask) 982 .dropna(how='all')[cols] 983 .reset_index(drop=True) 984 ) if on_cols else delta_df 985 986 ### Rows that have already been inserted but values have changed. 
987 update_df = ( 988 joined_df 989 .where(~new_rows_mask) 990 .dropna(how='all')[cols] 991 .reset_index(drop=True) 992 ) if on_cols else get_empty_df() 993 994 if include_unchanged_columns and on_cols: 995 unchanged_backtrack_cols = [ 996 col 997 for col in backtrack_df.columns 998 if col in on_cols or col not in update_df.columns 999 ] 1000 if enforce_dtypes: 1001 update_df = self.enforce_dtypes(update_df, chunksize=chunksize, debug=debug) 1002 update_df = merge( 1003 backtrack_df[unchanged_backtrack_cols], 1004 update_df, 1005 how='inner', 1006 on=on_cols, 1007 ) 1008 1009 return unseen_df, update_df, delta_df
Inspect a dataframe and filter out rows which already exist in the pipe.
Parameters
- df ('pd.DataFrame'): The dataframe to inspect and filter.
- safe_copy (bool, default True): If True, create a copy before comparing and modifying the dataframes. Setting to False may mutate the DataFrames. See meerschaum.utils.dataframe.filter_unseen_df.
- date_bound_only (bool, default False): If True, only use the datetime index to fetch the sample dataframe.
- include_unchanged_columns (bool, default False): If True, include the backtrack columns which haven't changed in the update dataframe. This is useful if you can't update individual keys.
- enforce_dtypes (bool, default False): If True, ensure the given and intermediate dataframes are enforced to the correct dtypes. Setting enforce_dtypes=True may impact performance.
- chunksize (Optional[int], default -1): The chunksize used when fetching existing data.
- debug (bool, default False): Verbosity toggle.
Returns
- A tuple of three pandas DataFrames: unseen, update, and delta.
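To make the three return values concrete, here is a simplified sketch of the same split on plain lists of dictionaries rather than DataFrames. The function split_rows is hypothetical and ignores dtype enforcement, date bounds, and null handling; it only shows how delta (rows that differ from what is stored), unseen (delta rows with new index values), and update (delta rows whose index values already exist) relate to each other.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def split_rows(
    new_rows: Sequence[Row],
    existing_rows: Sequence[Row],
    on_cols: Sequence[str],
) -> Tuple[List[Row], List[Row], List[Row]]:
    """Return (unseen, update, delta) for `new_rows` against `existing_rows`."""

    def key_of(row: Row) -> tuple:
        return tuple(row.get(col) for col in on_cols)

    existing_by_key = {key_of(row): row for row in existing_rows}

    # Delta: incoming rows that are not already stored verbatim.
    delta = [row for row in new_rows if existing_by_key.get(key_of(row)) != row]
    # Unseen: delta rows whose index values have never been stored.
    unseen = [row for row in delta if key_of(row) not in existing_by_key]
    # Update: delta rows whose index values exist but whose values changed.
    update = [row for row in delta if key_of(row) in existing_by_key]
    return unseen, update, delta


existing = [
    {'ts': 1, 'id': 1, 'vl': 10},
    {'ts': 1, 'id': 2, 'vl': 20},
]
incoming = [
    {'ts': 1, 'id': 1, 'vl': 10},   # unchanged, dropped
    {'ts': 1, 'id': 2, 'vl': 25},   # changed value, goes to update
    {'ts': 2, 'id': 1, 'vl': 30},   # new index values, goes to unseen
]
unseen, update, delta = split_rows(incoming, existing, on_cols=['ts', 'id'])
print(unseen)
# [{'ts': 2, 'id': 1, 'vl': 30}]
print(update)
# [{'ts': 1, 'id': 2, 'vl': 25}]
```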
def get_num_workers(self, workers: Optional[int] = None) -> int:
    """
    Get the number of workers to use for concurrent syncs.

    Parameters
    ----------
    The number of workers passed via `--workers`.

    Returns
    -------
    The number of workers, capped for safety.
    """
    is_thread_safe = getattr(self.instance_connector, 'IS_THREAD_SAFE', False)
    if not is_thread_safe:
        return 1

    engine_pool_size = (
        self.instance_connector.engine.pool.size()
        if self.instance_connector.type == 'sql'
        else None
    )
    current_num_threads = threading.active_count()
    current_num_connections = (
        self.instance_connector.engine.pool.checkedout()
        if engine_pool_size is not None
        else current_num_threads
    )
    desired_workers = (
        min(workers or engine_pool_size, engine_pool_size)
        if engine_pool_size is not None
        else workers
    )
    if desired_workers is None:
        desired_workers = (multiprocessing.cpu_count() if is_thread_safe else 1)

    return max(
        (desired_workers - current_num_connections),
        1,
    )
Get the number of workers to use for concurrent syncs.
Parameters
- workers (Optional[int], default None): The number of workers passed via --workers.
Returns
- The number of workers, capped for safety.
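The capping rule can be read as plain arithmetic: take the requested worker count (limited by the connection pool size when one exists), subtract what is already in use, and never go below one. The sketch below restates that rule as a pure function. The name pick_num_workers and its explicit arguments are hypothetical; they replace the connector, thread, and CPU lookups so the logic can run on its own.

```python
from typing import Optional


def pick_num_workers(
    is_thread_safe: bool,
    requested: Optional[int] = None,
    pool_size: Optional[int] = None,
    checked_out: int = 0,
    active_threads: int = 1,
    cpu_count: int = 1,
) -> int:
    """Return a worker count capped by the pool size and current usage."""
    if not is_thread_safe:
        # Connectors which are not thread-safe always sync in series.
        return 1
    if pool_size is not None:
        # Never ask for more workers than the pool has connections.
        desired = min(requested or pool_size, pool_size)
        in_use = checked_out
    else:
        desired = requested if requested is not None else cpu_count
        in_use = active_threads
    return max(desired - in_use, 1)


print(pick_num_workers(False, requested=8))
# 1
print(pick_num_workers(True, requested=8, pool_size=5, checked_out=2))
# 3
print(pick_num_workers(True, cpu_count=4, active_threads=1))
# 3
```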
19def verify( 20 self, 21 begin: Union[datetime, int, None] = None, 22 end: Union[datetime, int, None] = None, 23 params: Optional[Dict[str, Any]] = None, 24 chunk_interval: Union[timedelta, int, None] = None, 25 bounded: Optional[bool] = None, 26 deduplicate: bool = False, 27 workers: Optional[int] = None, 28 batchsize: Optional[int] = None, 29 skip_chunks_with_greater_rowcounts: bool = False, 30 check_rowcounts_only: bool = False, 31 debug: bool = False, 32 **kwargs: Any 33) -> SuccessTuple: 34 """ 35 Verify the contents of the pipe by resyncing its interval. 36 37 Parameters 38 ---------- 39 begin: Union[datetime, int, None], default None 40 If specified, only verify rows greater than or equal to this value. 41 42 end: Union[datetime, int, None], default None 43 If specified, only verify rows less than this value. 44 45 chunk_interval: Union[timedelta, int, None], default None 46 If provided, use this as the size of the chunk boundaries. 47 Default to the value set in `pipe.parameters['verify']['chunk_minutes']` (43200 — 30 days). 48 49 bounded: Optional[bool], default None 50 If `True`, do not verify older than the oldest sync time or newer than the newest. 51 If `False`, verify unbounded syncs outside of the new and old sync times. 52 The default behavior (`None`) is to bound only if a bound interval is set 53 (e.g. `pipe.parameters['verify']['bound_days']`). 54 55 deduplicate: bool, default False 56 If `True`, deduplicate the pipe's table after the verification syncs. 57 58 workers: Optional[int], default None 59 If provided, limit the verification to this many threads. 60 Use a value of `1` to sync chunks in series. 61 62 batchsize: Optional[int], default None 63 If provided, sync this many chunks in parallel. 64 Defaults to `Pipe.get_num_workers()`. 65 66 skip_chunks_with_greater_rowcounts: bool, default False 67 If `True`, compare the rowcounts for a chunk and skip syncing if the pipe's 68 chunk rowcount equals or exceeds the remote's rowcount. 
69 70 check_rowcounts_only: bool, default False 71 If `True`, only compare rowcounts and print chunks which are out-of-sync. 72 73 debug: bool, default False 74 Verbosity toggle. 75 76 kwargs: Any 77 All keyword arguments are passed to `pipe.sync()`. 78 79 Returns 80 ------- 81 A SuccessTuple indicating whether the pipe was successfully resynced. 82 """ 83 from meerschaum.utils.pool import get_pool 84 from meerschaum.utils.formatting import make_header 85 from meerschaum.utils.misc import interval_str 86 workers = self.get_num_workers(workers) 87 check_rowcounts = skip_chunks_with_greater_rowcounts or check_rowcounts_only 88 89 ### Skip configured bounding in parameters 90 ### if `bounded` is explicitly `False`. 91 bound_time = ( 92 self.get_bound_time(debug=debug) 93 if bounded is not False 94 else None 95 ) 96 if bounded is None: 97 bounded = bound_time is not None 98 99 if bounded and begin is None: 100 begin = ( 101 bound_time 102 if bound_time is not None 103 else self.get_sync_time(newest=False, debug=debug) 104 ) 105 if begin is None: 106 remote_oldest_sync_time = self.get_sync_time(newest=False, remote=True, debug=debug) 107 begin = remote_oldest_sync_time 108 if bounded and end is None: 109 end = self.get_sync_time(newest=True, debug=debug) 110 if end is None: 111 remote_newest_sync_time = self.get_sync_time(newest=True, remote=True, debug=debug) 112 end = remote_newest_sync_time 113 if end is not None: 114 end += ( 115 timedelta(minutes=1) 116 if hasattr(end, 'tzinfo') 117 else 1 118 ) 119 120 begin, end = self.parse_date_bounds(begin, end) 121 cannot_determine_bounds = bounded and begin is None and end is None 122 123 if cannot_determine_bounds and not check_rowcounts_only: 124 warn(f"Cannot determine sync bounds for {self}. 
Syncing instead...", stack=False) 125 sync_success, sync_msg = self.sync( 126 begin=begin, 127 end=end, 128 params=params, 129 workers=workers, 130 debug=debug, 131 **kwargs 132 ) 133 if not sync_success: 134 return sync_success, sync_msg 135 136 if deduplicate: 137 return self.deduplicate( 138 begin=begin, 139 end=end, 140 params=params, 141 workers=workers, 142 debug=debug, 143 **kwargs 144 ) 145 return sync_success, sync_msg 146 147 chunk_interval = self.get_chunk_interval(chunk_interval, debug=debug) 148 chunk_bounds = self.get_chunk_bounds( 149 begin=begin, 150 end=end, 151 chunk_interval=chunk_interval, 152 bounded=bounded, 153 align=True, 154 debug=debug, 155 ) 156 157 ### Consider it a success if no chunks need to be verified. 158 if not chunk_bounds: 159 if deduplicate: 160 return self.deduplicate( 161 begin=begin, 162 end=end, 163 params=params, 164 workers=workers, 165 debug=debug, 166 **kwargs 167 ) 168 return True, f"Could not determine chunks between '{begin}' and '{end}'; nothing to do." 
169 170 begin_to_print = ( 171 begin 172 if begin is not None 173 else ( 174 chunk_bounds[0][0] 175 if bounded 176 else chunk_bounds[0][1] 177 ) 178 ) 179 end_to_print = ( 180 end 181 if end is not None 182 else ( 183 chunk_bounds[-1][1] 184 if bounded 185 else chunk_bounds[-1][0] 186 ) 187 ) 188 message_header = f"{begin_to_print} - {end_to_print}" 189 max_chunks_syncs = mrsm.get_config('pipes', 'verify', 'max_chunks_syncs') 190 191 info( 192 f"Verifying {self}:\n " 193 + ("Syncing" if not check_rowcounts_only else "Checking") 194 + f" {len(chunk_bounds)} chunk" 195 + ('s' if len(chunk_bounds) != 1 else '') 196 + f" ({'un' if not bounded else ''}bounded)" 197 + f" of size '{interval_str(chunk_interval)}'" 198 + f" between '{begin_to_print}' and '{end_to_print}'.\n" 199 ) 200 201 ### Dictionary of the form bounds -> success_tuple, e.g.: 202 ### { 203 ### (2023-01-01, 2023-01-02): (True, "Success") 204 ### } 205 bounds_success_tuples = {} 206 def process_chunk_bounds( 207 chunk_begin_and_end: Tuple[ 208 Union[int, datetime], 209 Union[int, datetime] 210 ], 211 _workers: Optional[int] = 1, 212 ): 213 if chunk_begin_and_end in bounds_success_tuples: 214 return chunk_begin_and_end, bounds_success_tuples[chunk_begin_and_end] 215 216 chunk_begin, chunk_end = chunk_begin_and_end 217 do_sync = True 218 chunk_success, chunk_msg = False, "Did not sync chunk." 
219 if check_rowcounts: 220 existing_rowcount = self.get_rowcount(begin=chunk_begin, end=chunk_end, debug=debug) 221 remote_rowcount = self.get_rowcount( 222 begin=chunk_begin, 223 end=chunk_end, 224 remote=True, 225 debug=debug, 226 ) 227 checked_rows_str = ( 228 f"checked {existing_rowcount:,} row" 229 + ("s" if existing_rowcount != 1 else '') 230 + f" vs {remote_rowcount:,} remote" 231 ) 232 if ( 233 existing_rowcount is not None 234 and remote_rowcount is not None 235 and existing_rowcount >= remote_rowcount 236 ): 237 do_sync = False 238 chunk_success, chunk_msg = True, ( 239 "Row-count is up-to-date " 240 f"({checked_rows_str})." 241 ) 242 elif check_rowcounts_only: 243 do_sync = False 244 chunk_success, chunk_msg = True, ( 245 f"Row-counts are out-of-sync ({checked_rows_str})." 246 ) 247 248 num_syncs = 0 249 while num_syncs < max_chunks_syncs: 250 chunk_success, chunk_msg = self.sync( 251 begin=chunk_begin, 252 end=chunk_end, 253 params=params, 254 workers=_workers, 255 debug=debug, 256 **kwargs 257 ) if do_sync else (chunk_success, chunk_msg) 258 if chunk_success: 259 break 260 num_syncs += 1 261 time.sleep(num_syncs**2) 262 chunk_msg = chunk_msg.strip() 263 if ' - ' not in chunk_msg: 264 chunk_label = f"{chunk_begin} - {chunk_end}" 265 chunk_msg = f'Verified chunk for {self}:\n{chunk_label}\n{chunk_msg}' 266 mrsm.pprint((chunk_success, chunk_msg)) 267 268 return chunk_begin_and_end, (chunk_success, chunk_msg) 269 270 ### If we have more than one chunk, attempt to sync the first one and return if its fails. 
271 if len(chunk_bounds) > 1: 272 first_chunk_bounds = chunk_bounds[0] 273 first_label = f"{first_chunk_bounds[0]} - {first_chunk_bounds[1]}" 274 info(f"Verifying first chunk for {self}:\n {first_label}") 275 ( 276 (first_begin, first_end), 277 (first_success, first_msg) 278 ) = process_chunk_bounds(first_chunk_bounds, _workers=workers) 279 if not first_success: 280 return ( 281 first_success, 282 f"\n{first_label}\n" 283 + f"Failed to sync first chunk:\n{first_msg}" 284 ) 285 bounds_success_tuples[first_chunk_bounds] = (first_success, first_msg) 286 info(f"Completed first chunk for {self}:\n {first_label}\n") 287 chunk_bounds = chunk_bounds[1:] 288 289 pool = get_pool(workers=workers) 290 batches = self.get_chunk_bounds_batches(chunk_bounds, batchsize=batchsize, workers=workers) 291 292 def process_batch( 293 batch_chunk_bounds: Tuple[ 294 Tuple[Union[datetime, int, None], Union[datetime, int, None]], 295 ... 296 ] 297 ): 298 _batch_begin = batch_chunk_bounds[0][0] 299 _batch_end = batch_chunk_bounds[-1][-1] 300 batch_message_header = f"{_batch_begin} - {_batch_end}" 301 302 if check_rowcounts_only: 303 info(f"Checking row-counts for batch bounds:\n {batch_message_header}") 304 _, (batch_init_success, batch_init_msg) = process_chunk_bounds( 305 (_batch_begin, _batch_end) 306 ) 307 mrsm.pprint((batch_init_success, batch_init_msg)) 308 if batch_init_success and 'up-to-date' in batch_init_msg: 309 info("Entire batch is up-to-date.") 310 return batch_init_success, batch_init_msg 311 312 batch_bounds_success_tuples = dict(pool.map(process_chunk_bounds, batch_chunk_bounds)) 313 bounds_success_tuples.update(batch_bounds_success_tuples) 314 batch_bounds_success_bools = { 315 bounds: tup[0] 316 for bounds, tup in batch_bounds_success_tuples.items() 317 } 318 319 if all(batch_bounds_success_bools.values()): 320 msg = get_chunks_success_message( 321 batch_bounds_success_tuples, 322 header=batch_message_header, 323 check_rowcounts_only=check_rowcounts_only, 324 ) 325 if 
deduplicate: 326 deduplicate_success, deduplicate_msg = self.deduplicate( 327 begin=_batch_begin, 328 end=_batch_end, 329 params=params, 330 workers=workers, 331 debug=debug, 332 **kwargs 333 ) 334 return deduplicate_success, msg + '\n\n' + deduplicate_msg 335 return True, msg 336 337 batch_chunk_bounds_to_resync = [ 338 bounds 339 for bounds, success in zip(batch_chunk_bounds, batch_bounds_success_bools) 340 if not success 341 ] 342 batch_bounds_to_print = [ 343 f"{bounds[0]} - {bounds[1]}" 344 for bounds in batch_chunk_bounds_to_resync 345 ] 346 if batch_bounds_to_print: 347 warn( 348 "Will resync the following failed chunks:\n " 349 + '\n '.join(batch_bounds_to_print), 350 stack=False, 351 ) 352 353 retry_bounds_success_tuples = dict(pool.map( 354 process_chunk_bounds, 355 batch_chunk_bounds_to_resync 356 )) 357 batch_bounds_success_tuples.update(retry_bounds_success_tuples) 358 bounds_success_tuples.update(retry_bounds_success_tuples) 359 retry_bounds_success_bools = { 360 bounds: tup[0] 361 for bounds, tup in retry_bounds_success_tuples.items() 362 } 363 364 if all(retry_bounds_success_bools.values()): 365 chunks_message = ( 366 get_chunks_success_message( 367 batch_bounds_success_tuples, 368 header=batch_message_header, 369 check_rowcounts_only=check_rowcounts_only, 370 ) + f"\nRetried {len(batch_chunk_bounds_to_resync)} chunk" + ( 371 's' 372 if len(batch_chunk_bounds_to_resync) != 1 373 else '' 374 ) + "." 
375 ) 376 if deduplicate: 377 deduplicate_success, deduplicate_msg = self.deduplicate( 378 begin=_batch_begin, 379 end=_batch_end, 380 params=params, 381 workers=workers, 382 debug=debug, 383 **kwargs 384 ) 385 return deduplicate_success, chunks_message + '\n\n' + deduplicate_msg 386 return True, chunks_message 387 388 batch_chunks_message = get_chunks_success_message( 389 batch_bounds_success_tuples, 390 header=batch_message_header, 391 check_rowcounts_only=check_rowcounts_only, 392 ) 393 if deduplicate: 394 deduplicate_success, deduplicate_msg = self.deduplicate( 395 begin=begin, 396 end=end, 397 params=params, 398 workers=workers, 399 debug=debug, 400 **kwargs 401 ) 402 return deduplicate_success, batch_chunks_message + '\n\n' + deduplicate_msg 403 return False, batch_chunks_message 404 405 num_batches = len(batches) 406 for batch_i, batch in enumerate(batches): 407 batch_begin = batch[0][0] 408 batch_end = batch[-1][-1] 409 batch_counter_str = f"({(batch_i + 1):,}/{num_batches:,})" 410 batch_label = f"batch {batch_counter_str}:\n{batch_begin} - {batch_end}" 411 retry_failed_batch = True 412 try: 413 for_self = 'for ' + str(self) 414 batch_label_str = batch_label.replace(':\n', ' ' + for_self + '...\n ') 415 info(f"Verifying {batch_label_str}\n") 416 batch_success, batch_msg = process_batch(batch) 417 except (KeyboardInterrupt, Exception) as e: 418 batch_success = False 419 batch_msg = str(e) 420 retry_failed_batch = False 421 422 batch_msg_to_print = ( 423 f"{make_header('Completed batch ' + batch_counter_str + ':', left_pad=0)}\n{batch_msg}" 424 ) 425 mrsm.pprint((batch_success, batch_msg_to_print)) 426 427 if not batch_success and retry_failed_batch: 428 info(f"Retrying batch {batch_counter_str}...") 429 retry_batch_success, retry_batch_msg = process_batch(batch) 430 retry_batch_msg_to_print = ( 431 f"Retried {make_header('batch ' + batch_label, left_pad=0)}\n{retry_batch_msg}" 432 ) 433 mrsm.pprint((retry_batch_success, retry_batch_msg_to_print)) 434 435 
batch_success = retry_batch_success 436 batch_msg = retry_batch_msg 437 438 if not batch_success: 439 return False, f"Failed to verify {batch_label}:\n\n{batch_msg}" 440 441 chunks_message = get_chunks_success_message( 442 bounds_success_tuples, 443 header=message_header, 444 check_rowcounts_only=check_rowcounts_only, 445 ) 446 return True, chunks_message
Verify the contents of the pipe by resyncing its interval.
Parameters
- begin (Union[datetime, int, None], default None): If specified, only verify rows greater than or equal to this value.
- end (Union[datetime, int, None], default None): If specified, only verify rows less than this value.
- chunk_interval (Union[timedelta, int, None], default None): If provided, use this as the size of the chunk boundaries. Defaults to the value set in pipe.parameters['verify']['chunk_minutes'] (43200 minutes, i.e. 30 days).
- bounded (Optional[bool], default None): If True, do not verify older than the oldest sync time or newer than the newest. If False, verify unbounded syncs outside of the new and old sync times. The default behavior (None) is to bound only if a bound interval is set (e.g. pipe.parameters['verify']['bound_days']).
- deduplicate (bool, default False): If True, deduplicate the pipe's table after the verification syncs.
- workers (Optional[int], default None): If provided, limit the verification to this many threads. Use a value of 1 to sync chunks in series.
- batchsize (Optional[int], default None): If provided, sync this many chunks in parallel. Defaults to Pipe.get_num_workers().
- skip_chunks_with_greater_rowcounts (bool, default False): If True, compare the rowcounts for a chunk and skip syncing if the pipe's chunk rowcount equals or exceeds the remote's rowcount.
- check_rowcounts_only (bool, default False): If True, only compare rowcounts and print chunks which are out-of-sync.
- debug (bool, default False): Verbosity toggle.
- kwargs (Any): All keyword arguments are passed to pipe.sync().
Returns
- A SuccessTuple indicating whether the pipe was successfully resynced.
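Verification works chunk by chunk over the interval between begin and end, and can skip a chunk when the local row count already matches or exceeds the remote one. The standalone sketch below illustrates both ideas with hypothetical helpers, bounded_chunks and needs_resync. It covers only the simple bounded case and is not the library's own chunking routine, which also handles alignment and unbounded edges.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple, Union

Bound = Union[datetime, int]
Interval = Union[timedelta, int]


def bounded_chunks(
    begin: Bound,
    end: Bound,
    chunk_interval: Interval,
) -> List[Tuple[Bound, Bound]]:
    """Split the half-open range [begin, end) into consecutive chunks."""
    zero = chunk_interval - chunk_interval  # 0 or timedelta(0)
    if chunk_interval <= zero:
        raise ValueError("chunk_interval must be positive")
    chunks = []
    chunk_begin = begin
    while chunk_begin < end:
        chunk_end = min(chunk_begin + chunk_interval, end)
        chunks.append((chunk_begin, chunk_end))
        chunk_begin = chunk_end
    return chunks


def needs_resync(
    existing_rowcount: Optional[int],
    remote_rowcount: Optional[int],
) -> bool:
    """Resync unless both counts are known and the local count is not behind."""
    if existing_rowcount is None or remote_rowcount is None:
        return True
    return existing_rowcount < remote_rowcount


print(bounded_chunks(0, 10, 4))
# [(0, 4), (4, 8), (8, 10)]
print(len(bounded_chunks(datetime(2024, 1, 1), datetime(2024, 3, 1), timedelta(days=30))))
# 2
print(needs_resync(5, 5), needs_resync(3, 5))
# False True
```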
def get_bound_interval(self, debug: bool = False) -> Union[timedelta, int, None]:
    """
    Return the interval used to determine the bound time (limit for verification syncs).
    If the datetime axis is an integer, just return its value.

    Below are the supported keys for the bound interval:

    - `pipe.parameters['verify']['bound_minutes']`
    - `pipe.parameters['verify']['bound_hours']`
    - `pipe.parameters['verify']['bound_days']`
    - `pipe.parameters['verify']['bound_weeks']`
    - `pipe.parameters['verify']['bound_years']`
    - `pipe.parameters['verify']['bound_seconds']`

    If multiple keys are present, the first on this priority list will be used.

    Returns
    -------
    A `timedelta` or `int` value to be used to determine the bound time.
    """
    verify_params = self.parameters.get('verify', {})
    prefix = 'bound_'
    suffixes_to_check = ('minutes', 'hours', 'days', 'weeks', 'years', 'seconds')
    keys_to_search = {
        key: val
        for key, val in verify_params.items()
        if key.startswith(prefix)
    }
    bound_time_key, bound_time_value = None, None
    for key, value in keys_to_search.items():
        for suffix in suffixes_to_check:
            if key == prefix + suffix:
                bound_time_key = key
                bound_time_value = value
                break
        if bound_time_key is not None:
            break

    if bound_time_value is None:
        return bound_time_value

    dt_col = self.columns.get('datetime', None)
    if not dt_col:
        return bound_time_value

    dt_typ = self.dtypes.get(dt_col, 'datetime')
    if 'int' in dt_typ.lower():
        return int(bound_time_value)

    interval_type = bound_time_key.replace(prefix, '')
    return timedelta(**{interval_type: bound_time_value})
Return the interval used to determine the bound time (limit for verification syncs). If the datetime axis is an integer, just return its value.
Below are the supported keys for the bound interval:
- `pipe.parameters['verify']['bound_minutes']`
- `pipe.parameters['verify']['bound_hours']`
- `pipe.parameters['verify']['bound_days']`
- `pipe.parameters['verify']['bound_weeks']`
- `pipe.parameters['verify']['bound_years']`
- `pipe.parameters['verify']['bound_seconds']`
If multiple keys are present, the first on this priority list will be used.
Returns
- A timedelta or int value to be used to determine the bound time.
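The key lookup described above can be sketched outside of Meerschaum as follows. This is a simplified stand-in, not the library's implementation: `resolve_bound_interval` and `TIMEDELTA_UNITS` are hypothetical names, the `verify` parameters are passed in as a plain dictionary, and the sketch skips the checks on the pipe's datetime column. It also leaves out `bound_years`, because `datetime.timedelta` has no `years` argument.

```python
from datetime import timedelta
from typing import Any, Dict, Union

# Units accepted by `datetime.timedelta` (note that 'years' is not one of them).
TIMEDELTA_UNITS = ('minutes', 'hours', 'days', 'weeks', 'seconds')


def resolve_bound_interval(
    verify_params: Dict[str, Any],
    dt_typ: str = 'datetime',
) -> Union[timedelta, int, None]:
    """Hypothetical stand-in: resolve the first `bound_*` key to an interval."""
    prefix = 'bound_'
    for key, value in verify_params.items():
        unit = key[len(prefix):] if key.startswith(prefix) else None
        if unit not in TIMEDELTA_UNITS:
            continue
        if 'int' in dt_typ.lower():
            # Integer axes use the raw value, whatever the unit in the key name.
            return int(value)
        return timedelta(**{unit: value})
    # No recognized `bound_*` key: no bound interval is configured.
    return None


print(resolve_bound_interval({'bound_days': 366}))
print(resolve_bound_interval({'bound_days': 366}, dt_typ='int64'))
print(resolve_bound_interval({'chunk_minutes': 1440}))
```

For a datetime axis the first call yields a `timedelta` of 366 days, for an integer axis the plain integer 366, and `None` when no `bound_*` key is present.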
def get_bound_time(self, debug: bool = False) -> Union[datetime, int, None]:
    """
    The bound time is the limit at which long-running verification syncs should stop.
    A value of `None` means verification syncs should be unbounded.

    Like deriving a backtrack time from `pipe.get_sync_time()`,
    the bound time is the sync time minus a large window (e.g. 366 days).

    Unbound verification syncs (i.e. `bound_time is None`)
    if the oldest sync time is less than the bound interval.

    Returns
    -------
    A `datetime` or `int` corresponding to the
    `begin` bound for verification and deduplication syncs.
    """
    bound_interval = self.get_bound_interval(debug=debug)
    if bound_interval is None:
        return None

    sync_time = self.get_sync_time(debug=debug)
    if sync_time is None:
        return None

    bound_time = sync_time - bound_interval
    oldest_sync_time = self.get_sync_time(newest=False, debug=debug)
    max_bound_time_days = STATIC_CONFIG['pipes']['max_bound_time_days']

    extreme_sync_times_delta = (
        hasattr(oldest_sync_time, 'tzinfo')
        and (sync_time - oldest_sync_time) >= timedelta(days=max_bound_time_days)
    )

    return (
        bound_time
        if bound_time > oldest_sync_time or extreme_sync_times_delta
        else None
    )
The bound time is the limit at which long-running verification syncs should stop.
A value of None means verification syncs should be unbounded.
Like deriving a backtrack time from pipe.get_sync_time(),
the bound time is the sync time minus a large window (e.g. 366 days).
Verification syncs are unbounded (i.e. bound_time is None) when no bound interval or sync time is available,
or when the span between the oldest and newest sync times does not exceed the bound interval.
Returns
- A datetime or int corresponding to the begin bound for verification and deduplication syncs.
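The arithmetic behind the bound time can be sketched with plain datetimes. This is an illustration under stated assumptions rather than the library's code: `compute_bound_time` is a hypothetical helper, the sync times are passed in directly instead of being read from a pipe, the guard for a missing oldest sync time is added for the sketch, and the default of 366 for `max_bound_time_days` is an assumed value (the real one comes from Meerschaum's static configuration).

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def compute_bound_time(
    newest: Optional[datetime],
    oldest: Optional[datetime],
    bound_interval: Optional[timedelta],
    max_bound_time_days: int = 366,  # assumed value, not read from Meerschaum
) -> Optional[datetime]:
    """Hypothetical helper: newest sync time minus the bound interval, or None."""
    if bound_interval is None or newest is None or oldest is None:
        return None

    bound_time = newest - bound_interval
    # A very wide span between the sync times forces a bounded sync.
    extreme_span = (newest - oldest) >= timedelta(days=max_bound_time_days)
    return bound_time if bound_time > oldest or extreme_span else None


newest = datetime(2024, 6, 1, tzinfo=timezone.utc)
interval = timedelta(days=30)

# Years of history: the bound time falls after the oldest sync time.
print(compute_bound_time(newest, datetime(2020, 1, 1, tzinfo=timezone.utc), interval))

# Only twelve days of history: the sync stays unbounded (None).
print(compute_bound_time(newest, datetime(2024, 5, 20, tzinfo=timezone.utc), interval))
```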
def delete(
    self,
    drop: bool = True,
    debug: bool = False,
    **kw
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `delete_pipe()` method.

    Parameters
    ----------
    drop: bool, default True
        If `True`, drop the pipes' target table.

    debug : bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success (`bool`), message (`str`).

    """
    from meerschaum.utils.warnings import warn
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    if self.temporary:
        if self.cache:
            invalidate_success, invalidate_msg = self._invalidate_cache(hard=True, debug=debug)
            if not invalidate_success:
                return invalidate_success, invalidate_msg

        return (
            False,
            "Cannot delete pipes created with `temporary=True` (read-only). "
            + "You may want to call `pipe.drop()` instead."
        )

    if drop:
        drop_success, drop_msg = self.drop(debug=debug)
        if not drop_success:
            warn(f"Failed to drop {self}:\n{drop_msg}")

    with Venv(get_connector_plugin(self.instance_connector)):
        result = self.instance_connector.delete_pipe(self, debug=debug, **kw)

    if not isinstance(result, tuple):
        return False, f"Received an unexpected result from '{self.instance_connector}': {result}"

    if result[0]:
        self._invalidate_cache(hard=True, debug=debug)
        self._clear_cache_key('_id', debug=debug)

    return result
Call the Pipe's instance connector's delete_pipe() method.
Parameters
- drop (bool, default True): If True, drop the pipe's target table.
- debug (bool, default False): Verbosity toggle.
Returns
- A SuccessTuple of success (bool), message (str).
def drop(
    self,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `drop_pipe()` method.

    Parameters
    ----------
    debug: bool, default False:
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.

    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    self._clear_cache_key('_exists', debug=debug)

    with Venv(get_connector_plugin(self.instance_connector)):
        if hasattr(self.instance_connector, 'drop_pipe'):
            result = self.instance_connector.drop_pipe(self, debug=debug, **kw)
        else:
            result = (
                False,
                (
                    "Cannot drop pipes for instance connectors of type "
                    f"'{self.instance_connector.type}'."
                )
            )

    self._clear_cache_key('_exists', debug=debug)
    self._clear_cache_key('_exists_timestamp', debug=debug)

    return result
Call the Pipe's instance connector's drop_pipe() method.
Parameters
- debug (bool, default False): Verbosity toggle.
Returns
- A SuccessTuple of success, message.
def drop_indices(
    self,
    columns: Optional[List[str]] = None,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `drop_pipe_indices()` method.

    Parameters
    ----------
    columns: Optional[List[str]] = None
        If provided, only drop indices in the given list.

    debug: bool, default False:
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.

    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    self._clear_cache_key('_columns_indices', debug=debug)
    self._clear_cache_key('_columns_indices_timestamp', debug=debug)
    self._clear_cache_key('_columns_types', debug=debug)
    self._clear_cache_key('_columns_types_timestamp', debug=debug)

    with Venv(get_connector_plugin(self.instance_connector)):
        if hasattr(self.instance_connector, 'drop_pipe_indices'):
            result = self.instance_connector.drop_pipe_indices(
                self,
                columns=columns,
                debug=debug,
                **kw
            )
        else:
            result = (
                False,
                (
                    "Cannot drop indices for instance connectors of type "
                    f"'{self.instance_connector.type}'."
                )
            )

    self._clear_cache_key('_columns_indices', debug=debug)
    self._clear_cache_key('_columns_indices_timestamp', debug=debug)
    self._clear_cache_key('_columns_types', debug=debug)
    self._clear_cache_key('_columns_types_timestamp', debug=debug)

    return result
Call the Pipe's instance connector's drop_pipe_indices() method.
Parameters
- columns (Optional[List[str]], default None): If provided, only drop indices in the given list.
- debug (bool, default False): Verbosity toggle.
Returns
- A SuccessTuple of success, message.
def compress(
    self,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `compress_pipe()` method.

    For TimescaleDB hypertables this enables and applies native compression.
    Other flavors fall back to their respective compression mechanisms where supported.

    Parameters
    ----------
    debug: bool, default False:
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.
    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    try:
        with Venv(get_connector_plugin(self.instance_connector)):
            if hasattr(self.instance_connector, 'compress_pipe'):
                result = self.instance_connector.compress_pipe(self, debug=debug, **kw)
            else:
                result = (
                    False,
                    (
                        "Cannot compress pipes for instance connectors of type "
                        f"'{self.instance_connector.type}'."
                    )
                )
    except NotImplementedError:
        result = (
            False,
            (
                "Compression is not implemented for instance connectors of type "
                f"'{self.instance_connector.type}'."
            )
        )

    self._clear_cache_key('_exists', debug=debug)
    return result
Call the Pipe's instance connector's compress_pipe() method.
For TimescaleDB hypertables this enables and applies native compression. Other flavors fall back to their respective compression mechanisms where supported.
Parameters
- debug (bool, default False): Verbosity toggle.
Returns
- A SuccessTuple of success, message.
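compress(), like the other maintenance methods documented here (decompress(), vacuum(), analyze(), repartition()), delegates to the instance connector when it provides the matching method and otherwise returns a failed SuccessTuple rather than raising. The sketch below illustrates that dispatch pattern in isolation. The three connector classes and the free function `compress` are hypothetical stand-ins written for this example; they are not part of Meerschaum.

```python
from typing import Any, Tuple

SuccessTuple = Tuple[bool, str]


class BasicConnector:
    """Hypothetical connector with no compression support at all."""
    type = 'basic'


class CompressingConnector:
    """Hypothetical connector that implements `compress_pipe()`."""
    type = 'compressing'

    def compress_pipe(self, pipe: Any, debug: bool = False, **kw: Any) -> SuccessTuple:
        return True, f"Compressed {pipe}."


class StubConnector:
    """Hypothetical connector whose method exists but is not implemented."""
    type = 'stub'

    def compress_pipe(self, pipe: Any, debug: bool = False, **kw: Any) -> SuccessTuple:
        raise NotImplementedError


def compress(connector: Any, pipe: Any, debug: bool = False, **kw: Any) -> SuccessTuple:
    """Delegate to the connector if possible, else report failure as a tuple."""
    try:
        if hasattr(connector, 'compress_pipe'):
            return connector.compress_pipe(pipe, debug=debug, **kw)
        return (
            False,
            f"Cannot compress pipes for instance connectors of type '{connector.type}'."
        )
    except NotImplementedError:
        return (
            False,
            f"Compression is not implemented for instance connectors of type '{connector.type}'."
        )


for conn in (CompressingConnector(), BasicConnector(), StubConnector()):
    success, msg = compress(conn, 'my_pipe')
    print(success, msg)
```

Because failures come back as a tuple, callers can unpack the result and branch on the boolean instead of wrapping each call in a try block.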
def decompress(
    self,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `decompress_pipe()` method, the inverse of `compress()`.

    For TimescaleDB hypertables this removes the compression policy and converts compressed
    chunks back to row-store. Other flavors fall back to their respective mechanisms where
    supported.

    Parameters
    ----------
    debug: bool, default False:
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.
    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    try:
        with Venv(get_connector_plugin(self.instance_connector)):
            if hasattr(self.instance_connector, 'decompress_pipe'):
                result = self.instance_connector.decompress_pipe(self, debug=debug, **kw)
            else:
                result = (
                    False,
                    (
                        "Cannot decompress pipes for instance connectors of type "
                        f"'{self.instance_connector.type}'."
                    )
                )
    except NotImplementedError:
        result = (
            False,
            (
                "Decompression is not implemented for instance connectors of type "
                f"'{self.instance_connector.type}'."
            )
        )

    self._clear_cache_key('_exists', debug=debug)
    return result
Call the Pipe's instance connector's decompress_pipe() method, the inverse of compress().
For TimescaleDB hypertables this removes the compression policy and converts compressed chunks back to row-store. Other flavors fall back to their respective mechanisms where supported.
Parameters
- debug (bool, default False): Verbosity toggle.
Returns
- A SuccessTuple of success, message.
def vacuum(
    self,
    full: bool = False,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `vacuum_pipe()` method to reclaim disk space.

    For PostgreSQL-family tables this runs `VACUUM` (optionally `VACUUM FULL`); other flavors
    fall back to their respective space-reclaiming mechanisms where supported.

    Parameters
    ----------
    full: bool, default False
        If `True` (PostgreSQL family only), run `VACUUM FULL` to return freed space to the OS.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.
    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    try:
        with Venv(get_connector_plugin(self.instance_connector)):
            if hasattr(self.instance_connector, 'vacuum_pipe'):
                result = self.instance_connector.vacuum_pipe(self, full=full, debug=debug, **kw)
            else:
                result = (
                    False,
                    (
                        "Cannot vacuum pipes for instance connectors of type "
                        f"'{self.instance_connector.type}'."
                    )
                )
    except NotImplementedError:
        result = (
            False,
            (
                "Vacuuming is not implemented for instance connectors of type "
                f"'{self.instance_connector.type}'."
            )
        )

    self._clear_cache_key('_exists', debug=debug)
    return result
Call the Pipe's instance connector's vacuum_pipe() method to reclaim disk space.
For PostgreSQL-family tables this runs VACUUM (optionally VACUUM FULL); other flavors fall back to their respective space-reclaiming mechanisms where supported.
Parameters
- full (bool, default False): If True (PostgreSQL family only), run VACUUM FULL to return freed space to the OS.
- debug (bool, default False): Verbosity toggle.
Returns
- A SuccessTuple of success, message.
def analyze(
    self,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `analyze_pipe()` method to refresh planner statistics.

    Parameters
    ----------
    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.
    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    try:
        with Venv(get_connector_plugin(self.instance_connector)):
            if hasattr(self.instance_connector, 'analyze_pipe'):
                result = self.instance_connector.analyze_pipe(self, debug=debug, **kw)
            else:
                result = (
                    False,
                    (
                        "Cannot analyze pipes for instance connectors of type "
                        f"'{self.instance_connector.type}'."
                    )
                )
    except NotImplementedError:
        result = (
            False,
            (
                "Analyzing is not implemented for instance connectors of type "
                f"'{self.instance_connector.type}'."
            )
        )

    return result
Call the Pipe's instance connector's analyze_pipe() method to refresh planner statistics.
Parameters
- debug (bool, default False): Verbosity toggle.
Returns
- A SuccessTuple of success, message.
def repartition(
    self,
    chunk_minutes: Optional[int] = None,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `partition_pipe()` method to rebuild the target table
    to a new partition (chunk) width.

    On TimescaleDB this changes the chunk interval for future chunks. On PostgreSQL / PostGIS,
    MySQL / MariaDB, and MSSQL it rebuilds the natively range-partitioned table at the new width.

    Parameters
    ----------
    chunk_minutes: Optional[int], default None
        The new partition width in minutes. Defaults to the pipe's `verify.chunk_minutes`.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.
    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    try:
        with Venv(get_connector_plugin(self.instance_connector)):
            if hasattr(self.instance_connector, 'partition_pipe'):
                result = self.instance_connector.partition_pipe(
                    self, chunk_minutes=chunk_minutes, debug=debug, **kw
                )
            else:
                result = (
                    False,
                    (
                        "Cannot repartition pipes for instance connectors of type "
                        f"'{self.instance_connector.type}'."
                    )
                )
    except NotImplementedError:
        result = (
            False,
            (
                "Repartitioning is not implemented for instance connectors of type "
                f"'{self.instance_connector.type}'."
            )
        )

    self._clear_cache_key('_exists', debug=debug)
    return result
Call the Pipe's instance connector's partition_pipe() method to rebuild the target table to a new partition (chunk) width.
On TimescaleDB this changes the chunk interval for future chunks. On PostgreSQL / PostGIS, MySQL / MariaDB, and MSSQL it rebuilds the natively range-partitioned table at the new width.
Parameters
- chunk_minutes (Optional[int], default None): The new partition width in minutes. Defaults to the pipe's verify.chunk_minutes.
- debug (bool, default False): Verbosity toggle.
Returns
- A SuccessTuple of success, message.
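Because chunk_minutes is an integer number of minutes, it can help to derive it from a timedelta rather than hard-coding the arithmetic. The helper below is a small sketch for that conversion; `to_chunk_minutes` is a hypothetical name and not part of Meerschaum.

```python
from datetime import timedelta


def to_chunk_minutes(width: timedelta) -> int:
    """Hypothetical helper: convert a partition width to whole minutes."""
    minutes, remainder = divmod(width, timedelta(minutes=1))
    if minutes <= 0 or remainder:
        raise ValueError(f"Partition width must be a positive number of whole minutes: {width}")
    return minutes


print(to_chunk_minutes(timedelta(days=30)))
print(to_chunk_minutes(timedelta(weeks=1)))

# The result could then be passed along, e.g. `pipe.repartition(chunk_minutes=...)`.
```

A width of 30 days corresponds to 43200 minutes and one week to 10080 minutes.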
def create_indices(
    self,
    columns: Optional[List[str]] = None,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `create_pipe_indices()` method.

    Parameters
    ----------
    debug: bool, default False:
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.

    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    self._clear_cache_key('_columns_indices', debug=debug)
    self._clear_cache_key('_columns_indices_timestamp', debug=debug)
    self._clear_cache_key('_columns_types', debug=debug)
    self._clear_cache_key('_columns_types_timestamp', debug=debug)

    with Venv(get_connector_plugin(self.instance_connector)):
        if hasattr(self.instance_connector, 'create_pipe_indices'):
            result = self.instance_connector.create_pipe_indices(
                self,
                columns=columns,
                debug=debug,
                **kw
            )
        else:
            result = (
                False,
                (
                    "Cannot create indices for instance connectors of type "
                    f"'{self.instance_connector.type}'."
                )
            )

    self._clear_cache_key('_columns_indices', debug=debug)
    self._clear_cache_key('_columns_indices_timestamp', debug=debug)
    self._clear_cache_key('_columns_types', debug=debug)
    self._clear_cache_key('_columns_types_timestamp', debug=debug)

    return result
Call the Pipe's instance connector's create_pipe_indices() method.
Parameters
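- columns (Optional[List[str]], default None): Passed through to the instance connector's create_pipe_indices() method.
- debug (bool, default False): Verbosity toggle.
Returns
- A SuccessTuple of success, message.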
def clear(
    self,
    begin: Optional[datetime] = None,
    end: Optional[datetime] = None,
    params: Optional[Dict[str, Any]] = None,
    debug: bool = False,
    **kwargs: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `clear_pipe` method.

    Parameters
    ----------
    begin: Optional[datetime], default None:
        If provided, only remove rows newer than this datetime value.

    end: Optional[datetime], default None:
        If provided, only remove rows older than this datetime value (not including end).

    params: Optional[Dict[str, Any]], default None
        See `meerschaum.utils.sql.build_where`.

    debug: bool, default False:
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` corresponding to whether this procedure completed successfully.

    Examples
    --------
    >>> pipe = mrsm.Pipe('test', 'test', columns={'datetime': 'dt'}, instance='sql:local')
    >>> pipe.sync({'dt': [datetime(2020, 1, 1, 0, 0)]})
    >>> pipe.sync({'dt': [datetime(2021, 1, 1, 0, 0)]})
    >>> pipe.sync({'dt': [datetime(2022, 1, 1, 0, 0)]})
    >>>
    >>> pipe.clear(begin=datetime(2021, 1, 1, 0, 0))
    >>> pipe.get_data()
              dt
    0 2020-01-01

    """
    from meerschaum.utils.warnings import warn
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    begin, end = self.parse_date_bounds(begin, end)

    with Venv(get_connector_plugin(self.instance_connector)):
        return self.instance_connector.clear_pipe(
            self,
            begin=begin,
            end=end,
            params=params,
            debug=debug,
            **kwargs
        )
Call the Pipe's instance connector's clear_pipe() method.
Parameters
- begin (Optional[datetime], default None): If provided, only remove rows newer than this datetime value.
- end (Optional[datetime], default None): If provided, only remove rows older than this datetime value (not including end).
- params (Optional[Dict[str, Any]], default None): See meerschaum.utils.sql.build_where.
- debug (bool, default False): Verbosity toggle.
Returns
- A SuccessTuple corresponding to whether this procedure completed successfully.
Examples
>>> pipe = mrsm.Pipe('test', 'test', columns={'datetime': 'dt'}, instance='sql:local')
>>> pipe.sync({'dt': [datetime(2020, 1, 1, 0, 0)]})
>>> pipe.sync({'dt': [datetime(2021, 1, 1, 0, 0)]})
>>> pipe.sync({'dt': [datetime(2022, 1, 1, 0, 0)]})
>>>
>>> pipe.clear(begin=datetime(2021, 1, 1, 0, 0))
>>> pipe.get_data()
dt
0 2020-01-01
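In the example above, the row at exactly begin is removed along with the later one, and the parameter description states that end is excluded. Assuming those semantics (a half-open interval, inferred from this page rather than confirmed against every connector), the selection can be sketched in plain Python. `select_rows_to_clear` is a hypothetical helper for illustration only.

```python
from datetime import datetime
from typing import List, Optional


def select_rows_to_clear(
    timestamps: List[datetime],
    begin: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> List[datetime]:
    """Hypothetical helper: return the timestamps that fall within [begin, end)."""
    return [
        ts
        for ts in timestamps
        if (begin is None or ts >= begin) and (end is None or ts < end)
    ]


timestamps = [datetime(2020, 1, 1), datetime(2021, 1, 1), datetime(2022, 1, 1)]

# Mirrors the example above: everything from 2021 onward is selected for removal.
print(select_rows_to_clear(timestamps, begin=datetime(2021, 1, 1)))

# With an exclusive `end`, only the 2021 row is selected.
print(select_rows_to_clear(timestamps, begin=datetime(2021, 1, 1), end=datetime(2022, 1, 1)))
```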
def deduplicate(
    self,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    chunk_interval: Union[datetime, int, None] = None,
    bounded: Optional[bool] = None,
    workers: Optional[int] = None,
    debug: bool = False,
    _use_instance_method: bool = True,
    **kwargs: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `deduplicate_pipe` method to delete duplicate rows.

    Parameters
    ----------
    begin: Union[datetime, int, None], default None:
        If provided, only deduplicate rows newer than this datetime value.

    end: Union[datetime, int, None], default None:
        If provided, only deduplicate rows older than this datetime value (not including end).

    params: Optional[Dict[str, Any]], default None
        Restrict deduplication to this filter (for multiplexed data streams).
        See `meerschaum.utils.sql.build_where`.

    chunk_interval: Union[timedelta, int, None], default None
        If provided, use this for the chunk bounds.
        Defaults to the value set in `pipe.parameters['verify']['chunk_minutes']`
        (43200, i.e. 30 days).

    bounded: Optional[bool], default None
        Only check outside the oldest and newest sync times if bounded is explicitly `False`.

    workers: Optional[int], default None
        If the instance connector is thread-safe, limit concurrent syncs to this many threads.

    debug: bool, default False:
        Verbosity toggle.

    kwargs: Any
        All other keyword arguments are passed to
        `pipe.sync()`, `pipe.clear()`, and `pipe.get_data()`.

    Returns
    -------
    A `SuccessTuple` corresponding to whether all of the chunks were successfully deduplicated.
    """
    from meerschaum.utils.warnings import warn, info
    from meerschaum.utils.misc import interval_str, items_str
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.pool import get_pool

    begin, end = self.parse_date_bounds(begin, end)

    workers = self.get_num_workers(workers=workers)
    pool = get_pool(workers=workers)

    if _use_instance_method:
        with Venv(get_connector_plugin(self.instance_connector)):
            if hasattr(self.instance_connector, 'deduplicate_pipe'):
                return self.instance_connector.deduplicate_pipe(
                    self,
                    begin=begin,
                    end=end,
                    params=params,
                    bounded=bounded,
                    debug=debug,
                    **kwargs
                )

    ### Only unbound if explicitly False.
    if bounded is None:
        bounded = True
    chunk_interval = self.get_chunk_interval(chunk_interval, debug=debug)

    bound_time = self.get_bound_time(debug=debug)
    if bounded and begin is None:
        begin = (
            bound_time
            if bound_time is not None
            else self.get_sync_time(newest=False, debug=debug)
        )
    if bounded and end is None:
        end = self.get_sync_time(newest=True, debug=debug)
        if end is not None:
            end += (
                timedelta(minutes=1)
                if hasattr(end, 'tzinfo')
                else 1
            )

    chunk_bounds = self.get_chunk_bounds(
        bounded=bounded,
        begin=begin,
        end=end,
        chunk_interval=chunk_interval,
        debug=debug,
    )

    indices = [col for col in self.columns.values() if col]
    if not indices:
        return False, "Cannot deduplicate without index columns."

    def process_chunk_bounds(bounds) -> Tuple[
        Tuple[
            Union[datetime, int, None],
            Union[datetime, int, None]
        ],
        SuccessTuple
    ]:
        ### Only selecting the index values here to keep bandwidth down.
        chunk_begin, chunk_end = bounds
        chunk_df = self.get_data(
            select_columns=indices,
            begin=chunk_begin,
            end=chunk_end,
            params=params,
            debug=debug,
        )
        if chunk_df is None:
            return bounds, (True, "")
        existing_chunk_len = len(chunk_df)
        deduped_chunk_df = chunk_df.drop_duplicates(keep='last')
        deduped_chunk_len = len(deduped_chunk_df)

        if existing_chunk_len == deduped_chunk_len:
            return bounds, (True, "")

        chunk_msg_header = f"\n{chunk_begin} - {chunk_end}"
        chunk_msg_body = ""

        full_chunk = self.get_data(
            begin=chunk_begin,
            end=chunk_end,
            params=params,
            debug=debug,
        )
        if full_chunk is None or len(full_chunk) == 0:
            return bounds, (True, f"{chunk_msg_header}\nChunk is empty, skipping...")

        chunk_indices = [ix for ix in indices if ix in full_chunk.columns]
        if not chunk_indices:
            return bounds, (False, f"None of {items_str(indices)} were present in chunk.")
        try:
            full_chunk = full_chunk.drop_duplicates(
                subset=chunk_indices,
                keep='last'
            ).reset_index(
                drop=True,
            )
        except Exception as e:
            return (
                bounds,
                (False, f"Failed to deduplicate chunk on {items_str(chunk_indices)}:\n({e})")
            )

        clear_success, clear_msg = self.clear(
            begin=chunk_begin,
            end=chunk_end,
            params=params,
            debug=debug,
        )
        if not clear_success:
            chunk_msg_body += f"Failed to clear chunk while deduplicating:\n{clear_msg}\n"
            warn(chunk_msg_body)

        sync_success, sync_msg = self.sync(full_chunk, debug=debug)
        if not sync_success:
            chunk_msg_body += f"Failed to sync chunk while deduplicating:\n{sync_msg}\n"

        ### Finally check if the deduplication worked.
        chunk_rowcount = self.get_rowcount(
            begin=chunk_begin,
            end=chunk_end,
            params=params,
            debug=debug,
        )
        if chunk_rowcount != deduped_chunk_len:
            return bounds, (
                False, (
                    chunk_msg_header + "\n"
                    + chunk_msg_body + ("\n" if chunk_msg_body else '')
                    + "Chunk rowcounts still differ ("
                    + f"{chunk_rowcount} rowcount vs {deduped_chunk_len} chunk length)."
                )
            )

        return bounds, (
            True, (
                chunk_msg_header + "\n"
                + chunk_msg_body + ("\n" if chunk_msg_body else '')
                + f"Deduplicated chunk from {existing_chunk_len} to {chunk_rowcount} rows."
            )
        )

    info(
        f"Deduplicating {len(chunk_bounds)} chunk"
        + ('s' if len(chunk_bounds) != 1 else '')
        + f" ({'un' if not bounded else ''}bounded)"
        + f" of size '{interval_str(chunk_interval)}'"
        + f" on {self}."
    )
    bounds_success_tuples = dict(pool.map(process_chunk_bounds, chunk_bounds))
    bounds_successes = {
        bounds: success_tuple
        for bounds, success_tuple in bounds_success_tuples.items()
        if success_tuple[0]
    }
    bounds_failures = {
        bounds: success_tuple
        for bounds, success_tuple in bounds_success_tuples.items()
        if not success_tuple[0]
    }

    ### No need to retry if everything failed.
    if len(bounds_failures) > 0 and len(bounds_successes) == 0:
        return (
            False,
            (
                f"Failed to deduplicate {len(bounds_failures)} chunk"
                + ('s' if len(bounds_failures) != 1 else '')
                + ".\n"
                + "\n".join([msg for _, (_, msg) in bounds_failures.items() if msg])
            )
        )

    retry_bounds = [bounds for bounds in bounds_failures]
    if not retry_bounds:
        return (
            True,
            (
                f"Successfully deduplicated {len(bounds_successes)} chunk"
                + ('s' if len(bounds_successes) != 1 else '')
                + ".\n"
                + "\n".join([msg for _, (_, msg) in bounds_successes.items() if msg])
            ).rstrip('\n')
        )

    info(f"Retrying {len(retry_bounds)} chunks for {self}...")
    retry_bounds_success_tuples = dict(pool.map(process_chunk_bounds, retry_bounds))
    retry_bounds_successes = {
        bounds: success_tuple
        for bounds, success_tuple in retry_bounds_success_tuples.items()
        if success_tuple[0]
    }
    retry_bounds_failures = {
        bounds: success_tuple
        for bounds, success_tuple in retry_bounds_success_tuples.items()
        if not success_tuple[0]
    }

    bounds_successes.update(retry_bounds_successes)
    if not retry_bounds_failures:
        return (
            True,
            (
                f"Successfully deduplicated {len(bounds_successes)} chunk"
                + ('s' if len(bounds_successes) != 1 else '')
                + f"({len(retry_bounds_successes)} retried):\n"
                + "\n".join([msg for _, (_, msg) in bounds_successes.items() if msg])
            ).rstrip('\n')
        )

    return (
        False,
        (
            f"Failed to deduplicate {len(bounds_failures)} chunk"
            + ('s' if len(retry_bounds_failures) != 1 else '')
            + ".\n"
            + "\n".join([msg for _, (_, msg) in retry_bounds_failures.items() if msg])
        ).rstrip('\n')
    )
Call the Pipe's instance connector's deduplicate_pipe() method to delete duplicate rows. If the connector does not provide that method, the pipe is deduplicated chunk by chunk instead.
Parameters
- begin (Union[datetime, int, None], default None): If provided, only deduplicate rows newer than this datetime value.
- end (Union[datetime, int, None], default None): If provided, only deduplicate rows older than this datetime value (not including end).
- params (Optional[Dict[str, Any]], default None): Restrict deduplication to this filter (for multiplexed data streams). See meerschaum.utils.sql.build_where.
- chunk_interval (Union[timedelta, int, None], default None): If provided, use this for the chunk bounds. Defaults to the value set in pipe.parameters['verify']['chunk_minutes'] (43200 minutes, i.e. 30 days).
- bounded (Optional[bool], default None): Only check outside the oldest and newest sync times if bounded is explicitly False.
- workers (Optional[int], default None): If the instance connector is thread-safe, limit concurrent syncs to this many threads.
- debug (bool, default False): Verbosity toggle.
- kwargs (Any): All other keyword arguments are passed to pipe.sync(), pipe.clear(), and pipe.get_data().
Returns
- A SuccessTuple corresponding to whether all of the chunks were successfully deduplicated.
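When the fallback path runs, each chunk is reduced to one row per combination of index columns, keeping the most recent duplicate (the source uses pandas' drop_duplicates with keep='last'). The same rule can be sketched in plain Python on a list of dictionaries. `drop_duplicates_keep_last` is a hypothetical helper written for illustration; it assumes the index values are hashable and is not how Meerschaum implements the step.

```python
from typing import Any, Dict, List, Tuple


def drop_duplicates_keep_last(
    rows: List[Dict[str, Any]],
    index_columns: List[str],
) -> List[Dict[str, Any]]:
    """Hypothetical helper: keep the last row seen for each index combination."""
    last_position: Dict[Tuple[Any, ...], int] = {}
    for position, row in enumerate(rows):
        key = tuple(row.get(col) for col in index_columns)
        # Later rows overwrite earlier positions, so the last duplicate wins.
        last_position[key] = position

    keep = set(last_position.values())
    # Preserve the original ordering of the surviving rows.
    return [row for position, row in enumerate(rows) if position in keep]


chunk = [
    {'ts': '2024-01-01', 'id': 1, 'vl': 10},
    {'ts': '2024-01-01', 'id': 1, 'vl': 20},
    {'ts': '2024-01-01', 'id': 2, 'vl': 30},
]
for row in drop_duplicates_keep_last(chunk, ['ts', 'id']):
    print(row)
```

Here the first row is dropped because a later row shares its ts and id values, which leaves two rows in the chunk.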
def bootstrap(
    self,
    debug: bool = False,
    yes: bool = False,
    force: bool = False,
    noask: bool = False,
    shell: bool = False,
    **kw
) -> SuccessTuple:
    """
    Prompt the user to create a pipe's requirements all from one method.
    This method shouldn't be used in any automated scripts because it interactively
    prompts the user and therefore may hang.

    Parameters
    ----------
    debug: bool, default False:
        Verbosity toggle.

    yes: bool, default False:
        Print the questions and automatically agree.

    force: bool, default False:
        Skip the questions and agree anyway.

    noask: bool, default False:
        Print the questions but go with the default answer.

    shell: bool, default False:
        Used to determine if we are in the interactive shell.

    Returns
    -------
    A `SuccessTuple` corresponding to the success of this procedure.

    """

    from meerschaum.utils.warnings import info
    from meerschaum.utils.prompt import prompt, yes_no
    from meerschaum.utils.formatting import pprint
    from meerschaum.config import get_config
    from meerschaum.utils.formatting._shell import clear_screen
    from meerschaum.utils.formatting import print_tuple
    from meerschaum.actions import actions
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    _clear = get_config('shell', 'clear_screen', patch=True)

    if self.id is not None:
        delete_tuple = self.delete(debug=debug)
        if not delete_tuple[0]:
            return delete_tuple

    if _clear:
        clear_screen(debug=debug)

    _parameters = _get_parameters(self, debug=debug)
    self.parameters = _parameters
    pprint(self.parameters)
    try:
        prompt(
            f"\n    Press [Enter] to register {self} with the above configuration:",
            icon = False
        )
    except KeyboardInterrupt:
        return False, f"Aborted bootstrapping {self}."

    with Venv(get_connector_plugin(self.instance_connector)):
        register_tuple = self.instance_connector.register_pipe(self, debug=debug)

    if not register_tuple[0]:
        return register_tuple

    if _clear:
        clear_screen(debug=debug)

    try:
        if yes_no(
            f"Would you like to edit the definition for {self}?",
            yes=yes,
            noask=noask,
            default='n',
        ):
            edit_tuple = self.edit_definition(debug=debug)
            if not edit_tuple[0]:
                return edit_tuple

        if yes_no(
            f"Would you like to try syncing {self} now?",
            yes=yes,
            noask=noask,
            default='n',
        ):
            sync_tuple = actions['sync'](
                ['pipes'],
                connector_keys=[self.connector_keys],
                metric_keys=[self.metric_key],
                location_keys=[self.location_key],
                mrsm_instance=str(self.instance_connector),
                debug=debug,
                shell=shell,
            )
            if not sync_tuple[0]:
                return sync_tuple
    except Exception as e:
        return False, f"Failed to bootstrap {self}:\n" + str(e)

    print_tuple((True, f"Finished bootstrapping {self}!"))
    info(
        "You can edit this pipe later with `edit pipes` "
        + "or set the definition with `edit pipes definition`.\n"
        + "    To sync data into your pipe, run `sync pipes`."
    )

    return True, "Success"
Prompt the user to create a pipe's requirements all from one method. This method shouldn't be used in any automated scripts because it interactively prompts the user and therefore may hang.
Parameters
- debug (bool, default False): Verbosity toggle.
- yes (bool, default False): Print the questions and automatically agree.
- force (bool, default False): Skip the questions and agree anyway.
- noask (bool, default False): Print the questions but go with the default answer.
- shell (bool, default False): Used to determine if we are in the interactive shell.
Returns
- A SuccessTuple corresponding to the success of this procedure.
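Because bootstrap() prompts interactively, a script that might run unattended can guard the call. The sketch below is one possible approach and is not part of Meerschaum: `bootstrap_if_interactive`, its `is_interactive` parameter, and `fake_bootstrap` are hypothetical names. It assumes only that the wrapped callable returns a `(bool, str)` SuccessTuple, as documented above, and that a TTY on stdin is an adequate signal for interactivity.

```python
import sys
from typing import Callable, Optional, Tuple

SuccessTuple = Tuple[bool, str]


def bootstrap_if_interactive(
    bootstrap: Callable[..., SuccessTuple],
    is_interactive: Optional[bool] = None,
    **kwargs,
) -> SuccessTuple:
    """
    Call `bootstrap(**kwargs)` only when a terminal is attached.

    `bootstrap` is any callable returning a `(success, message)` tuple,
    e.g. a bound `pipe.bootstrap` method (hypothetical helper, not a Meerschaum API).
    """
    if is_interactive is None:
        # Assumption: stdin being a TTY is a good-enough signal for interactivity.
        is_interactive = sys.stdin is not None and sys.stdin.isatty()
    if not is_interactive:
        return False, "Skipped bootstrapping: no interactive terminal detected."
    return bootstrap(**kwargs)


def fake_bootstrap(debug: bool = False) -> SuccessTuple:
    """Stand-in for `pipe.bootstrap`, used only to demonstrate the helper."""
    return True, "Success"


print(bootstrap_if_interactive(fake_bootstrap, is_interactive=False))
print(bootstrap_if_interactive(fake_bootstrap, is_interactive=True, debug=True))
```

With a real pipe, the call would look like `bootstrap_if_interactive(pipe.bootstrap, debug=True)`, leaving `is_interactive` unset so that the terminal check decides.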
def enforce_dtypes(
    self,
    df: 'pd.DataFrame',
    chunksize: Optional[int] = -1,
    enforce: bool = True,
    safe_copy: bool = True,
    dtypes: Optional[Dict[str, str]] = None,
    debug: bool = False,
) -> 'pd.DataFrame':
    """
    Cast the input dataframe to the pipe's registered data types.
    If the pipe does not exist and dtypes are not set, return the dataframe.
    """
    import traceback
    from meerschaum.utils.warnings import warn
    from meerschaum.utils.debug import dprint
    from meerschaum.utils.dataframe import (
        parse_df_datetimes,
        enforce_dtypes as _enforce_dtypes,
        parse_simple_lines,
    )
    from meerschaum.utils.dtypes import are_dtypes_equal
    from meerschaum.utils.packages import import_pandas
    pd = import_pandas(debug=debug)
    if df is None:
        if debug:
            dprint(
                "Received None instead of a DataFrame.\n"
                + " Skipping dtype enforcement..."
            )
        return df

    if not self.enforce:
        enforce = False

    explicit_dtypes = self.get_dtypes(infer=False, debug=debug) if enforce else {}
    pipe_dtypes = self.get_dtypes(infer=True, debug=debug) if not dtypes else dtypes

    try:
        if isinstance(df, str):
            if df.strip() and df.strip()[0] not in ('{', '['):
                df = parse_df_datetimes(
                    parse_simple_lines(df),
                    ignore_cols=[
                        col
                        for col, dtype in pipe_dtypes.items()
                        if (not enforce or not are_dtypes_equal(dtype, 'datetime'))
                    ],
                )
            else:
                df = parse_df_datetimes(
                    pd.read_json(StringIO(df)),
                    ignore_cols=[
                        col
                        for col, dtype in pipe_dtypes.items()
                        if (not enforce or not are_dtypes_equal(dtype, 'datetime'))
                    ],
                    ignore_all=(not enforce),
                    strip_timezone=(self.tzinfo is None),
                    chunksize=chunksize,
                    debug=debug,
                )
        elif isinstance(df, (dict, list, tuple)):
            df = parse_df_datetimes(
                df,
                ignore_cols=[
                    col
                    for col, dtype in pipe_dtypes.items()
                    if (not enforce or not are_dtypes_equal(str(dtype), 'datetime'))
                ],
                strip_timezone=(self.tzinfo is None),
                chunksize=chunksize,
                debug=debug,
            )
    except Exception as e:
        warn(f"Unable to cast incoming data as a DataFrame...:\n{e}\n\n{traceback.format_exc()}")
        return None

    if not pipe_dtypes:
        if debug:
            dprint(
                f"Could not find dtypes for {self}.\n"
                + "Skipping dtype enforcement..."
            )
        return df

    return _enforce_dtypes(
        df,
        pipe_dtypes,
        explicit_dtypes=explicit_dtypes,
        safe_copy=safe_copy,
        strip_timezone=(self.tzinfo is None),
        coerce_numeric=self.mixed_numerics,
        coerce_timezone=enforce,
        debug=debug,
    )
Cast the input dataframe to the pipe's registered data types. If the pipe does not exist and dtypes are not set, return the dataframe.
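The source above accepts more than DataFrames: it branches on the type of the input before any casting happens. As a reading aid, the following standard-library sketch reproduces only that branching. `classify_input` and its return labels are hypothetical names chosen for illustration; the real method goes on to parse and cast the data, which is not shown here.

```python
from typing import Any


def classify_input(data: Any) -> str:
    """Name the parsing branch that `enforce_dtypes()` would take for `data`."""
    if data is None:
        return 'none'               # returned unchanged
    if isinstance(data, str):
        stripped = data.strip()
        if stripped and stripped[0] not in ('{', '['):
            return 'simple_lines'   # handed to parse_simple_lines()
        return 'json'               # handed to pandas.read_json()
    if isinstance(data, (dict, list, tuple)):
        return 'documents'          # handed directly to parse_df_datetimes()
    return 'dataframe'              # anything else is assumed to be a DataFrame


for sample in (None, 'some plain text', '[{"a": 1}]', [{'a': 1}], {'a': [1]}):
    print(repr(sample), '->', classify_input(sample))
```

Note that an empty or whitespace-only string falls through to the JSON branch, mirroring the condition in the source.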
def infer_dtypes(
    self,
    persist: bool = False,
    refresh: bool = False,
    debug: bool = False,
) -> Dict[str, Any]:
    """
    If `dtypes` is not set in `meerschaum.Pipe.parameters`,
    infer the data types from the underlying table if it exists.

    Parameters
    ----------
    persist: bool, default False
        If `True`, persist the inferred data types to `meerschaum.Pipe.parameters`.
        NOTE: Use with caution! Generally `dtypes` is meant to be user-configurable only.

    refresh: bool, default False
        If `True`, retrieve the latest columns-types for the pipe.
        See `Pipe.get_columns_types()`.

    Returns
    -------
    A dictionary of strings containing the pandas data types for this Pipe.
    """
    if not self.exists(debug=debug):
        return {}

    from meerschaum.utils.dtypes.sql import get_pd_type_from_db_type
    from meerschaum.utils.dtypes import to_pandas_dtype

    ### NOTE: get_columns_types() may return either the types as
    ### PostgreSQL- or Pandas-style.
    columns_types = self.get_columns_types(refresh=refresh, debug=debug)

    remote_pd_dtypes = {
        c: (
            get_pd_type_from_db_type(t, allow_custom_dtypes=True)
            if str(t).isupper()
            else to_pandas_dtype(t)
        )
        for c, t in columns_types.items()
    } if columns_types else {}
    if not persist:
        return remote_pd_dtypes

    parameters = self.get_parameters(refresh=refresh, debug=debug)
    dtypes = parameters.get('dtypes', {})
    dtypes.update({
        col: typ
        for col, typ in remote_pd_dtypes.items()
        if col not in dtypes
    })
    self.dtypes = dtypes
    self.edit(interactive=False, debug=debug)
    return remote_pd_dtypes
If dtypes is not set in meerschaum.Pipe.parameters, infer the data types from the underlying table if it exists.
Parameters
- persist (bool, default False): If True, persist the inferred data types to meerschaum.Pipe.parameters. NOTE: Use with caution! Generally dtypes is meant to be user-configurable only.
- refresh (bool, default False): If True, retrieve the latest columns-types for the pipe. See Pipe.get_columns_types().
Returns
- A dictionary of strings containing the pandas data types for this Pipe.
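Two details of the source are easy to miss: all-uppercase type names are treated as database types, and with persist=True the inferred types only fill in columns the user has not already configured. The sketch below isolates both rules using plain dictionaries. `looks_like_db_type` and `merge_inferred_dtypes` are hypothetical helper names, and unlike the source (which updates the existing `dtypes` dictionary in place), this version returns a copy.

```python
from typing import Dict


def looks_like_db_type(type_name: str) -> bool:
    """Mirror the source's heuristic: all-uppercase names are treated as database types."""
    return str(type_name).isupper()


def merge_inferred_dtypes(
    configured: Dict[str, str],
    inferred: Dict[str, str],
) -> Dict[str, str]:
    """Add inferred dtypes only for columns the user has not configured."""
    merged = dict(configured)
    merged.update({
        col: typ
        for col, typ in inferred.items()
        if col not in merged
    })
    return merged


configured = {'id': 'Int64'}
inferred = {'ts': 'datetime64[ns]', 'id': 'int64', 'vl': 'float64'}
print(merge_inferred_dtypes(configured, inferred))
print(looks_like_db_type('TIMESTAMP'), looks_like_db_type('int64'))
```

Here the user-configured `'Int64'` for `id` survives the merge, while `ts` and `vl` are taken from the inferred types.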
def copy_to(
    self,
    instance_keys: str,
    sync: bool = True,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    chunk_interval: Union[timedelta, int, None] = None,
    debug: bool = False,
    **kwargs: Any
) -> SuccessTuple:
    """
    Copy a pipe to another instance.

    Parameters
    ----------
    instance_keys: str
        The instance to which to copy this pipe.

    sync: bool, default True
        If `True`, sync the source pipe's documents

    begin: Union[datetime, int, None], default None
        Beginning datetime value to pass to `Pipe.get_data()`.

    end: Union[datetime, int, None], default None
        End datetime value to pass to `Pipe.get_data()`.

    params: Optional[Dict[str, Any]], default None
        Parameters filter to pass to `Pipe.get_data()`.

    chunk_interval: Union[timedelta, int, None], default None
        The size of chunks to retrieve from `Pipe.get_data()` for syncing.

    kwargs: Any
        Additional flags to pass to `Pipe.get_data()` and `Pipe.sync()`, e.g. `workers`.

    Returns
    -------
    A SuccessTuple indicating success.
    """
    if str(instance_keys) == self.instance_keys:
        return False, f"Cannot copy {self} to instance '{instance_keys}'."

    begin, end = self.parse_date_bounds(begin, end)

    new_pipe = mrsm.Pipe(
        self.connector_keys,
        self.metric_key,
        self.location_key,
        parameters=self.parameters.copy(),
        instance=instance_keys,
    )

    new_pipe_is_registered = new_pipe.id is not None

    metadata_method = new_pipe.edit if new_pipe_is_registered else new_pipe.register
    metadata_success, metadata_msg = metadata_method(debug=debug)
    if not metadata_success:
        return metadata_success, metadata_msg

    if not self.exists(debug=debug):
        return True, f"{self} does not exist; nothing to sync."

    original_as_iterator = kwargs.get('as_iterator', None)
    kwargs['as_iterator'] = True

    chunk_generator = self.get_data(
        begin=begin,
        end=end,
        params=params,
        chunk_interval=chunk_interval,
        debug=debug,
        **kwargs
    )

    if original_as_iterator is None:
        _ = kwargs.pop('as_iterator', None)
    else:
        kwargs['as_iterator'] = original_as_iterator

    sync_success, sync_msg = new_pipe.sync(
        chunk_generator,
        begin=begin,
        end=end,
        params=params,
        debug=debug,
        **kwargs
    )
    msg = (
        f"Successfully synced {new_pipe}:\n{sync_msg}"
        if sync_success
        else f"Failed to sync {new_pipe}:\n{sync_msg}"
    )
    return sync_success, msg
Copy a pipe to another instance.
Parameters
- instance_keys (str): The instance to which to copy this pipe.
- sync (bool, default True): If True, sync the source pipe's documents.
- begin (Union[datetime, int, None], default None): Beginning datetime value to pass to Pipe.get_data().
- end (Union[datetime, int, None], default None): End datetime value to pass to Pipe.get_data().
- params (Optional[Dict[str, Any]], default None): Parameters filter to pass to Pipe.get_data().
- chunk_interval (Union[timedelta, int, None], default None): The size of chunks to retrieve from Pipe.get_data() for syncing.
- kwargs (Any): Additional flags to pass to Pipe.get_data() and Pipe.sync(), e.g. workers.
Returns
- A SuccessTuple indicating success.
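The control flow of copy_to() can be summarized as: refuse to copy onto the same instance, return early when there is nothing to copy, then stream the source data in chunks into the target. The following standard-library sketch imitates that flow with plain lists. It is a simplified model rather than the Meerschaum implementation: `iter_chunks` and `copy_rows` are hypothetical names, `iter_chunks` stands in for `Pipe.get_data()` called with `as_iterator=True`, and appending to a list stands in for `Pipe.sync()`.

```python
from typing import Any, Dict, Iterator, List, Tuple

SuccessTuple = Tuple[bool, str]
Row = Dict[str, Any]


def iter_chunks(rows: List[Row], chunk_size: int) -> Iterator[List[Row]]:
    """Yield successive slices of `rows` (stand-in for a chunk generator)."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]


def copy_rows(
    source_keys: str,
    target_keys: str,
    source_rows: List[Row],
    target_rows: List[Row],
    chunk_size: int = 2,
) -> SuccessTuple:
    """Copy rows chunk by chunk, returning a `(success, message)` tuple."""
    if source_keys == target_keys:
        return False, f"Cannot copy to instance '{target_keys}'."
    if not source_rows:
        return True, "Source is empty; nothing to sync."

    num_chunks = 0
    for chunk in iter_chunks(source_rows, chunk_size):
        target_rows.extend(chunk)   # stand-in for syncing one chunk
        num_chunks += 1
    return True, f"Copied {len(source_rows)} rows in {num_chunks} chunks."


rows = [{'id': i} for i in range(5)]
target: List[Row] = []
print(copy_rows('sql:a', 'sql:b', rows, target))
print(copy_rows('sql:a', 'sql:a', rows, []))
```

Streaming chunks instead of loading everything at once keeps memory bounded, which is presumably why the source forces `as_iterator=True` when reading and then restores the caller's original value before syncing.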
30class Plugin: 31 """Handle packaging of Meerschaum plugins.""" 32 33 def __init__( 34 self, 35 name: str, 36 version: Optional[str] = None, 37 user_id: Optional[int] = None, 38 required: Optional[List[str]] = None, 39 attributes: Optional[Dict[str, Any]] = None, 40 archive_path: Optional[pathlib.Path] = None, 41 venv_path: Optional[pathlib.Path] = None, 42 repo_connector: Optional['mrsm.connectors.api.APIConnector'] = None, 43 repo: Union['mrsm.connectors.api.APIConnector', str, None] = None, 44 ): 45 import meerschaum.config.paths as paths 46 from meerschaum._internal.static import STATIC_CONFIG 47 sep = STATIC_CONFIG['plugins']['repo_separator'] 48 _repo = None 49 if sep in name: 50 try: 51 name, _repo = name.split(sep) 52 except Exception as e: 53 error(f"Invalid plugin name: '{name}'") 54 self._repo_in_name = _repo 55 56 if attributes is None: 57 attributes = {} 58 self.name = name 59 self.attributes = attributes 60 self.user_id = user_id 61 self._version = version 62 if required: 63 self._required = required 64 self.archive_path = ( 65 archive_path if archive_path is not None 66 else paths.PLUGINS_ARCHIVES_RESOURCES_PATH / f"{self.name}.tar.gz" 67 ) 68 self.venv_path = ( 69 venv_path if venv_path is not None 70 else paths.VIRTENV_RESOURCES_PATH / self.name 71 ) 72 self._repo_connector = repo_connector 73 self._repo_keys = repo 74 75 76 @property 77 def repo_connector(self): 78 """ 79 Return the repository connector for this plugin. 80 NOTE: This imports the `connectors` module, which imports certain plugin modules. 81 """ 82 if self._repo_connector is None: 83 from meerschaum.connectors.parse import parse_repo_keys 84 85 repo_keys = self._repo_keys or self._repo_in_name 86 if self._repo_in_name and self._repo_keys and self._repo_keys != self._repo_in_name: 87 error( 88 f"Received inconsistent repos: '{self._repo_in_name}' and '{self._repo_keys}'." 
89 ) 90 repo_connector = parse_repo_keys(repo_keys) 91 self._repo_connector = repo_connector 92 return self._repo_connector 93 94 95 @property 96 def version(self): 97 """ 98 Return the plugin's module version is defined (`__version__`) if it's defined. 99 """ 100 if self._version is None: 101 try: 102 self._version = self.module.__version__ 103 except Exception as e: 104 self._version = None 105 return self._version 106 107 108 @property 109 def module(self): 110 """ 111 Return the Python module of the underlying plugin. 112 """ 113 if '_module' not in self.__dict__ or self.__dict__.get('_module', None) is None: 114 if self.__file__ is None: 115 return None 116 117 from meerschaum.plugins import import_plugins 118 self._module = import_plugins(str(self), warn=False) 119 120 return self._module 121 122 123 @property 124 def __file__(self) -> Union[str, None]: 125 """ 126 Return the file path (str) of the plugin if it exists, otherwise `None`. 127 """ 128 if self.__dict__.get('_module', None) is not None: 129 return self.module.__file__ 130 131 import meerschaum.config.paths as paths 132 133 potential_dir = paths.PLUGINS_RESOURCES_PATH / self.name 134 if ( 135 potential_dir.exists() 136 and potential_dir.is_dir() 137 and (potential_dir / '__init__.py').exists() 138 ): 139 return str((potential_dir / '__init__.py').as_posix()) 140 141 potential_file = paths.PLUGINS_RESOURCES_PATH / (self.name + '.py') 142 if potential_file.exists() and not potential_file.is_dir(): 143 return str(potential_file.as_posix()) 144 145 return None 146 147 148 @property 149 def requirements_file_path(self) -> Union[pathlib.Path, None]: 150 """ 151 If a file named `requirements.txt` exists, return its path. 152 """ 153 if self.__file__ is None: 154 return None 155 path = pathlib.Path(self.__file__).parent / 'requirements.txt' 156 if not path.exists(): 157 return None 158 return path 159 160 161 def is_installed(self, **kw) -> bool: 162 """ 163 Check whether a plugin is correctly installed. 
164 165 Returns 166 ------- 167 A `bool` indicating whether a plugin exists and is successfully imported. 168 """ 169 return self.__file__ is not None 170 171 172 def make_tar(self, debug: bool = False) -> pathlib.Path: 173 """ 174 Compress the plugin's source files into a `.tar.gz` archive and return the archive's path. 175 176 Parameters 177 ---------- 178 debug: bool, default False 179 Verbosity toggle. 180 181 Returns 182 ------- 183 A `pathlib.Path` to the archive file's path. 184 185 """ 186 import tarfile, pathlib, subprocess, fnmatch 187 from meerschaum.utils.debug import dprint 188 from meerschaum.utils.packages import attempt_import 189 pathspec = attempt_import('pathspec', debug=debug) 190 191 if not self.__file__: 192 from meerschaum.utils.warnings import error 193 error(f"Could not find file for plugin '{self}'.") 194 if '__init__.py' in self.__file__ or os.path.isdir(self.__file__): 195 path = self.__file__.replace('__init__.py', '') 196 is_dir = True 197 else: 198 path = self.__file__ 199 is_dir = False 200 201 old_cwd = os.getcwd() 202 real_parent_path = pathlib.Path(os.path.realpath(path)).parent 203 os.chdir(real_parent_path) 204 205 default_patterns_to_ignore = [ 206 '.pyc', 207 '__pycache__/', 208 'eggs/', 209 '__pypackages__/', 210 '.git', 211 ] 212 213 def parse_gitignore() -> 'Set[str]': 214 gitignore_path = pathlib.Path(path) / '.gitignore' 215 if not gitignore_path.exists(): 216 return set(default_patterns_to_ignore) 217 with open(gitignore_path, 'r', encoding='utf-8') as f: 218 gitignore_text = f.read() 219 return set(pathspec.PathSpec.from_lines( 220 pathspec.patterns.GitWildMatchPattern, 221 default_patterns_to_ignore + gitignore_text.splitlines() 222 ).match_tree(path)) 223 224 patterns_to_ignore = parse_gitignore() if is_dir else set() 225 226 if debug: 227 dprint(f"Patterns to ignore:\n{patterns_to_ignore}") 228 229 with tarfile.open(self.archive_path, 'w:gz') as tarf: 230 if not is_dir: 231 tarf.add(f"{self.name}.py") 232 else: 233 
for root, dirs, files in os.walk(self.name): 234 for f in files: 235 good_file = True 236 fp = os.path.join(root, f) 237 for pattern in patterns_to_ignore: 238 if pattern in str(fp) or f.startswith('.'): 239 good_file = False 240 break 241 if good_file: 242 if debug: 243 dprint(f"Adding '{fp}'...") 244 tarf.add(fp) 245 246 ### clean up and change back to old directory 247 os.chdir(old_cwd) 248 249 ### change to 775 to avoid permissions issues with the API in a Docker container 250 self.archive_path.chmod(0o775) 251 252 if debug: 253 dprint(f"Created archive '{self.archive_path}'.") 254 return self.archive_path 255 256 257 def install( 258 self, 259 skip_deps: bool = False, 260 force: bool = False, 261 debug: bool = False, 262 ) -> SuccessTuple: 263 """ 264 Extract a plugin's tar archive to the plugins directory. 265 266 This function checks if the plugin is already installed and if the version is equal or 267 greater than the existing installation. 268 269 Parameters 270 ---------- 271 skip_deps: bool, default False 272 If `True`, do not install dependencies. 273 274 force: bool, default False 275 If `True`, continue with installation, even if required packages fail to install. 276 277 debug: bool, default False 278 Verbosity toggle. 279 280 Returns 281 ------- 282 A `SuccessTuple` of success (bool) and a message (str). 283 284 """ 285 if self.full_name in _ongoing_installations: 286 return True, f"Already installing plugin '{self}'." 
287 _ongoing_installations.add(self.full_name) 288 289 import meerschaum.config.paths as paths 290 from meerschaum.utils.warnings import warn, error 291 if debug: 292 from meerschaum.utils.debug import dprint 293 import tarfile 294 import re 295 import ast 296 from meerschaum.plugins import sync_plugins_symlinks 297 from meerschaum.utils.packages import attempt_import, reload_meerschaum 298 from meerschaum.utils.venv import init_venv 299 from meerschaum.utils.misc import safely_extract_tar 300 old_cwd = os.getcwd() 301 old_version = '' 302 new_version = '' 303 temp_dir = paths.PLUGINS_TEMP_RESOURCES_PATH / self.name 304 temp_dir.mkdir(exist_ok=True) 305 306 if not self.archive_path.exists(): 307 return False, f"Missing archive file for plugin '{self}'." 308 if self.version is not None: 309 old_version = self.version 310 if debug: 311 dprint(f"Found existing version '{old_version}' for plugin '{self}'.") 312 313 if debug: 314 dprint(f"Extracting '{self.archive_path}' to '{temp_dir}'...") 315 316 try: 317 with tarfile.open(self.archive_path, 'r:gz') as tarf: 318 safely_extract_tar(tarf, temp_dir) 319 except Exception as e: 320 warn(e) 321 return False, f"Failed to extract plugin '{self.name}'." 
322 323 ### search for version information 324 files = os.listdir(temp_dir) 325 326 if str(files[0]) == self.name: 327 is_dir = True 328 elif str(files[0]) == self.name + '.py': 329 is_dir = False 330 else: 331 error(f"Unknown format encountered for plugin '{self}'.") 332 333 fpath = temp_dir / files[0] 334 if is_dir: 335 fpath = fpath / '__init__.py' 336 337 init_venv(self.name, debug=debug) 338 with open(fpath, 'r', encoding='utf-8') as f: 339 init_lines = f.readlines() 340 new_version = None 341 for line in init_lines: 342 if '__version__' not in line: 343 continue 344 version_match = re.search(r'__version__(\s?)=', line.lstrip().rstrip()) 345 if not version_match: 346 continue 347 new_version = ast.literal_eval(line.split('=')[1].lstrip().rstrip()) 348 break 349 if not new_version: 350 warn( 351 f"No `__version__` defined for plugin '{self}'. " 352 + "Assuming new version...", 353 stack = False, 354 ) 355 356 packaging_version = attempt_import('packaging.version') 357 try: 358 is_new_version = (not new_version and not old_version) or ( 359 packaging_version.parse(old_version) < packaging_version.parse(new_version) 360 ) 361 is_same_version = new_version and old_version and ( 362 packaging_version.parse(old_version) == packaging_version.parse(new_version) 363 ) 364 except Exception: 365 is_new_version, is_same_version = True, False 366 367 ### Determine where to permanently store the new plugin. 368 plugin_installation_dir_path = paths.PLUGINS_DIR_PATHS[0] 369 for path in paths.PLUGINS_DIR_PATHS: 370 if not path.exists(): 371 warn(f"Plugins path does not exist: {path}", stack=False) 372 continue 373 374 files_in_plugins_dir = os.listdir(path) 375 if ( 376 self.name in files_in_plugins_dir 377 or 378 (self.name + '.py') in files_in_plugins_dir 379 ): 380 plugin_installation_dir_path = path 381 break 382 383 success_msg = ( 384 f"Successfully installed plugin '{self}'" 385 + ("\n (skipped dependencies)" if skip_deps else "") 386 + "." 
387 ) 388 success, abort = None, None 389 390 if is_same_version and not force: 391 success, msg = True, ( 392 f"Plugin '{self}' is up-to-date (version {old_version}).\n" + 393 " Install again with `-f` or `--force` to reinstall." 394 ) 395 abort = True 396 elif is_new_version or force: 397 for src_dir, dirs, files in os.walk(temp_dir): 398 if success is not None: 399 break 400 dst_dir = str(src_dir).replace(str(temp_dir), str(plugin_installation_dir_path)) 401 if not os.path.exists(dst_dir): 402 os.mkdir(dst_dir) 403 for f in files: 404 src_file = os.path.join(src_dir, f) 405 dst_file = os.path.join(dst_dir, f) 406 if os.path.exists(dst_file): 407 os.remove(dst_file) 408 409 if debug: 410 dprint(f"Moving '{src_file}' to '{dst_dir}'...") 411 try: 412 shutil.move(src_file, dst_dir) 413 except Exception: 414 success, msg = False, ( 415 f"Failed to install plugin '{self}': " + 416 f"Could not move file '{src_file}' to '{dst_dir}'" 417 ) 418 print(msg) 419 break 420 if success is None: 421 success, msg = True, success_msg 422 else: 423 success, msg = False, ( 424 f"Your installed version of plugin '{self}' ({old_version}) is higher than " 425 + f"attempted version {new_version}." 426 ) 427 428 shutil.rmtree(temp_dir) 429 os.chdir(old_cwd) 430 431 ### Reload the plugin's module. 432 sync_plugins_symlinks(debug=debug) 433 if '_module' in self.__dict__: 434 del self.__dict__['_module'] 435 init_venv(venv=self.name, force=True, debug=debug) 436 reload_meerschaum(debug=debug) 437 438 ### if we've already failed, return here 439 if not success or abort: 440 _ongoing_installations.remove(self.full_name) 441 return success, msg 442 443 ### attempt to install dependencies 444 dependencies_installed = skip_deps or self.install_dependencies(force=force, debug=debug) 445 if not dependencies_installed: 446 _ongoing_installations.remove(self.full_name) 447 return False, f"Failed to install dependencies for plugin '{self}'." 
448 449 ### handling success tuple, bool, or other (typically None) 450 setup_tuple = self.setup(debug=debug) 451 if isinstance(setup_tuple, tuple): 452 if not setup_tuple[0]: 453 success, msg = setup_tuple 454 elif isinstance(setup_tuple, bool): 455 if not setup_tuple: 456 success, msg = False, ( 457 f"Failed to run post-install setup for plugin '{self}'." + '\n' + 458 f"Check `setup()` in '{self.__file__}' for more information " + 459 "(no error message provided)." 460 ) 461 else: 462 success, msg = True, success_msg 463 elif setup_tuple is None: 464 success = True 465 msg = ( 466 f"Post-install for plugin '{self}' returned None. " + 467 "Assuming plugin successfully installed." 468 ) 469 warn(msg) 470 else: 471 success = False 472 msg = ( 473 f"Post-install for plugin '{self}' returned unexpected value " + 474 f"of type '{type(setup_tuple)}': {setup_tuple}" 475 ) 476 477 _ongoing_installations.remove(self.full_name) 478 _ = self.module 479 return success, msg 480 481 482 def remove_archive( 483 self, 484 debug: bool = False 485 ) -> SuccessTuple: 486 """Remove a plugin's archive file.""" 487 if not self.archive_path.exists(): 488 return True, f"Archive file for plugin '{self}' does not exist." 489 try: 490 self.archive_path.unlink() 491 except Exception as e: 492 return False, f"Failed to remove archive for plugin '{self}':\n{e}" 493 return True, "Success" 494 495 496 def remove_venv( 497 self, 498 debug: bool = False 499 ) -> SuccessTuple: 500 """Remove a plugin's virtual environment.""" 501 if not self.venv_path.exists(): 502 return True, f"Virtual environment for plugin '{self}' does not exist." 503 try: 504 shutil.rmtree(self.venv_path) 505 except Exception as e: 506 return False, f"Failed to remove virtual environment for plugin '{self}':\n{e}" 507 return True, "Success" 508 509 510 def uninstall(self, debug: bool = False) -> SuccessTuple: 511 """ 512 Remove a plugin, its virtual environment, and archive file. 
513 """ 514 from meerschaum.utils.packages import reload_meerschaum 515 from meerschaum.plugins import sync_plugins_symlinks 516 from meerschaum.utils.warnings import warn, info 517 warnings_thrown_count: int = 0 518 max_warnings: int = 3 519 520 if not self.is_installed(): 521 info( 522 f"Plugin '{self.name}' doesn't seem to be installed.\n " 523 + "Checking for artifacts...", 524 stack = False, 525 ) 526 else: 527 real_path = pathlib.Path(os.path.realpath(self.__file__)) 528 try: 529 if real_path.name == '__init__.py': 530 shutil.rmtree(real_path.parent) 531 else: 532 real_path.unlink() 533 except Exception as e: 534 warn(f"Could not remove source files for plugin '{self.name}':\n{e}", stack=False) 535 warnings_thrown_count += 1 536 else: 537 info(f"Removed source files for plugin '{self.name}'.") 538 539 if self.venv_path.exists(): 540 success, msg = self.remove_venv(debug=debug) 541 if not success: 542 warn(msg, stack=False) 543 warnings_thrown_count += 1 544 else: 545 info(f"Removed virtual environment from plugin '{self.name}'.") 546 547 success = warnings_thrown_count < max_warnings 548 sync_plugins_symlinks(debug=debug) 549 self.deactivate_venv(force=True, debug=debug) 550 reload_meerschaum(debug=debug) 551 return success, ( 552 f"Successfully uninstalled plugin '{self}'." if success 553 else f"Failed to uninstall plugin '{self}'." 554 ) 555 556 557 def setup(self, *args: str, debug: bool = False, **kw: Any) -> Union[SuccessTuple, bool]: 558 """ 559 If exists, run the plugin's `setup()` function. 560 561 Parameters 562 ---------- 563 *args: str 564 The positional arguments passed to the `setup()` function. 565 566 debug: bool, default False 567 Verbosity toggle. 568 569 **kw: Any 570 The keyword arguments passed to the `setup()` function. 571 572 Returns 573 ------- 574 A `SuccessTuple` or `bool` indicating success. 
575 576 """ 577 from meerschaum.utils.debug import dprint 578 import inspect 579 _setup = None 580 for name, fp in inspect.getmembers(self.module): 581 if name == 'setup' and inspect.isfunction(fp): 582 _setup = fp 583 break 584 585 ### assume success if no setup() is found (not necessary) 586 if _setup is None: 587 return True 588 589 sig = inspect.signature(_setup) 590 has_debug, has_kw = ('debug' in sig.parameters), False 591 for k, v in sig.parameters.items(): 592 if '**' in str(v): 593 has_kw = True 594 break 595 596 _kw = {} 597 if has_kw: 598 _kw.update(kw) 599 if has_debug: 600 _kw['debug'] = debug 601 602 if debug: 603 dprint(f"Running setup for plugin '{self}'...") 604 try: 605 self.activate_venv(debug=debug) 606 return_tuple = _setup(*args, **_kw) 607 self.deactivate_venv(debug=debug) 608 except Exception as e: 609 return False, str(e) 610 611 if isinstance(return_tuple, tuple): 612 return return_tuple 613 if isinstance(return_tuple, bool): 614 return return_tuple, f"Setup for Plugin '{self.name}' did not return a message." 615 if return_tuple is None: 616 return False, f"Setup for Plugin '{self.name}' returned None." 617 return False, f"Unknown return value from setup for Plugin '{self.name}': {return_tuple}" 618 619 620 def get_dependencies( 621 self, 622 debug: bool = False, 623 ) -> List[str]: 624 """ 625 If the Plugin has specified dependencies in a list called `required`, return the list. 626 627 **NOTE:** Dependecies which start with `'plugin:'` are Meerschaum plugins, not pip packages. 628 Meerschaum plugins may also specify connector keys for a repo after `'@'`. 629 630 Parameters 631 ---------- 632 debug: bool, default False 633 Verbosity toggle. 634 635 Returns 636 ------- 637 A list of required packages and plugins (str). 638 639 """ 640 if '_required' in self.__dict__: 641 return self._required 642 643 ### If the plugin has not yet been imported, 644 ### infer the dependencies from the source text. 
645 ### This is not super robust, and it doesn't feel right 646 ### having multiple versions of the logic. 647 ### This is necessary when determining the activation order 648 ### without having import the module. 649 ### For consistency's sake, the module-less method does not cache the requirements. 650 if self.__dict__.get('_module', None) is None: 651 file_path = self.__file__ 652 if file_path is None: 653 return [] 654 with open(file_path, 'r', encoding='utf-8') as f: 655 text = f.read() 656 657 if 'required' not in text: 658 return [] 659 660 ### This has some limitations: 661 ### It relies on `required` being manually declared. 662 ### We lose the ability to dynamically alter the `required` list, 663 ### which is why we've kept the module-reliant method below. 664 import ast, re 665 ### NOTE: This technically would break 666 ### if `required` was the very first line of the file. 667 req_start_match = re.search(r'\nrequired(:\s*)?.*=', text) 668 if not req_start_match: 669 return [] 670 req_start = req_start_match.start() 671 equals_sign = req_start + text[req_start:].find('=') 672 673 ### Dependencies may have brackets within the strings, so push back the index. 
674 first_opening_brace = equals_sign + 1 + text[equals_sign:].find('[') 675 if first_opening_brace == -1: 676 return [] 677 678 next_closing_brace = equals_sign + 1 + text[equals_sign:].find(']') 679 if next_closing_brace == -1: 680 return [] 681 682 start_ix = first_opening_brace + 1 683 end_ix = next_closing_brace 684 685 num_braces = 0 686 while True: 687 if '[' not in text[start_ix:end_ix]: 688 break 689 num_braces += 1 690 start_ix = end_ix 691 end_ix += text[end_ix + 1:].find(']') + 1 692 693 req_end = end_ix + 1 694 req_text = ( 695 text[(first_opening_brace-1):req_end] 696 .lstrip() 697 .replace('=', '', 1) 698 .lstrip() 699 .rstrip() 700 ) 701 try: 702 required = ast.literal_eval(req_text) 703 except Exception as e: 704 warn( 705 f"Unable to determine requirements for plugin '{self.name}' " 706 + "without importing the module.\n" 707 + " This may be due to dynamically setting the global `required` list.\n" 708 + f" {e}" 709 ) 710 return [] 711 return required 712 713 import inspect 714 self.activate_venv(dependencies=False, debug=debug) 715 required = [] 716 for name, val in inspect.getmembers(self.module): 717 if name == 'required': 718 required = val 719 break 720 self._required = required 721 self.deactivate_venv(dependencies=False, debug=debug) 722 return required 723 724 725 def get_required_plugins(self, debug: bool=False) -> List[mrsm.plugins.Plugin]: 726 """ 727 Return a list of required Plugin objects. 
728 """ 729 from meerschaum.utils.warnings import warn 730 from meerschaum.config import get_config 731 from meerschaum._internal.static import STATIC_CONFIG 732 from meerschaum.connectors.parse import is_valid_connector_keys 733 plugins = [] 734 _deps = self.get_dependencies(debug=debug) 735 sep = STATIC_CONFIG['plugins']['repo_separator'] 736 plugin_names = [ 737 _d[len('plugin:'):] for _d in _deps 738 if _d.startswith('plugin:') and len(_d) > len('plugin:') 739 ] 740 default_repo_keys = get_config('meerschaum', 'repository') 741 skipped_repo_keys = set() 742 743 for _plugin_name in plugin_names: 744 if sep in _plugin_name: 745 try: 746 _plugin_name, _repo_keys = _plugin_name.split(sep) 747 except Exception: 748 _repo_keys = default_repo_keys 749 warn( 750 f"Invalid repo keys for required plugin '{_plugin_name}'.\n " 751 + f"Will try to use '{_repo_keys}' instead.", 752 stack = False, 753 ) 754 else: 755 _repo_keys = default_repo_keys 756 757 if _repo_keys in skipped_repo_keys: 758 continue 759 760 if not is_valid_connector_keys(_repo_keys): 761 warn( 762 f"Invalid connector '{_repo_keys}'.\n" 763 f" Skipping required plugins from repository '{_repo_keys}'", 764 stack=False, 765 ) 766 continue 767 768 plugins.append(Plugin(_plugin_name, repo=_repo_keys)) 769 770 return plugins 771 772 773 def get_required_packages(self, debug: bool=False) -> List[str]: 774 """ 775 Return the required package names (excluding plugins). 776 """ 777 _deps = self.get_dependencies(debug=debug) 778 return [_d for _d in _deps if not _d.startswith('plugin:')] 779 780 781 def activate_venv( 782 self, 783 dependencies: bool = True, 784 init_if_not_exists: bool = True, 785 debug: bool = False, 786 **kw 787 ) -> bool: 788 """ 789 Activate the virtual environments for the plugin and its dependencies. 790 791 Parameters 792 ---------- 793 dependencies: bool, default True 794 If `True`, activate the virtual environments for required plugins. 
795 796 Returns 797 ------- 798 A bool indicating success. 799 """ 800 import meerschaum.config.paths as paths 801 from meerschaum.utils.venv import venv_target_path 802 from meerschaum.utils.packages import activate_venv 803 from meerschaum.utils.misc import make_symlink, is_symlink 804 805 if dependencies: 806 for plugin in self.get_required_plugins(debug=debug): 807 plugin.activate_venv(debug=debug, init_if_not_exists=init_if_not_exists, **kw) 808 809 vtp = venv_target_path(self.name, debug=debug, allow_nonexistent=True) 810 venv_meerschaum_path = vtp / 'meerschaum' 811 812 try: 813 success, msg = True, "Success" 814 if is_symlink(venv_meerschaum_path): 815 if pathlib.Path(os.path.realpath(venv_meerschaum_path)) != paths.PACKAGE_ROOT_PATH: 816 venv_meerschaum_path.unlink() 817 success, msg = make_symlink(venv_meerschaum_path, paths.PACKAGE_ROOT_PATH) 818 except Exception as e: 819 success, msg = False, str(e) 820 if not success: 821 warn( 822 f"Unable to create symlink {venv_meerschaum_path} to {paths.PACKAGE_ROOT_PATH}:\n" 823 f"{msg}" 824 ) 825 826 return activate_venv(self.name, init_if_not_exists=init_if_not_exists, debug=debug, **kw) 827 828 829 def deactivate_venv(self, dependencies: bool=True, debug: bool = False, **kw) -> bool: 830 """ 831 Deactivate the virtual environments for the plugin and its dependencies. 832 833 Parameters 834 ---------- 835 dependencies: bool, default True 836 If `True`, deactivate the virtual environments for required plugins. 837 838 Returns 839 ------- 840 A bool indicating success. 841 """ 842 from meerschaum.utils.packages import deactivate_venv 843 success = deactivate_venv(self.name, debug=debug, **kw) 844 if dependencies: 845 for plugin in self.get_required_plugins(debug=debug): 846 plugin.deactivate_venv(debug=debug, **kw) 847 return success 848 849 850 def install_dependencies( 851 self, 852 force: bool = False, 853 debug: bool = False, 854 ) -> bool: 855 """ 856 If specified, install dependencies. 
857 858 **NOTE:** Dependencies that start with `'plugin:'` will be installed as 859 Meerschaum plugins from the same repository as this Plugin. 860 To install from a different repository, add the repo keys after `'@'` 861 (e.g. `'plugin:foo@api:bar'`). 862 863 Parameters 864 ---------- 865 force: bool, default False 866 If `True`, continue with the installation, even if some 867 required packages fail to install. 868 869 debug: bool, default False 870 Verbosity toggle. 871 872 Returns 873 ------- 874 A bool indicating success. 875 """ 876 from meerschaum.utils.packages import pip_install, venv_contains_package 877 from meerschaum.utils.warnings import warn, info 878 _deps = self.get_dependencies(debug=debug) 879 if not _deps and self.requirements_file_path is None: 880 return True 881 882 plugins = self.get_required_plugins(debug=debug) 883 for _plugin in plugins: 884 if _plugin.name == self.name: 885 warn(f"Plugin '{self.name}' cannot depend on itself! Skipping...", stack=False) 886 continue 887 _success, _msg = _plugin.repo_connector.install_plugin( 888 _plugin.name, debug=debug, force=force 889 ) 890 if not _success: 891 warn( 892 f"Failed to install required plugin '{_plugin}' from '{_plugin.repo_connector}'" 893 + f" for plugin '{self.name}':\n" + _msg, 894 stack = False, 895 ) 896 if not force: 897 warn( 898 "Try installing with the `--force` flag to continue anyway.", 899 stack = False, 900 ) 901 return False 902 info( 903 "Continuing with installation despite the failure " 904 + "(careful, things might be broken!)...", 905 icon = False 906 ) 907 908 909 ### First step: parse `requirements.txt` if it exists. 
910 if self.requirements_file_path is not None: 911 if not pip_install( 912 requirements_file_path=self.requirements_file_path, 913 venv=self.name, debug=debug 914 ): 915 warn( 916 f"Failed to resolve 'requirements.txt' for plugin '{self.name}'.", 917 stack = False, 918 ) 919 if not force: 920 warn( 921 "Try installing with `--force` to continue anyway.", 922 stack = False, 923 ) 924 return False 925 info( 926 "Continuing with installation despite the failure " 927 + "(careful, things might be broken!)...", 928 icon = False 929 ) 930 931 932 ### Don't reinstall packages that are already included in required plugins. 933 packages = [] 934 _packages = self.get_required_packages(debug=debug) 935 accounted_for_packages = set() 936 for package_name in _packages: 937 for plugin in plugins: 938 if venv_contains_package(package_name, plugin.name): 939 accounted_for_packages.add(package_name) 940 break 941 packages = [pkg for pkg in _packages if pkg not in accounted_for_packages] 942 943 ### Attempt pip packages installation. 944 if packages: 945 for package in packages: 946 if not pip_install(package, venv=self.name, debug=debug): 947 warn( 948 f"Failed to install required package '{package}'" 949 + f" for plugin '{self.name}'.", 950 stack = False, 951 ) 952 if not force: 953 warn( 954 "Try installing with `--force` to continue anyway.", 955 stack = False, 956 ) 957 return False 958 info( 959 "Continuing with installation despite the failure " 960 + "(careful, things might be broken!)...", 961 icon = False 962 ) 963 return True 964 965 966 @property 967 def full_name(self) -> str: 968 """ 969 Include the repo keys with the plugin's name. 
970 """ 971 from meerschaum._internal.static import STATIC_CONFIG 972 sep = STATIC_CONFIG['plugins']['repo_separator'] 973 return self.name + sep + str(self.repo_connector) 974 975 976 def __str__(self): 977 return self.name 978 979 980 def __repr__(self): 981 return f"Plugin('{self.name}', repo='{self.repo_connector}')" 982 983 984 def __del__(self): 985 pass
Handle packaging of Meerschaum plugins.
def __init__(
    self,
    name: str,
    version: Optional[str] = None,
    user_id: Optional[int] = None,
    required: Optional[List[str]] = None,
    attributes: Optional[Dict[str, Any]] = None,
    archive_path: Optional[pathlib.Path] = None,
    venv_path: Optional[pathlib.Path] = None,
    repo_connector: Optional['mrsm.connectors.api.APIConnector'] = None,
    repo: Union['mrsm.connectors.api.APIConnector', str, None] = None,
):
    import meerschaum.config.paths as paths
    from meerschaum._internal.static import STATIC_CONFIG
    sep = STATIC_CONFIG['plugins']['repo_separator']
    _repo = None
    if sep in name:
        try:
            name, _repo = name.split(sep)
        except Exception as e:
            error(f"Invalid plugin name: '{name}'")
    self._repo_in_name = _repo

    if attributes is None:
        attributes = {}
    self.name = name
    self.attributes = attributes
    self.user_id = user_id
    self._version = version
    if required:
        self._required = required
    self.archive_path = (
        archive_path if archive_path is not None
        else paths.PLUGINS_ARCHIVES_RESOURCES_PATH / f"{self.name}.tar.gz"
    )
    self.venv_path = (
        venv_path if venv_path is not None
        else paths.VIRTENV_RESOURCES_PATH / self.name
    )
    self._repo_connector = repo_connector
    self._repo_keys = repo
@property
def repo_connector(self):
    """
    Return the repository connector for this plugin.
    NOTE: This imports the `connectors` module, which imports certain plugin modules.
    """
    if self._repo_connector is None:
        from meerschaum.connectors.parse import parse_repo_keys

        repo_keys = self._repo_keys or self._repo_in_name
        if self._repo_in_name and self._repo_keys and self._repo_keys != self._repo_in_name:
            error(
                f"Received inconsistent repos: '{self._repo_in_name}' and '{self._repo_keys}'."
            )
        repo_connector = parse_repo_keys(repo_keys)
        self._repo_connector = repo_connector
    return self._repo_connector
Return the repository connector for this plugin.
NOTE: This imports the connectors module, which imports certain plugin modules.
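The way the repository keys are resolved can be illustrated with a small standalone sketch. It does not import Meerschaum: `REPO_SEPARATOR`, `split_plugin_name`, and `resolve_repo_keys` are hypothetical names, and the `'@'` separator is an assumption based on the `'plugin:foo@api:bar'` form described in this documentation. The explicit `repo` argument wins, the repository embedded in the name is the fallback, and a mismatch between the two is treated as an error.

```python
from typing import Optional, Tuple

# Assumption: stands in for STATIC_CONFIG['plugins']['repo_separator'].
REPO_SEPARATOR = "@"


def split_plugin_name(name: str) -> Tuple[str, Optional[str]]:
    """Split 'name@repo' into (name, repo); repo is None when absent."""
    if REPO_SEPARATOR not in name:
        return name, None
    plugin_name, repo_keys = name.split(REPO_SEPARATOR, 1)
    return plugin_name, repo_keys


def resolve_repo_keys(repo_in_name: Optional[str], repo: Optional[str]) -> Optional[str]:
    """Prefer the explicit `repo`, but reject a conflicting repo in the name."""
    if repo_in_name and repo and repo_in_name != repo:
        raise ValueError(
            f"Received inconsistent repos: '{repo_in_name}' and '{repo}'."
        )
    return repo or repo_in_name


name, repo_in_name = split_plugin_name("noaa@api:mrsm")
print(name, resolve_repo_keys(repo_in_name, None))
# noaa api:mrsm
```

Unlike the source above, this sketch splits on the first separator only and raises a `ValueError` rather than calling Meerschaum's `error()` helper.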
@property
def version(self):
    """
    Return the plugin's module version (`__version__`) if it's defined.
    """
    if self._version is None:
        try:
            self._version = self.module.__version__
        except Exception as e:
            self._version = None
    return self._version
Return the plugin's module version (__version__) if it's defined.
@property
def module(self):
    """
    Return the Python module of the underlying plugin.
    """
    if '_module' not in self.__dict__ or self.__dict__.get('_module', None) is None:
        if self.__file__ is None:
            return None

        from meerschaum.plugins import import_plugins
        self._module = import_plugins(str(self), warn=False)

    return self._module
Return the Python module of the underlying plugin.
@property
def requirements_file_path(self) -> Union[pathlib.Path, None]:
    """
    If a file named `requirements.txt` exists, return its path.
    """
    if self.__file__ is None:
        return None
    path = pathlib.Path(self.__file__).parent / 'requirements.txt'
    if not path.exists():
        return None
    return path
If a file named requirements.txt exists, return its path.
def is_installed(self, **kw) -> bool:
    """
    Check whether a plugin is correctly installed.

    Returns
    -------
    A `bool` indicating whether a plugin exists and is successfully imported.
    """
    return self.__file__ is not None
Check whether a plugin is correctly installed.
Returns
- A bool indicating whether a plugin exists and is successfully imported.
172 def make_tar(self, debug: bool = False) -> pathlib.Path: 173 """ 174 Compress the plugin's source files into a `.tar.gz` archive and return the archive's path. 175 176 Parameters 177 ---------- 178 debug: bool, default False 179 Verbosity toggle. 180 181 Returns 182 ------- 183 A `pathlib.Path` to the archive file's path. 184 185 """ 186 import tarfile, pathlib, subprocess, fnmatch 187 from meerschaum.utils.debug import dprint 188 from meerschaum.utils.packages import attempt_import 189 pathspec = attempt_import('pathspec', debug=debug) 190 191 if not self.__file__: 192 from meerschaum.utils.warnings import error 193 error(f"Could not find file for plugin '{self}'.") 194 if '__init__.py' in self.__file__ or os.path.isdir(self.__file__): 195 path = self.__file__.replace('__init__.py', '') 196 is_dir = True 197 else: 198 path = self.__file__ 199 is_dir = False 200 201 old_cwd = os.getcwd() 202 real_parent_path = pathlib.Path(os.path.realpath(path)).parent 203 os.chdir(real_parent_path) 204 205 default_patterns_to_ignore = [ 206 '.pyc', 207 '__pycache__/', 208 'eggs/', 209 '__pypackages__/', 210 '.git', 211 ] 212 213 def parse_gitignore() -> 'Set[str]': 214 gitignore_path = pathlib.Path(path) / '.gitignore' 215 if not gitignore_path.exists(): 216 return set(default_patterns_to_ignore) 217 with open(gitignore_path, 'r', encoding='utf-8') as f: 218 gitignore_text = f.read() 219 return set(pathspec.PathSpec.from_lines( 220 pathspec.patterns.GitWildMatchPattern, 221 default_patterns_to_ignore + gitignore_text.splitlines() 222 ).match_tree(path)) 223 224 patterns_to_ignore = parse_gitignore() if is_dir else set() 225 226 if debug: 227 dprint(f"Patterns to ignore:\n{patterns_to_ignore}") 228 229 with tarfile.open(self.archive_path, 'w:gz') as tarf: 230 if not is_dir: 231 tarf.add(f"{self.name}.py") 232 else: 233 for root, dirs, files in os.walk(self.name): 234 for f in files: 235 good_file = True 236 fp = os.path.join(root, f) 237 for pattern in patterns_to_ignore: 238 
if pattern in str(fp) or f.startswith('.'): 239 good_file = False 240 break 241 if good_file: 242 if debug: 243 dprint(f"Adding '{fp}'...") 244 tarf.add(fp) 245 246 ### clean up and change back to old directory 247 os.chdir(old_cwd) 248 249 ### change to 775 to avoid permissions issues with the API in a Docker container 250 self.archive_path.chmod(0o775) 251 252 if debug: 253 dprint(f"Created archive '{self.archive_path}'.") 254 return self.archive_path
Compress the plugin's source files into a .tar.gz archive and return the archive's path.
Parameters
- debug (bool, default False): Verbosity toggle.
Returns
- A pathlib.Path to the archive file.
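The archiving step can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not the method above: `make_plugin_tar` and `IGNORE_SUBSTRINGS` are hypothetical names, `.gitignore` parsing via `pathspec` is omitted, and files are stored under relative names with `arcname` instead of changing the working directory. The ignore list mirrors the default patterns shown in the source.

```python
import os
import tarfile
import tempfile
from pathlib import Path

# Substring patterns to skip; these mirror the defaults listed in the source.
IGNORE_SUBSTRINGS = [".pyc", "__pycache__/", "eggs/", "__pypackages__/", ".git"]


def make_plugin_tar(plugin_dir: Path, archive_path: Path) -> Path:
    """Archive `plugin_dir` under its own name, skipping hidden and ignored files."""
    parent = plugin_dir.parent
    with tarfile.open(archive_path, "w:gz") as tarf:
        for root, _dirs, files in os.walk(plugin_dir):
            for file_name in files:
                file_path = Path(root) / file_name
                # Store paths relative to the parent, e.g. 'example/__init__.py'.
                rel_path = file_path.relative_to(parent).as_posix()
                hidden = file_name.startswith(".")
                ignored = any(pattern in rel_path for pattern in IGNORE_SUBSTRINGS)
                if hidden or ignored:
                    continue
                tarf.add(file_path, arcname=rel_path)
    return archive_path


with tempfile.TemporaryDirectory() as tmp:
    plugin_dir = Path(tmp) / "example"
    (plugin_dir / "__pycache__").mkdir(parents=True)
    (plugin_dir / "__init__.py").write_text("__version__ = '0.1.0'\n")
    (plugin_dir / "__pycache__" / "x.pyc").write_text("")
    archive = make_plugin_tar(plugin_dir, Path(tmp) / "example.tar.gz")
    with tarfile.open(archive, "r:gz") as tarf:
        archived_names = sorted(tarf.getnames())
    print(archived_names)
    # ['example/__init__.py']
```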
257 def install( 258 self, 259 skip_deps: bool = False, 260 force: bool = False, 261 debug: bool = False, 262 ) -> SuccessTuple: 263 """ 264 Extract a plugin's tar archive to the plugins directory. 265 266 This function checks if the plugin is already installed and if the version is equal or 267 greater than the existing installation. 268 269 Parameters 270 ---------- 271 skip_deps: bool, default False 272 If `True`, do not install dependencies. 273 274 force: bool, default False 275 If `True`, continue with installation, even if required packages fail to install. 276 277 debug: bool, default False 278 Verbosity toggle. 279 280 Returns 281 ------- 282 A `SuccessTuple` of success (bool) and a message (str). 283 284 """ 285 if self.full_name in _ongoing_installations: 286 return True, f"Already installing plugin '{self}'." 287 _ongoing_installations.add(self.full_name) 288 289 import meerschaum.config.paths as paths 290 from meerschaum.utils.warnings import warn, error 291 if debug: 292 from meerschaum.utils.debug import dprint 293 import tarfile 294 import re 295 import ast 296 from meerschaum.plugins import sync_plugins_symlinks 297 from meerschaum.utils.packages import attempt_import, reload_meerschaum 298 from meerschaum.utils.venv import init_venv 299 from meerschaum.utils.misc import safely_extract_tar 300 old_cwd = os.getcwd() 301 old_version = '' 302 new_version = '' 303 temp_dir = paths.PLUGINS_TEMP_RESOURCES_PATH / self.name 304 temp_dir.mkdir(exist_ok=True) 305 306 if not self.archive_path.exists(): 307 return False, f"Missing archive file for plugin '{self}'." 
308 if self.version is not None: 309 old_version = self.version 310 if debug: 311 dprint(f"Found existing version '{old_version}' for plugin '{self}'.") 312 313 if debug: 314 dprint(f"Extracting '{self.archive_path}' to '{temp_dir}'...") 315 316 try: 317 with tarfile.open(self.archive_path, 'r:gz') as tarf: 318 safely_extract_tar(tarf, temp_dir) 319 except Exception as e: 320 warn(e) 321 return False, f"Failed to extract plugin '{self.name}'." 322 323 ### search for version information 324 files = os.listdir(temp_dir) 325 326 if str(files[0]) == self.name: 327 is_dir = True 328 elif str(files[0]) == self.name + '.py': 329 is_dir = False 330 else: 331 error(f"Unknown format encountered for plugin '{self}'.") 332 333 fpath = temp_dir / files[0] 334 if is_dir: 335 fpath = fpath / '__init__.py' 336 337 init_venv(self.name, debug=debug) 338 with open(fpath, 'r', encoding='utf-8') as f: 339 init_lines = f.readlines() 340 new_version = None 341 for line in init_lines: 342 if '__version__' not in line: 343 continue 344 version_match = re.search(r'__version__(\s?)=', line.lstrip().rstrip()) 345 if not version_match: 346 continue 347 new_version = ast.literal_eval(line.split('=')[1].lstrip().rstrip()) 348 break 349 if not new_version: 350 warn( 351 f"No `__version__` defined for plugin '{self}'. " 352 + "Assuming new version...", 353 stack = False, 354 ) 355 356 packaging_version = attempt_import('packaging.version') 357 try: 358 is_new_version = (not new_version and not old_version) or ( 359 packaging_version.parse(old_version) < packaging_version.parse(new_version) 360 ) 361 is_same_version = new_version and old_version and ( 362 packaging_version.parse(old_version) == packaging_version.parse(new_version) 363 ) 364 except Exception: 365 is_new_version, is_same_version = True, False 366 367 ### Determine where to permanently store the new plugin. 
368 plugin_installation_dir_path = paths.PLUGINS_DIR_PATHS[0] 369 for path in paths.PLUGINS_DIR_PATHS: 370 if not path.exists(): 371 warn(f"Plugins path does not exist: {path}", stack=False) 372 continue 373 374 files_in_plugins_dir = os.listdir(path) 375 if ( 376 self.name in files_in_plugins_dir 377 or 378 (self.name + '.py') in files_in_plugins_dir 379 ): 380 plugin_installation_dir_path = path 381 break 382 383 success_msg = ( 384 f"Successfully installed plugin '{self}'" 385 + ("\n (skipped dependencies)" if skip_deps else "") 386 + "." 387 ) 388 success, abort = None, None 389 390 if is_same_version and not force: 391 success, msg = True, ( 392 f"Plugin '{self}' is up-to-date (version {old_version}).\n" + 393 " Install again with `-f` or `--force` to reinstall." 394 ) 395 abort = True 396 elif is_new_version or force: 397 for src_dir, dirs, files in os.walk(temp_dir): 398 if success is not None: 399 break 400 dst_dir = str(src_dir).replace(str(temp_dir), str(plugin_installation_dir_path)) 401 if not os.path.exists(dst_dir): 402 os.mkdir(dst_dir) 403 for f in files: 404 src_file = os.path.join(src_dir, f) 405 dst_file = os.path.join(dst_dir, f) 406 if os.path.exists(dst_file): 407 os.remove(dst_file) 408 409 if debug: 410 dprint(f"Moving '{src_file}' to '{dst_dir}'...") 411 try: 412 shutil.move(src_file, dst_dir) 413 except Exception: 414 success, msg = False, ( 415 f"Failed to install plugin '{self}': " + 416 f"Could not move file '{src_file}' to '{dst_dir}'" 417 ) 418 print(msg) 419 break 420 if success is None: 421 success, msg = True, success_msg 422 else: 423 success, msg = False, ( 424 f"Your installed version of plugin '{self}' ({old_version}) is higher than " 425 + f"attempted version {new_version}." 426 ) 427 428 shutil.rmtree(temp_dir) 429 os.chdir(old_cwd) 430 431 ### Reload the plugin's module. 
432 sync_plugins_symlinks(debug=debug) 433 if '_module' in self.__dict__: 434 del self.__dict__['_module'] 435 init_venv(venv=self.name, force=True, debug=debug) 436 reload_meerschaum(debug=debug) 437 438 ### if we've already failed, return here 439 if not success or abort: 440 _ongoing_installations.remove(self.full_name) 441 return success, msg 442 443 ### attempt to install dependencies 444 dependencies_installed = skip_deps or self.install_dependencies(force=force, debug=debug) 445 if not dependencies_installed: 446 _ongoing_installations.remove(self.full_name) 447 return False, f"Failed to install dependencies for plugin '{self}'." 448 449 ### handling success tuple, bool, or other (typically None) 450 setup_tuple = self.setup(debug=debug) 451 if isinstance(setup_tuple, tuple): 452 if not setup_tuple[0]: 453 success, msg = setup_tuple 454 elif isinstance(setup_tuple, bool): 455 if not setup_tuple: 456 success, msg = False, ( 457 f"Failed to run post-install setup for plugin '{self}'." + '\n' + 458 f"Check `setup()` in '{self.__file__}' for more information " + 459 "(no error message provided)." 460 ) 461 else: 462 success, msg = True, success_msg 463 elif setup_tuple is None: 464 success = True 465 msg = ( 466 f"Post-install for plugin '{self}' returned None. " + 467 "Assuming plugin successfully installed." 468 ) 469 warn(msg) 470 else: 471 success = False 472 msg = ( 473 f"Post-install for plugin '{self}' returned unexpected value " + 474 f"of type '{type(setup_tuple)}': {setup_tuple}" 475 ) 476 477 _ongoing_installations.remove(self.full_name) 478 _ = self.module 479 return success, msg
Extract a plugin's tar archive to the plugins directory.
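This function checks whether the plugin is already installed and whether the archived version is equal to or greater than the installed version. An equal version is reported as up-to-date and a lower version is rejected, unless force is set.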
Parameters
- skip_deps (bool, default False): If True, do not install dependencies.
- force (bool, default False): If True, continue with installation, even if required packages fail to install.
- debug (bool, default False): Verbosity toggle.
Returns
- A SuccessTuple of success (bool) and a message (str).
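The version check described above can be summarized in a standalone sketch. The names `parse_version` and `decide_install` are hypothetical, and `parse_version` is a deliberately simplified stand-in for `packaging.version.parse` that only handles dotted integers. As in the source, versions that cannot be compared (for example, a missing `__version__`) are treated as a new version.

```python
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a dotted numeric version like '1.2.3' (simplified stand-in)."""
    return tuple(int(part) for part in version.split("."))


def decide_install(
    old_version: Optional[str],
    new_version: Optional[str],
    force: bool = False,
) -> str:
    """Return 'up-to-date', 'install', or 'refuse' for an installation attempt."""
    try:
        is_new = (not new_version and not old_version) or (
            parse_version(old_version) < parse_version(new_version)
        )
        is_same = bool(new_version and old_version) and (
            parse_version(old_version) == parse_version(new_version)
        )
    except Exception:
        # Versions that cannot be compared are treated as new.
        is_new, is_same = True, False

    if is_same and not force:
        return "up-to-date"
    if is_new or force:
        return "install"
    return "refuse"


print(decide_install("1.0.0", "1.0.0"))              # up-to-date
print(decide_install("1.0.0", "1.0.0", force=True))  # install
print(decide_install("2.0.0", "1.5.0"))              # refuse
print(decide_install("", "1.0.0"))                   # install
```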
def remove_archive(
    self,
    debug: bool = False
) -> SuccessTuple:
    """Remove a plugin's archive file."""
    if not self.archive_path.exists():
        return True, f"Archive file for plugin '{self}' does not exist."
    try:
        self.archive_path.unlink()
    except Exception as e:
        return False, f"Failed to remove archive for plugin '{self}':\n{e}"
    return True, "Success"
Remove a plugin's archive file.
def remove_venv(
    self,
    debug: bool = False
) -> SuccessTuple:
    """Remove a plugin's virtual environment."""
    if not self.venv_path.exists():
        return True, f"Virtual environment for plugin '{self}' does not exist."
    try:
        shutil.rmtree(self.venv_path)
    except Exception as e:
        return False, f"Failed to remove virtual environment for plugin '{self}':\n{e}"
    return True, "Success"
Remove a plugin's virtual environment.
def uninstall(self, debug: bool = False) -> SuccessTuple:
    """
    Remove a plugin, its virtual environment, and archive file.
    """
    from meerschaum.utils.packages import reload_meerschaum
    from meerschaum.plugins import sync_plugins_symlinks
    from meerschaum.utils.warnings import warn, info
    warnings_thrown_count: int = 0
    max_warnings: int = 3

    if not self.is_installed():
        info(
            f"Plugin '{self.name}' doesn't seem to be installed.\n "
            + "Checking for artifacts...",
            stack = False,
        )
    else:
        real_path = pathlib.Path(os.path.realpath(self.__file__))
        try:
            if real_path.name == '__init__.py':
                shutil.rmtree(real_path.parent)
            else:
                real_path.unlink()
        except Exception as e:
            warn(f"Could not remove source files for plugin '{self.name}':\n{e}", stack=False)
            warnings_thrown_count += 1
        else:
            info(f"Removed source files for plugin '{self.name}'.")

    if self.venv_path.exists():
        success, msg = self.remove_venv(debug=debug)
        if not success:
            warn(msg, stack=False)
            warnings_thrown_count += 1
        else:
            info(f"Removed virtual environment from plugin '{self.name}'.")

    success = warnings_thrown_count < max_warnings
    sync_plugins_symlinks(debug=debug)
    self.deactivate_venv(force=True, debug=debug)
    reload_meerschaum(debug=debug)
    return success, (
        f"Successfully uninstalled plugin '{self}'." if success
        else f"Failed to uninstall plugin '{self}'."
    )
Remove a plugin, its virtual environment, and archive file.
def setup(self, *args: str, debug: bool = False, **kw: Any) -> Union[SuccessTuple, bool]:
    """
    If it exists, run the plugin's `setup()` function.

    Parameters
    ----------
    *args: str
        The positional arguments passed to the `setup()` function.

    debug: bool, default False
        Verbosity toggle.

    **kw: Any
        The keyword arguments passed to the `setup()` function.

    Returns
    -------
    A `SuccessTuple` or `bool` indicating success.

    """
    from meerschaum.utils.debug import dprint
    import inspect
    _setup = None
    for name, fp in inspect.getmembers(self.module):
        if name == 'setup' and inspect.isfunction(fp):
            _setup = fp
            break

    ### assume success if no setup() is found (not necessary)
    if _setup is None:
        return True

    sig = inspect.signature(_setup)
    has_debug, has_kw = ('debug' in sig.parameters), False
    for k, v in sig.parameters.items():
        if '**' in str(v):
            has_kw = True
            break

    _kw = {}
    if has_kw:
        _kw.update(kw)
    if has_debug:
        _kw['debug'] = debug

    if debug:
        dprint(f"Running setup for plugin '{self}'...")
    try:
        self.activate_venv(debug=debug)
        return_tuple = _setup(*args, **_kw)
        self.deactivate_venv(debug=debug)
    except Exception as e:
        return False, str(e)

    if isinstance(return_tuple, tuple):
        return return_tuple
    if isinstance(return_tuple, bool):
        return return_tuple, f"Setup for Plugin '{self.name}' did not return a message."
    if return_tuple is None:
        return False, f"Setup for Plugin '{self.name}' returned None."
    return False, f"Unknown return value from setup for Plugin '{self.name}': {return_tuple}"
If it exists, run the plugin's setup() function.
Parameters
- *args (str): The positional arguments passed to the setup() function.
- debug (bool, default False): Verbosity toggle.
- **kw (Any): The keyword arguments passed to the setup() function.
Returns
- A SuccessTuple or bool indicating success.
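The keyword handling in `setup()` depends on the signature of the plugin's function: `debug` is passed only if the function declares it, and extra keyword arguments are forwarded only if it accepts `**kwargs`. A minimal sketch of that filtering follows. `filter_setup_kwargs` and the three example functions are hypothetical, and the sketch checks `inspect.Parameter.VAR_KEYWORD` rather than searching the parameter's string form for `'**'` as the source does.

```python
import inspect
from typing import Any, Callable, Dict


def filter_setup_kwargs(func: Callable[..., Any], debug: bool = False, **kw: Any) -> Dict[str, Any]:
    """Build the keyword arguments that `func` is able to receive."""
    sig = inspect.signature(func)
    accepts_kwargs = any(
        param.kind is inspect.Parameter.VAR_KEYWORD
        for param in sig.parameters.values()
    )
    # Forward arbitrary keywords only when the function declares **kwargs.
    filtered = dict(kw) if accepts_kwargs else {}
    # Pass `debug` only when it is an explicitly named parameter.
    if "debug" in sig.parameters:
        filtered["debug"] = debug
    return filtered


def setup_plain():
    return True, "Success"


def setup_with_debug(debug: bool = False):
    return True, f"debug={debug}"


def setup_with_kwargs(**kwargs):
    return True, str(sorted(kwargs))


print(filter_setup_kwargs(setup_plain, debug=True, yes=True))        # {}
print(filter_setup_kwargs(setup_with_debug, debug=True, yes=True))   # {'debug': True}
print(filter_setup_kwargs(setup_with_kwargs, debug=True, yes=True))  # {'yes': True}
```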
620 def get_dependencies( 621 self, 622 debug: bool = False, 623 ) -> List[str]: 624 """ 625 If the Plugin has specified dependencies in a list called `required`, return the list. 626 627 **NOTE:** Dependecies which start with `'plugin:'` are Meerschaum plugins, not pip packages. 628 Meerschaum plugins may also specify connector keys for a repo after `'@'`. 629 630 Parameters 631 ---------- 632 debug: bool, default False 633 Verbosity toggle. 634 635 Returns 636 ------- 637 A list of required packages and plugins (str). 638 639 """ 640 if '_required' in self.__dict__: 641 return self._required 642 643 ### If the plugin has not yet been imported, 644 ### infer the dependencies from the source text. 645 ### This is not super robust, and it doesn't feel right 646 ### having multiple versions of the logic. 647 ### This is necessary when determining the activation order 648 ### without having import the module. 649 ### For consistency's sake, the module-less method does not cache the requirements. 650 if self.__dict__.get('_module', None) is None: 651 file_path = self.__file__ 652 if file_path is None: 653 return [] 654 with open(file_path, 'r', encoding='utf-8') as f: 655 text = f.read() 656 657 if 'required' not in text: 658 return [] 659 660 ### This has some limitations: 661 ### It relies on `required` being manually declared. 662 ### We lose the ability to dynamically alter the `required` list, 663 ### which is why we've kept the module-reliant method below. 664 import ast, re 665 ### NOTE: This technically would break 666 ### if `required` was the very first line of the file. 667 req_start_match = re.search(r'\nrequired(:\s*)?.*=', text) 668 if not req_start_match: 669 return [] 670 req_start = req_start_match.start() 671 equals_sign = req_start + text[req_start:].find('=') 672 673 ### Dependencies may have brackets within the strings, so push back the index. 
674 first_opening_brace = equals_sign + 1 + text[equals_sign:].find('[') 675 if first_opening_brace == -1: 676 return [] 677 678 next_closing_brace = equals_sign + 1 + text[equals_sign:].find(']') 679 if next_closing_brace == -1: 680 return [] 681 682 start_ix = first_opening_brace + 1 683 end_ix = next_closing_brace 684 685 num_braces = 0 686 while True: 687 if '[' not in text[start_ix:end_ix]: 688 break 689 num_braces += 1 690 start_ix = end_ix 691 end_ix += text[end_ix + 1:].find(']') + 1 692 693 req_end = end_ix + 1 694 req_text = ( 695 text[(first_opening_brace-1):req_end] 696 .lstrip() 697 .replace('=', '', 1) 698 .lstrip() 699 .rstrip() 700 ) 701 try: 702 required = ast.literal_eval(req_text) 703 except Exception as e: 704 warn( 705 f"Unable to determine requirements for plugin '{self.name}' " 706 + "without importing the module.\n" 707 + " This may be due to dynamically setting the global `required` list.\n" 708 + f" {e}" 709 ) 710 return [] 711 return required 712 713 import inspect 714 self.activate_venv(dependencies=False, debug=debug) 715 required = [] 716 for name, val in inspect.getmembers(self.module): 717 if name == 'required': 718 required = val 719 break 720 self._required = required 721 self.deactivate_venv(dependencies=False, debug=debug) 722 return required
If the Plugin has specified dependencies in a list called required, return the list.
NOTE: Dependencies which start with 'plugin:' are Meerschaum plugins, not pip packages.
Meerschaum plugins may also specify connector keys for a repo after '@'.
Parameters
- debug (bool, default False): Verbosity toggle.
Returns
- A list of required packages and plugins (str).
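Reading a statically declared `required` list without importing the module can also be sketched with Python's `ast` module. This is a different technique from the regular-expression and bracket scanning used in the source above, shown only to illustrate the idea. `read_required` is a hypothetical helper, and like the source it cannot recover a list that is built dynamically.

```python
import ast
from typing import List


def read_required(source_text: str) -> List[str]:
    """Return the literal value of a module-level `required` list, or []."""
    tree = ast.parse(source_text)
    for node in tree.body:
        if isinstance(node, ast.Assign):
            targets, value = node.targets, node.value
        elif isinstance(node, ast.AnnAssign) and node.value is not None:
            targets, value = [node.target], node.value
        else:
            continue
        for target in targets:
            if isinstance(target, ast.Name) and target.id == "required":
                try:
                    return list(ast.literal_eval(value))
                except (ValueError, TypeError):
                    # Not a plain literal (e.g. built by a function call).
                    return []
    return []


plugin_source = '''
__version__ = "0.1.0"
required: list[str] = [
    "pandas[sql]>=2.0",
    "plugin:noaa@api:mrsm",
]
'''
print(read_required(plugin_source))
# ['pandas[sql]>=2.0', 'plugin:noaa@api:mrsm']
```

Because the module is parsed rather than executed, brackets inside dependency strings such as `'pandas[sql]>=2.0'` need no special handling.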
def get_required_plugins(self, debug: bool=False) -> List[mrsm.plugins.Plugin]:
    """
    Return a list of required Plugin objects.
    """
    from meerschaum.utils.warnings import warn
    from meerschaum.config import get_config
    from meerschaum._internal.static import STATIC_CONFIG
    from meerschaum.connectors.parse import is_valid_connector_keys
    plugins = []
    _deps = self.get_dependencies(debug=debug)
    sep = STATIC_CONFIG['plugins']['repo_separator']
    plugin_names = [
        _d[len('plugin:'):] for _d in _deps
        if _d.startswith('plugin:') and len(_d) > len('plugin:')
    ]
    default_repo_keys = get_config('meerschaum', 'repository')
    skipped_repo_keys = set()

    for _plugin_name in plugin_names:
        if sep in _plugin_name:
            try:
                _plugin_name, _repo_keys = _plugin_name.split(sep)
            except Exception:
                _repo_keys = default_repo_keys
                warn(
                    f"Invalid repo keys for required plugin '{_plugin_name}'.\n "
                    + f"Will try to use '{_repo_keys}' instead.",
                    stack = False,
                )
        else:
            _repo_keys = default_repo_keys

        if _repo_keys in skipped_repo_keys:
            continue

        if not is_valid_connector_keys(_repo_keys):
            warn(
                f"Invalid connector '{_repo_keys}'.\n"
                f" Skipping required plugins from repository '{_repo_keys}'",
                stack=False,
            )
            continue

        plugins.append(Plugin(_plugin_name, repo=_repo_keys))

    return plugins
Return a list of required Plugin objects.
def get_required_packages(self, debug: bool=False) -> List[str]:
    """
    Return the required package names (excluding plugins).
    """
    _deps = self.get_dependencies(debug=debug)
    return [_d for _d in _deps if not _d.startswith('plugin:')]
Return the required package names (excluding plugins).
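Taken together, `get_required_plugins()` and `get_required_packages()` partition the `required` list by the `'plugin:'` prefix. The standalone sketch below illustrates that partition without importing Meerschaum. `split_dependencies` is a hypothetical helper, the `'@'` separator is assumed from the `'plugin:foo@api:bar'` form, and `DEFAULT_REPO_KEYS` is a placeholder for the configured default repository. Connector-key validation is omitted.

```python
from typing import List, Tuple

PLUGIN_PREFIX = "plugin:"
REPO_SEPARATOR = "@"            # assumption: the configured repo separator
DEFAULT_REPO_KEYS = "api:mrsm"  # assumption: placeholder for the configured default


def split_dependencies(deps: List[str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split dependencies into (plugin name, repo keys) pairs and pip packages."""
    plugins: List[Tuple[str, str]] = []
    packages: List[str] = []
    for dep in deps:
        if not dep.startswith(PLUGIN_PREFIX):
            packages.append(dep)
            continue
        plugin_spec = dep[len(PLUGIN_PREFIX):]
        if not plugin_spec:
            # A bare 'plugin:' names nothing and is skipped.
            continue
        name, sep, repo_keys = plugin_spec.partition(REPO_SEPARATOR)
        plugins.append((name, repo_keys if sep else DEFAULT_REPO_KEYS))
    return plugins, packages


required = ["requests", "plugin:noaa", "plugin:foo@api:bar", "plugin:"]
plugins, packages = split_dependencies(required)
print(plugins)   # [('noaa', 'api:mrsm'), ('foo', 'api:bar')]
print(packages)  # ['requests']
```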
def activate_venv(
    self,
    dependencies: bool = True,
    init_if_not_exists: bool = True,
    debug: bool = False,
    **kw
) -> bool:
    """
    Activate the virtual environments for the plugin and its dependencies.

    Parameters
    ----------
    dependencies: bool, default True
        If `True`, activate the virtual environments for required plugins.

    Returns
    -------
    A bool indicating success.
    """
    import meerschaum.config.paths as paths
    from meerschaum.utils.venv import venv_target_path
    from meerschaum.utils.packages import activate_venv
    from meerschaum.utils.misc import make_symlink, is_symlink

    if dependencies:
        for plugin in self.get_required_plugins(debug=debug):
            plugin.activate_venv(debug=debug, init_if_not_exists=init_if_not_exists, **kw)

    vtp = venv_target_path(self.name, debug=debug, allow_nonexistent=True)
    venv_meerschaum_path = vtp / 'meerschaum'

    try:
        success, msg = True, "Success"
        if is_symlink(venv_meerschaum_path):
            if pathlib.Path(os.path.realpath(venv_meerschaum_path)) != paths.PACKAGE_ROOT_PATH:
                venv_meerschaum_path.unlink()
                success, msg = make_symlink(venv_meerschaum_path, paths.PACKAGE_ROOT_PATH)
    except Exception as e:
        success, msg = False, str(e)
    if not success:
        warn(
            f"Unable to create symlink {venv_meerschaum_path} to {paths.PACKAGE_ROOT_PATH}:\n"
            f"{msg}"
        )

    return activate_venv(self.name, init_if_not_exists=init_if_not_exists, debug=debug, **kw)
Activate the virtual environments for the plugin and its dependencies.
Parameters
- dependencies (bool, default True): If True, activate the virtual environments for required plugins.
Returns
- A bool indicating success.
def deactivate_venv(self, dependencies: bool=True, debug: bool = False, **kw) -> bool:
    """
    Deactivate the virtual environments for the plugin and its dependencies.

    Parameters
    ----------
    dependencies: bool, default True
        If `True`, deactivate the virtual environments for required plugins.

    Returns
    -------
    A bool indicating success.
    """
    from meerschaum.utils.packages import deactivate_venv
    success = deactivate_venv(self.name, debug=debug, **kw)
    if dependencies:
        for plugin in self.get_required_plugins(debug=debug):
            plugin.deactivate_venv(debug=debug, **kw)
    return success
Deactivate the virtual environments for the plugin and its dependencies.
Parameters
- dependencies (bool, default True): If True, deactivate the virtual environments for required plugins.
Returns
- A bool indicating success.
850 def install_dependencies( 851 self, 852 force: bool = False, 853 debug: bool = False, 854 ) -> bool: 855 """ 856 If specified, install dependencies. 857 858 **NOTE:** Dependencies that start with `'plugin:'` will be installed as 859 Meerschaum plugins from the same repository as this Plugin. 860 To install from a different repository, add the repo keys after `'@'` 861 (e.g. `'plugin:foo@api:bar'`). 862 863 Parameters 864 ---------- 865 force: bool, default False 866 If `True`, continue with the installation, even if some 867 required packages fail to install. 868 869 debug: bool, default False 870 Verbosity toggle. 871 872 Returns 873 ------- 874 A bool indicating success. 875 """ 876 from meerschaum.utils.packages import pip_install, venv_contains_package 877 from meerschaum.utils.warnings import warn, info 878 _deps = self.get_dependencies(debug=debug) 879 if not _deps and self.requirements_file_path is None: 880 return True 881 882 plugins = self.get_required_plugins(debug=debug) 883 for _plugin in plugins: 884 if _plugin.name == self.name: 885 warn(f"Plugin '{self.name}' cannot depend on itself! Skipping...", stack=False) 886 continue 887 _success, _msg = _plugin.repo_connector.install_plugin( 888 _plugin.name, debug=debug, force=force 889 ) 890 if not _success: 891 warn( 892 f"Failed to install required plugin '{_plugin}' from '{_plugin.repo_connector}'" 893 + f" for plugin '{self.name}':\n" + _msg, 894 stack = False, 895 ) 896 if not force: 897 warn( 898 "Try installing with the `--force` flag to continue anyway.", 899 stack = False, 900 ) 901 return False 902 info( 903 "Continuing with installation despite the failure " 904 + "(careful, things might be broken!)...", 905 icon = False 906 ) 907 908 909 ### First step: parse `requirements.txt` if it exists. 
910 if self.requirements_file_path is not None: 911 if not pip_install( 912 requirements_file_path=self.requirements_file_path, 913 venv=self.name, debug=debug 914 ): 915 warn( 916 f"Failed to resolve 'requirements.txt' for plugin '{self.name}'.", 917 stack = False, 918 ) 919 if not force: 920 warn( 921 "Try installing with `--force` to continue anyway.", 922 stack = False, 923 ) 924 return False 925 info( 926 "Continuing with installation despite the failure " 927 + "(careful, things might be broken!)...", 928 icon = False 929 ) 930 931 932 ### Don't reinstall packages that are already included in required plugins. 933 packages = [] 934 _packages = self.get_required_packages(debug=debug) 935 accounted_for_packages = set() 936 for package_name in _packages: 937 for plugin in plugins: 938 if venv_contains_package(package_name, plugin.name): 939 accounted_for_packages.add(package_name) 940 break 941 packages = [pkg for pkg in _packages if pkg not in accounted_for_packages] 942 943 ### Attempt pip packages installation. 944 if packages: 945 for package in packages: 946 if not pip_install(package, venv=self.name, debug=debug): 947 warn( 948 f"Failed to install required package '{package}'" 949 + f" for plugin '{self.name}'.", 950 stack = False, 951 ) 952 if not force: 953 warn( 954 "Try installing with `--force` to continue anyway.", 955 stack = False, 956 ) 957 return False 958 info( 959 "Continuing with installation despite the failure " 960 + "(careful, things might be broken!)...", 961 icon = False 962 ) 963 return True
If specified, install dependencies.
NOTE: Dependencies that start with 'plugin:' will be installed as
Meerschaum plugins from the same repository as this Plugin.
To install from a different repository, add the repo keys after '@'
(e.g. 'plugin:foo@api:bar').
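To make the naming convention in the note concrete, here is a minimal sketch that splits a dependency string into a plugin name and repository keys. The helper `split_plugin_dependency` and the fallback keys `'api:default'` are hypothetical and are not part of the Meerschaum API; the library's own parsing may differ.

```python
from typing import Optional, Tuple


def split_plugin_dependency(dep: str, default_repo: str) -> Optional[Tuple[str, str]]:
    """Return (plugin_name, repo_keys) for a 'plugin:' dependency, else None."""
    prefix = 'plugin:'
    if not dep.startswith(prefix):
        # Anything else is treated as an ordinary pip requirement.
        return None
    # Repo keys, when given, follow the first '@'.
    name, sep, repo = dep[len(prefix):].partition('@')
    return name, (repo if sep else default_repo)


# 'api:default' is a placeholder for "the same repository as this plugin".
print(split_plugin_dependency('plugin:foo@api:bar', 'api:default'))  # ('foo', 'api:bar')
print(split_plugin_dependency('plugin:foo', 'api:default'))          # ('foo', 'api:default')
print(split_plugin_dependency('pandas>=2.0', 'api:default'))         # None
```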
Parameters
- force (bool, default False): If True, continue with the installation, even if some required packages fail to install.
- debug (bool, default False): Verbosity toggle.
Returns
- A bool indicating success.
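The effect of `force` can be sketched without Meerschaum installed. The function `install_all` and the stand-in installer `fake_install` below are hypothetical; they only imitate the stop-or-continue behavior described for this method and do not call pip.

```python
from typing import Callable, Iterable, List


def install_all(
    packages: Iterable[str],
    install: Callable[[str], bool],
    force: bool = False,
) -> bool:
    """Install packages in order; stop at the first failure unless `force` is set."""
    for package in packages:
        if install(package):
            continue
        if not force:
            return False
        # With force=True, report the failure and keep going.
        print(f"Continuing despite failing to install '{package}'...")
    return True


attempted: List[str] = []


def fake_install(package: str) -> bool:
    """Stand-in installer that fails for exactly one package."""
    attempted.append(package)
    return package != 'broken-pkg'


strict_result = install_all(['a', 'broken-pkg', 'c'], fake_install)
strict_attempted = list(attempted)

attempted.clear()
forced_result = install_all(['a', 'broken-pkg', 'c'], fake_install, force=True)
forced_attempted = list(attempted)

print(strict_result, strict_attempted)  # False ['a', 'broken-pkg']
print(forced_result, forced_attempted)  # True ['a', 'broken-pkg', 'c']
```

Note that in this sketch a forced run reports success even though one package failed, so callers that need certainty should verify the environment afterwards.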
    @property
    def full_name(self) -> str:
        """
        Include the repo keys with the plugin's name.
        """
        from meerschaum._internal.static import STATIC_CONFIG
        sep = STATIC_CONFIG['plugins']['repo_separator']
        return self.name + sep + str(self.repo_connector)
Include the repo keys with the plugin's name.
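As a rough illustration of the resulting string, the sketch below joins a name and repo keys with a separator. The constant `REPO_SEPARATOR = '@'` is an assumption based on the `'plugin:foo@api:bar'` convention; the library reads the real value from its static configuration. The helper `build_full_name` and the keys `'api:example'` are hypothetical.

```python
# Assumed separator; the library looks this up in STATIC_CONFIG at runtime.
REPO_SEPARATOR = '@'


def build_full_name(name: str, repo_keys: str) -> str:
    """Join a plugin name and its repository keys into one identifier."""
    return name + REPO_SEPARATOR + str(repo_keys)


full = build_full_name('noaa', 'api:example')
print(full)  # noaa@api:example
```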
class Venv:
    """
    Manage a virtual environment's activation status.

    Examples
    --------
    >>> from meerschaum.plugins import Plugin
    >>> with Venv('mrsm') as venv:
    ...     import pandas
    >>> with Venv(Plugin('noaa')) as venv:
    ...     import requests
    >>> venv = Venv('mrsm')
    >>> venv.activate()
    True
    >>> venv.deactivate()
    True
    >>>
    """

    def __init__(
        self,
        venv: Union[str, 'mrsm.core.Plugin', None] = 'mrsm',
        init_if_not_exists: bool = True,
        debug: bool = False,
    ) -> None:
        from meerschaum.utils.venv import activate_venv, deactivate_venv, active_venvs
        ### For some weird threading issue,
        ### we can't use `isinstance` here.
        if '_Plugin' in str(type(venv)):
            self._venv = venv.name
            self._activate = venv.activate_venv
            self._deactivate = venv.deactivate_venv
            self._kwargs = {}
        else:
            self._venv = venv
            self._activate = activate_venv
            self._deactivate = deactivate_venv
            self._kwargs = {'venv': venv}
        self._debug = debug
        self._init_if_not_exists = init_if_not_exists
        ### In case someone calls `deactivate()` before `activate()`.
        self._kwargs['previously_active_venvs'] = copy.deepcopy(active_venvs)


    def activate(self, debug: bool = False) -> bool:
        """
        Activate this virtual environment.
        If a `meerschaum.plugins.Plugin` was provided, its dependent virtual environments
        will also be activated.
        """
        from meerschaum.utils.venv import active_venvs, init_venv
        self._kwargs['previously_active_venvs'] = copy.deepcopy(active_venvs)
        try:
            return self._activate(
                debug=(debug or self._debug),
                init_if_not_exists=self._init_if_not_exists,
                **self._kwargs
            )
        except OSError as e:
            if self._init_if_not_exists:
                if not init_venv(self._venv, force=True):
                    raise e
        return self._activate(
            debug=(debug or self._debug),
            init_if_not_exists=self._init_if_not_exists,
            **self._kwargs
        )


    def deactivate(self, debug: bool = False) -> bool:
        """
        Deactivate this virtual environment.
        If a `meerschaum.plugins.Plugin` was provided, its dependent virtual environments
        will also be deactivated.
        """
        return self._deactivate(debug=(debug or self._debug), **self._kwargs)


    @property
    def target_path(self) -> pathlib.Path:
        """
        Return the target site-packages path for this virtual environment.
        A `meerschaum.utils.venv.Venv` may have one virtual environment per minor Python version
        (e.g. Python 3.10 and Python 3.7).
        """
        from meerschaum.utils.venv import venv_target_path
        return venv_target_path(venv=self._venv, allow_nonexistent=True, debug=self._debug)


    @property
    def root_path(self) -> pathlib.Path:
        """
        Return the top-level path for this virtual environment.
        """
        import meerschaum.config.paths as paths
        if self._venv is None:
            return self.target_path.parent
        return paths.VIRTENV_RESOURCES_PATH / self._venv


    def __enter__(self) -> None:
        self.activate(debug=self._debug)


    def __exit__(self, exc_type, exc_value, exc_traceback) -> None:
        self.deactivate(debug=self._debug)


    def __str__(self) -> str:
        quote = "'" if self._venv is not None else ""
        return "Venv(" + quote + str(self._venv) + quote + ")"


    def __repr__(self) -> str:
        return self.__str__()
Manage a virtual environment's activation status.
Examples
>>> from meerschaum.plugins import Plugin
>>> from meerschaum.utils.venv import Venv
>>> with Venv('mrsm'):
...     import pandas
>>> with Venv(Plugin('noaa')):
...     import requests
>>> venv = Venv('mrsm')
>>> venv.activate()
True
>>> venv.deactivate()
True
    def __init__(
        self,
        venv: Union[str, 'mrsm.core.Plugin', None] = 'mrsm',
        init_if_not_exists: bool = True,
        debug: bool = False,
    ) -> None:
        from meerschaum.utils.venv import activate_venv, deactivate_venv, active_venvs
        ### For some weird threading issue,
        ### we can't use `isinstance` here.
        if '_Plugin' in str(type(venv)):
            self._venv = venv.name
            self._activate = venv.activate_venv
            self._deactivate = venv.deactivate_venv
            self._kwargs = {}
        else:
            self._venv = venv
            self._activate = activate_venv
            self._deactivate = deactivate_venv
            self._kwargs = {'venv': venv}
        self._debug = debug
        self._init_if_not_exists = init_if_not_exists
        ### In case someone calls `deactivate()` before `activate()`.
        self._kwargs['previously_active_venvs'] = copy.deepcopy(active_venvs)
    def activate(self, debug: bool = False) -> bool:
        """
        Activate this virtual environment.
        If a `meerschaum.plugins.Plugin` was provided, its dependent virtual environments
        will also be activated.
        """
        from meerschaum.utils.venv import active_venvs, init_venv
        self._kwargs['previously_active_venvs'] = copy.deepcopy(active_venvs)
        try:
            return self._activate(
                debug=(debug or self._debug),
                init_if_not_exists=self._init_if_not_exists,
                **self._kwargs
            )
        except OSError as e:
            if self._init_if_not_exists:
                if not init_venv(self._venv, force=True):
                    raise e
        return self._activate(
            debug=(debug or self._debug),
            init_if_not_exists=self._init_if_not_exists,
            **self._kwargs
        )
Activate this virtual environment.
If a meerschaum.plugins.Plugin was provided, its dependent virtual environments
will also be activated.
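The source for `activate()` retries once after initializing a missing environment. The sketch below shows that pattern in isolation with hypothetical stand-ins (`activate_with_retry`, `fake_activate`, `fake_init`); it is a simplification that re-raises when initialization is disabled, which may differ from the library's exact behavior.

```python
from typing import Callable, Dict, List


def activate_with_retry(
    activate: Callable[[], bool],
    init: Callable[[], bool],
    init_if_not_exists: bool = True,
) -> bool:
    """Try to activate; on OSError, initialize once and try again."""
    try:
        return activate()
    except OSError:
        # Simplification: give up when initialization is disabled or fails.
        if not init_if_not_exists or not init():
            raise
    return activate()


calls: List[str] = []
state: Dict[str, bool] = {'exists': False}


def fake_activate() -> bool:
    """Stand-in that fails until the environment has been created."""
    calls.append('activate')
    if not state['exists']:
        raise OSError("virtual environment is missing")
    return True


def fake_init() -> bool:
    """Stand-in that 'creates' the environment."""
    calls.append('init')
    state['exists'] = True
    return True


result = activate_with_retry(fake_activate, fake_init)
print(result, calls)  # True ['activate', 'init', 'activate']
```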
    def deactivate(self, debug: bool = False) -> bool:
        """
        Deactivate this virtual environment.
        If a `meerschaum.plugins.Plugin` was provided, its dependent virtual environments
        will also be deactivated.
        """
        return self._deactivate(debug=(debug or self._debug), **self._kwargs)
Deactivate this virtual environment.
If a meerschaum.plugins.Plugin was provided, its dependent virtual environments
will also be deactivated.
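Because `Venv` pairs `activate()` with `deactivate()` through `__enter__` and `__exit__`, a `with` block deactivates the environment even when the body raises. The toy class below is a hypothetical stand-in (`ToyVenv` and the `active` list are not Meerschaum names) that imitates only this pairing.

```python
from typing import List

# Stand-in for the library's record of currently active environments.
active: List[str] = []


class ToyVenv:
    """Hypothetical stand-in that mimics the activate/deactivate pairing."""

    def __init__(self, name: str) -> None:
        self.name = name

    def activate(self) -> bool:
        if self.name not in active:
            active.append(self.name)
        return True

    def deactivate(self) -> bool:
        if self.name in active:
            active.remove(self.name)
        return True

    def __enter__(self) -> None:
        # Mirrors Venv.__enter__, which returns nothing, so `as venv` would bind None.
        self.activate()

    def __exit__(self, exc_type, exc_value, exc_traceback) -> None:
        # Runs on normal exit and on exceptions; returning None lets errors propagate.
        self.deactivate()


with ToyVenv('mrsm'):
    inside = list(active)

try:
    with ToyVenv('noaa'):
        raise RuntimeError("import failed")
except RuntimeError:
    pass

print(inside)  # ['mrsm']
print(active)  # []
```

When you need the object itself, create it first (`venv = Venv('mrsm')`) and then use `with venv:` rather than relying on the `as` target.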
    @property
    def target_path(self) -> pathlib.Path:
        """
        Return the target site-packages path for this virtual environment.
        A `meerschaum.utils.venv.Venv` may have one virtual environment per minor Python version
        (e.g. Python 3.10 and Python 3.7).
        """
        from meerschaum.utils.venv import venv_target_path
        return venv_target_path(venv=self._venv, allow_nonexistent=True, debug=self._debug)
Return the target site-packages path for this virtual environment.
A meerschaum.utils.venv.Venv may have one virtual environment per minor Python version
(e.g. Python 3.10 and Python 3.7).
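To show why the target path depends on the minor Python version, here is a sketch that builds a version-specific site-packages path. The function `sketch_target_path`, the POSIX-style `lib/pythonX.Y/site-packages` layout, and the root `/tmp/venvs/mrsm` are assumptions for illustration; the real path comes from `venv_target_path` and varies by platform.

```python
import pathlib
import sys


def sketch_target_path(venv_root: pathlib.Path) -> pathlib.Path:
    """Hypothetical POSIX-style layout: <root>/lib/pythonX.Y/site-packages."""
    version_dir = f"python{sys.version_info.major}.{sys.version_info.minor}"
    return venv_root / 'lib' / version_dir / 'site-packages'


# The same root yields a different target under each minor Python version.
target = sketch_target_path(pathlib.Path('/tmp/venvs/mrsm'))
print(target)
```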
    @property
    def root_path(self) -> pathlib.Path:
        """
        Return the top-level path for this virtual environment.
        """
        import meerschaum.config.paths as paths
        if self._venv is None:
            return self.target_path.parent
        return paths.VIRTENV_RESOURCES_PATH / self._venv
Return the top-level path for this virtual environment.
70class Job: 71 """ 72 Manage a `meerschaum.utils.daemon.Daemon`, locally or remotely via the API. 73 """ 74 75 def __init__( 76 self, 77 name: str, 78 sysargs: Union[List[str], str, None] = None, 79 env: Optional[Dict[str, str]] = None, 80 executor_keys: Optional[str] = None, 81 delete_after_completion: bool = False, 82 refresh_seconds: Union[int, float, None] = None, 83 _properties: Optional[Dict[str, Any]] = None, 84 _rotating_log=None, 85 _stdin_file=None, 86 _status_hook: Optional[Callable[[], str]] = None, 87 _result_hook: Optional[Callable[[], SuccessTuple]] = None, 88 _externally_managed: bool = False, 89 ): 90 """ 91 Create a new job to manage a `meerschaum.utils.daemon.Daemon`. 92 93 Parameters 94 ---------- 95 name: str 96 The name of the job to be created. 97 This will also be used as the Daemon ID. 98 99 sysargs: Union[List[str], str, None], default None 100 The sysargs of the command to be executed, e.g. 'start api'. 101 102 env: Optional[Dict[str, str]], default None 103 If provided, set these environment variables in the job's process. 104 105 executor_keys: Optional[str], default None 106 If provided, execute the job remotely on an API instance, e.g. 'api:main'. 107 108 delete_after_completion: bool, default False 109 If `True`, delete this job when it has finished executing. 110 111 refresh_seconds: Union[int, float, None], default None 112 The number of seconds to sleep between refreshes. 113 Defaults to the configured value `system.cli.refresh_seconds`. 114 115 _properties: Optional[Dict[str, Any]], default None 116 If provided, use this to patch the daemon's properties. 
117 """ 118 from meerschaum.utils.daemon import Daemon 119 for char in BANNED_CHARS: 120 if char in name: 121 raise ValueError(f"Invalid name: ({char}) is not allowed.") 122 123 if isinstance(sysargs, str): 124 sysargs = shlex.split(sysargs) 125 126 and_key = STATIC_CONFIG['system']['arguments']['and_key'] 127 escaped_and_key = STATIC_CONFIG['system']['arguments']['escaped_and_key'] 128 if sysargs: 129 sysargs = [ 130 (arg if arg != escaped_and_key else and_key) 131 for arg in sysargs 132 ] 133 134 ### NOTE: 'local' and 'systemd' executors are being coalesced. 135 if executor_keys is None: 136 from meerschaum.jobs import get_executor_keys_from_context 137 executor_keys = get_executor_keys_from_context() 138 139 self.executor_keys = executor_keys 140 self.name = name 141 self.refresh_seconds = ( 142 refresh_seconds 143 if refresh_seconds is not None 144 else mrsm.get_config('system', 'cli', 'refresh_seconds') 145 ) 146 try: 147 self._daemon = ( 148 Daemon(daemon_id=name) 149 if executor_keys == 'local' 150 else None 151 ) 152 except Exception: 153 self._daemon = None 154 155 ### Handle any injected dependencies. 
156 if _rotating_log is not None: 157 self._rotating_log = _rotating_log 158 if self._daemon is not None: 159 self._daemon._rotating_log = _rotating_log 160 161 if _stdin_file is not None: 162 self._stdin_file = _stdin_file 163 if self._daemon is not None: 164 self._daemon._stdin_file = _stdin_file 165 self._daemon._blocking_stdin_file_path = _stdin_file.blocking_file_path 166 167 if _status_hook is not None: 168 self._status_hook = _status_hook 169 170 if _result_hook is not None: 171 self._result_hook = _result_hook 172 173 self._externally_managed = _externally_managed 174 self._properties_patch = _properties or {} 175 if _externally_managed: 176 self._properties_patch.update({'externally_managed': _externally_managed}) 177 178 if env: 179 self._properties_patch.update({'env': env}) 180 181 if delete_after_completion: 182 self._properties_patch.update({'delete_after_completion': delete_after_completion}) 183 184 daemon_sysargs = ( 185 self._daemon.properties.get('target', {}).get('args', [None])[0] 186 if self._daemon is not None 187 else None 188 ) 189 190 if daemon_sysargs and sysargs and daemon_sysargs != sysargs: 191 warn("Given sysargs differ from existing sysargs.") 192 193 self._sysargs = [ 194 arg 195 for arg in (daemon_sysargs or sysargs or []) 196 if arg not in ('-d', '--daemon') 197 ] 198 for restart_flag in RESTART_FLAGS: 199 if restart_flag in self._sysargs: 200 self._properties_patch.update({'restart': True}) 201 break 202 203 @staticmethod 204 def from_pid(pid: int, executor_keys: Optional[str] = None) -> Job: 205 """ 206 Build a `Job` from the PID of a running Meerschaum process. 207 208 Parameters 209 ---------- 210 pid: int 211 The PID of the process. 212 213 executor_keys: Optional[str], default None 214 The executor keys to assign to the job. 
215 """ 216 psutil = mrsm.attempt_import('psutil') 217 try: 218 process = psutil.Process(pid) 219 except psutil.NoSuchProcess as e: 220 warn(f"Process with PID {pid} does not exist.", stack=False) 221 raise e 222 223 command_args = process.cmdline() 224 is_daemon = command_args[1] == '-c' 225 226 if is_daemon: 227 daemon_id = command_args[-1].split('daemon_id=')[-1].split(')')[0].replace("'", '') 228 root_dir = process.environ().get(STATIC_CONFIG['environment']['root'], None) 229 if root_dir is None: 230 root_dir = paths.ROOT_DIR_PATH 231 else: 232 root_dir = pathlib.Path(root_dir) 233 jobs_dir = root_dir / paths.DAEMON_RESOURCES_PATH.name 234 daemon_dir = jobs_dir / daemon_id 235 pid_file = daemon_dir / 'process.pid' 236 237 if pid_file.exists(): 238 with open(pid_file, 'r', encoding='utf-8') as f: 239 daemon_pid = int(f.read()) 240 241 if pid != daemon_pid: 242 raise EnvironmentError(f"Differing PIDs: {pid=}, {daemon_pid=}") 243 else: 244 raise EnvironmentError(f"Is job '{daemon_id}' running?") 245 246 return Job(daemon_id, executor_keys=executor_keys) 247 248 from meerschaum._internal.arguments._parse_arguments import parse_arguments 249 from meerschaum.utils.daemon import get_new_daemon_name 250 251 mrsm_ix = 0 252 for i, arg in enumerate(command_args): 253 if 'mrsm' in arg or 'meerschaum' in arg.lower(): 254 mrsm_ix = i 255 break 256 257 sysargs = command_args[mrsm_ix+1:] 258 kwargs = parse_arguments(sysargs) 259 name = kwargs.get('name', get_new_daemon_name()) 260 return Job(name, sysargs, executor_keys=executor_keys) 261 262 def start(self, debug: bool = False) -> SuccessTuple: 263 """ 264 Start the job's daemon. 265 """ 266 if self.executor is not None: 267 if not self.exists(debug=debug): 268 return self.executor.create_job( 269 self.name, 270 self.sysargs, 271 properties=self.daemon.properties, 272 debug=debug, 273 ) 274 return self.executor.start_job(self.name, debug=debug) 275 276 if self.is_running(): 277 return True, f"{self} is already running." 
278 279 success, msg = self.daemon.run( 280 keep_daemon_output=(not self.delete_after_completion), 281 allow_dirty_run=True, 282 ) 283 if not success: 284 return success, msg 285 286 return success, f"Started {self}." 287 288 def stop( 289 self, 290 timeout_seconds: Union[int, float, None] = None, 291 debug: bool = False, 292 ) -> SuccessTuple: 293 """ 294 Stop the job's daemon. 295 """ 296 if self.executor is not None: 297 return self.executor.stop_job(self.name, debug=debug) 298 299 if self.daemon.status == 'stopped': 300 if not self.restart: 301 return True, f"{self} is not running." 302 elif self.stop_time is not None: 303 return True, f"{self} will not restart until manually started." 304 305 quit_success, quit_msg = self.daemon.quit(timeout=timeout_seconds) 306 if quit_success: 307 return quit_success, f"Stopped {self}." 308 309 warn( 310 f"Failed to gracefully quit {self}.", 311 stack=False, 312 ) 313 kill_success, kill_msg = self.daemon.kill(timeout=timeout_seconds) 314 if not kill_success: 315 return kill_success, kill_msg 316 317 return kill_success, f"Killed {self}." 318 319 def pause( 320 self, 321 timeout_seconds: Union[int, float, None] = None, 322 debug: bool = False, 323 ) -> SuccessTuple: 324 """ 325 Pause the job's daemon. 326 """ 327 if self.executor is not None: 328 return self.executor.pause_job(self.name, debug=debug) 329 330 pause_success, pause_msg = self.daemon.pause(timeout=timeout_seconds) 331 if not pause_success: 332 return pause_success, pause_msg 333 334 return pause_success, f"Paused {self}." 335 336 def delete(self, debug: bool = False) -> SuccessTuple: 337 """ 338 Delete the job and its daemon. 
339 """ 340 if self.executor is not None: 341 return self.executor.delete_job(self.name, debug=debug) 342 343 if self.is_running(): 344 stop_success, stop_msg = self.stop() 345 if not stop_success: 346 return stop_success, stop_msg 347 348 cleanup_success, cleanup_msg = self.daemon.cleanup() 349 if not cleanup_success: 350 return cleanup_success, cleanup_msg 351 352 _ = self.daemon._properties.pop('result', None) 353 return cleanup_success, f"Deleted {self}." 354 355 def is_running(self) -> bool: 356 """ 357 Determine whether the job's daemon is running. 358 """ 359 return self.status == 'running' 360 361 def exists(self, debug: bool = False) -> bool: 362 """ 363 Determine whether the job exists. 364 """ 365 if self.executor is not None: 366 return self.executor.get_job_exists(self.name, debug=debug) 367 368 return self.daemon.path.exists() 369 370 def get_logs(self) -> Union[str, None]: 371 """ 372 Return the output text of the job's daemon. 373 """ 374 if self.executor is not None: 375 return self.executor.get_logs(self.name) 376 377 return self.daemon.log_text 378 379 def monitor_logs( 380 self, 381 callback_function: Callable[[str], None] = _default_stdout_callback, 382 input_callback_function: Optional[Callable[[], str]] = None, 383 stop_callback_function: Optional[Callable[[SuccessTuple], None]] = None, 384 stop_event: Optional[asyncio.Event] = None, 385 stop_on_exit: bool = False, 386 strip_timestamps: bool = False, 387 accept_input: bool = True, 388 debug: bool = False, 389 _logs_path: Optional[pathlib.Path] = None, 390 _log=None, 391 _stdin_file=None, 392 _wait_if_stopped: bool = True, 393 ): 394 """ 395 Monitor the job's log files and execute a callback on new lines. 396 397 Parameters 398 ---------- 399 callback_function: Callable[[str], None], default partial(print, end='') 400 The callback to execute as new data comes in. 401 Defaults to printing the output directly to `stdout`. 
402 403 input_callback_function: Optional[Callable[[], str]], default None 404 If provided, execute this callback when the daemon is blocking on stdin. 405 Defaults to `sys.stdin.readline()`. 406 407 stop_callback_function: Optional[Callable[[SuccessTuple]], str], default None 408 If provided, execute this callback when the daemon stops. 409 The job's SuccessTuple will be passed to the callback. 410 411 stop_event: Optional[asyncio.Event], default None 412 If provided, stop monitoring when this event is set. 413 You may instead raise `meerschaum.jobs.StopMonitoringLogs` 414 from within `callback_function` to stop monitoring. 415 416 stop_on_exit: bool, default False 417 If `True`, stop monitoring when the job stops. 418 419 strip_timestamps: bool, default False 420 If `True`, remove leading timestamps from lines. 421 422 accept_input: bool, default True 423 If `True`, accept input when the daemon blocks on stdin. 424 """ 425 if self.executor is not None: 426 self.executor.monitor_logs( 427 self.name, 428 callback_function, 429 input_callback_function=input_callback_function, 430 stop_callback_function=stop_callback_function, 431 stop_on_exit=stop_on_exit, 432 accept_input=accept_input, 433 strip_timestamps=strip_timestamps, 434 debug=debug, 435 ) 436 return 437 438 monitor_logs_coroutine = self.monitor_logs_async( 439 callback_function=callback_function, 440 input_callback_function=input_callback_function, 441 stop_callback_function=stop_callback_function, 442 stop_event=stop_event, 443 stop_on_exit=stop_on_exit, 444 strip_timestamps=strip_timestamps, 445 accept_input=accept_input, 446 debug=debug, 447 _logs_path=_logs_path, 448 _log=_log, 449 _stdin_file=_stdin_file, 450 _wait_if_stopped=_wait_if_stopped, 451 ) 452 return asyncio.run(monitor_logs_coroutine) 453 454 async def monitor_logs_async( 455 self, 456 callback_function: Callable[[str], None] = _default_stdout_callback, 457 input_callback_function: Optional[Callable[[], str]] = None, 458 
stop_callback_function: Optional[Callable[[SuccessTuple], None]] = None, 459 stop_event: Optional[asyncio.Event] = None, 460 stop_on_exit: bool = False, 461 strip_timestamps: bool = False, 462 accept_input: bool = True, 463 debug: bool = False, 464 _logs_path: Optional[pathlib.Path] = None, 465 _log=None, 466 _stdin_file=None, 467 _wait_if_stopped: bool = True, 468 ): 469 """ 470 Monitor the job's log files and await a callback on new lines. 471 472 Parameters 473 ---------- 474 callback_function: Callable[[str], None], default _default_stdout_callback 475 The callback to execute as new data comes in. 476 Defaults to printing the output directly to `stdout`. 477 478 input_callback_function: Optional[Callable[[], str]], default None 479 If provided, execute this callback when the daemon is blocking on stdin. 480 Defaults to `sys.stdin.readline()`. 481 482 stop_callback_function: Optional[Callable[[SuccessTuple]], str], default None 483 If provided, execute this callback when the daemon stops. 484 The job's SuccessTuple will be passed to the callback. 485 486 stop_event: Optional[asyncio.Event], default None 487 If provided, stop monitoring when this event is set. 488 You may instead raise `meerschaum.jobs.StopMonitoringLogs` 489 from within `callback_function` to stop monitoring. 490 491 stop_on_exit: bool, default False 492 If `True`, stop monitoring when the job stops. 493 494 strip_timestamps: bool, default False 495 If `True`, remove leading timestamps from lines. 496 497 accept_input: bool, default True 498 If `True`, accept input when the daemon blocks on stdin. 
499 """ 500 from meerschaum.utils.prompt import prompt 501 502 def default_input_callback_function(): 503 prompt_kwargs = self.get_prompt_kwargs(debug=debug) 504 if prompt_kwargs: 505 answer = prompt(**prompt_kwargs) 506 return answer + '\n' 507 return sys.stdin.readline() 508 509 if input_callback_function is None: 510 input_callback_function = default_input_callback_function 511 512 if self.executor is not None: 513 await self.executor.monitor_logs_async( 514 self.name, 515 callback_function, 516 input_callback_function=input_callback_function, 517 stop_callback_function=stop_callback_function, 518 stop_on_exit=stop_on_exit, 519 strip_timestamps=strip_timestamps, 520 accept_input=accept_input, 521 debug=debug, 522 ) 523 return 524 525 from meerschaum.utils.formatting._jobs import strip_timestamp_from_line 526 527 events = { 528 'user': stop_event, 529 'stopped': asyncio.Event(), 530 'stop_token': asyncio.Event(), 531 'stop_exception': asyncio.Event(), 532 'stopped_timeout': asyncio.Event(), 533 } 534 combined_event = asyncio.Event() 535 emitted_text = False 536 stdin_file = _stdin_file if _stdin_file is not None else self.daemon.stdin_file 537 538 async def check_job_status(): 539 if not stop_on_exit: 540 return 541 542 nonlocal emitted_text 543 544 sleep_time = 0.1 545 while sleep_time < 0.2: 546 if self.status == 'stopped': 547 if not emitted_text and _wait_if_stopped: 548 await asyncio.sleep(sleep_time) 549 sleep_time = round(sleep_time * 1.1, 3) 550 continue 551 552 if stop_callback_function is not None: 553 try: 554 if asyncio.iscoroutinefunction(stop_callback_function): 555 await stop_callback_function(self.result) 556 else: 557 stop_callback_function(self.result) 558 except asyncio.exceptions.CancelledError: 559 break 560 except Exception: 561 warn(traceback.format_exc()) 562 563 if stop_on_exit: 564 events['stopped'].set() 565 566 break 567 await asyncio.sleep(0.1) 568 569 events['stopped_timeout'].set() 570 571 async def check_blocking_on_input(): 572 
while True: 573 if not emitted_text or not self.is_blocking_on_stdin(): 574 try: 575 await asyncio.sleep(self.refresh_seconds) 576 except asyncio.exceptions.CancelledError: 577 break 578 continue 579 580 if not self.is_running(): 581 break 582 583 await emit_latest_lines() 584 585 try: 586 print('', end='', flush=True) 587 if asyncio.iscoroutinefunction(input_callback_function): 588 data = await input_callback_function() 589 else: 590 loop = asyncio.get_running_loop() 591 data = await loop.run_in_executor(None, input_callback_function) 592 except KeyboardInterrupt: 593 break 594 # if not data.endswith('\n'): 595 # data += '\n' 596 597 stdin_file.write(data) 598 await asyncio.sleep(self.refresh_seconds) 599 600 async def combine_events(): 601 event_tasks = [ 602 asyncio.create_task(event.wait()) 603 for event in events.values() 604 if event is not None 605 ] 606 if not event_tasks: 607 return 608 609 try: 610 done, pending = await asyncio.wait( 611 event_tasks, 612 return_when=asyncio.FIRST_COMPLETED, 613 ) 614 for task in pending: 615 task.cancel() 616 except asyncio.exceptions.CancelledError: 617 pass 618 finally: 619 combined_event.set() 620 621 check_job_status_task = asyncio.create_task(check_job_status()) 622 check_blocking_on_input_task = asyncio.create_task(check_blocking_on_input()) 623 combine_events_task = asyncio.create_task(combine_events()) 624 625 log = _log if _log is not None else self.daemon.rotating_log 626 lines_to_show = ( 627 self.daemon.properties.get( 628 'logs', {} 629 ).get( 630 'lines_to_show', get_config('jobs', 'logs', 'lines_to_show') 631 ) 632 ) 633 634 async def emit_latest_lines(): 635 nonlocal emitted_text 636 nonlocal stop_event 637 lines = log.readlines() 638 for line in lines[(-1 * lines_to_show):]: 639 if stop_event is not None and stop_event.is_set(): 640 return 641 642 line_stripped_extra = strip_timestamp_from_line(line.strip()) 643 line_stripped = strip_timestamp_from_line(line) 644 645 if line_stripped_extra == STOP_TOKEN: 
646 events['stop_token'].set() 647 return 648 649 if line_stripped_extra == CLEAR_TOKEN: 650 clear_screen(debug=debug) 651 continue 652 653 if line_stripped_extra == FLUSH_TOKEN.strip(): 654 line_stripped = '' 655 line = '' 656 657 if strip_timestamps: 658 line = line_stripped 659 660 try: 661 if asyncio.iscoroutinefunction(callback_function): 662 await callback_function(line) 663 else: 664 callback_function(line) 665 emitted_text = True 666 except StopMonitoringLogs: 667 events['stop_exception'].set() 668 return 669 except Exception: 670 warn(f"Error in logs callback:\n{traceback.format_exc()}") 671 672 await emit_latest_lines() 673 674 tasks = ( 675 [check_job_status_task] 676 + ([check_blocking_on_input_task] if accept_input else []) 677 + [combine_events_task] 678 ) 679 try: 680 _ = asyncio.gather(*tasks, return_exceptions=True) 681 except asyncio.exceptions.CancelledError: 682 raise 683 except Exception: 684 warn(f"Failed to run async checks:\n{traceback.format_exc()}") 685 686 watchfiles = mrsm.attempt_import('watchfiles', lazy=False) 687 dir_path_to_monitor = ( 688 _logs_path 689 or (log.file_path.parent if log else None) 690 or paths.LOGS_RESOURCES_PATH 691 ) 692 async for changes in watchfiles.awatch( 693 dir_path_to_monitor, 694 stop_event=combined_event, 695 ): 696 for change in changes: 697 file_path_str = change[1] 698 file_path = pathlib.Path(file_path_str) 699 latest_subfile_path = log.get_latest_subfile_path() 700 if latest_subfile_path != file_path: 701 continue 702 703 await emit_latest_lines() 704 705 await emit_latest_lines() 706 707 def is_blocking_on_stdin(self, debug: bool = False) -> bool: 708 """ 709 Return whether a job's daemon is blocking on stdin. 
710 """ 711 if self.executor is not None: 712 return self.executor.get_job_is_blocking_on_stdin(self.name, debug=debug) 713 714 return self.is_running() and self.daemon.blocking_stdin_file_path.exists() 715 716 def get_prompt_kwargs(self, debug: bool = False) -> Dict[str, Any]: 717 """ 718 Return the kwargs to the blocking `prompt()`, if available. 719 """ 720 if self.executor is not None: 721 return self.executor.get_job_prompt_kwargs(self.name, debug=debug) 722 723 if not self.daemon.prompt_kwargs_file_path.exists(): 724 return {} 725 726 try: 727 with open(self.daemon.prompt_kwargs_file_path, 'r', encoding='utf-8') as f: 728 prompt_kwargs = json.load(f) 729 730 return prompt_kwargs 731 732 except Exception: 733 import traceback 734 traceback.print_exc() 735 return {} 736 737 def write_stdin(self, data): 738 """ 739 Write to a job's daemon's `stdin`. 740 """ 741 self.daemon.stdin_file.write(data) 742 743 @property 744 def executor(self) -> Union[Executor, None]: 745 """ 746 If the job is remote, return the connector to the remote API instance. 747 """ 748 return ( 749 mrsm.get_connector(self.executor_keys) 750 if self.executor_keys != 'local' 751 else None 752 ) 753 754 @property 755 def status(self) -> str: 756 """ 757 Return the running status of the job's daemon. 758 """ 759 if '_status_hook' in self.__dict__: 760 return self._status_hook() 761 762 if self.executor is not None: 763 return self.executor.get_job_status(self.name) 764 765 return self.daemon.status 766 767 @property 768 def pid(self) -> Union[int, None]: 769 """ 770 Return the PID of the job's dameon. 771 """ 772 if self.executor is not None: 773 return self.executor.get_job_metadata(self.name).get('daemon', {}).get('pid', None) 774 775 return self.daemon.pid 776 777 @property 778 def restart(self) -> bool: 779 """ 780 Return whether to restart a stopped job. 
        """
        if self.executor is not None:
            return self.executor.get_job_metadata(self.name).get('restart', False)

        return self.daemon.properties.get('restart', False)

    @property
    def result(self) -> SuccessTuple:
        """
        Return the `SuccessTuple` when the job has terminated.
        """
        if self.is_running():
            return True, f"{self} is running."

        if '_result_hook' in self.__dict__:
            return self._result_hook()

        if self.executor is not None:
            return (
                self.executor.get_job_metadata(self.name)
                .get('result', (False, "No result available."))
            )

        _result = self.daemon.properties.get('result', None)
        if _result is None:
            from meerschaum.utils.daemon.Daemon import _results
            return _results.get(self.daemon.daemon_id, (False, "No result available."))

        return tuple(_result)

    @property
    def sysargs(self) -> List[str]:
        """
        Return the sysargs to use for the Daemon.
        """
        if self._sysargs:
            return self._sysargs

        if self.executor is not None:
            return self.executor.get_job_metadata(self.name).get('sysargs', [])

        target_args = self.daemon.target_args
        if target_args is None:
            return []
        self._sysargs = target_args[0] if len(target_args) > 0 else []
        return self._sysargs

    def get_daemon_properties(self) -> Dict[str, Any]:
        """
        Return the `properties` dictionary for the job's daemon.
        """
        remote_properties = (
            {}
            if self.executor is None
            else self.executor.get_job_properties(self.name)
        )
        return {
            **remote_properties,
            **self._properties_patch
        }

    @property
    def daemon(self) -> 'Daemon':
        """
        Return the daemon which this job manages.
        """
        from meerschaum.utils.daemon import Daemon
        if self._daemon is not None and self.executor is None and self._sysargs:
            return self._daemon

        self._daemon = Daemon(
            target=entry,
            target_args=[self._sysargs],
            target_kw={},
            daemon_id=self.name,
            label=shlex.join(self._sysargs),
            properties=self.get_daemon_properties(),
        )
        if '_rotating_log' in self.__dict__:
            self._daemon._rotating_log = self._rotating_log

        if '_stdin_file' in self.__dict__:
            self._daemon._stdin_file = self._stdin_file
            self._daemon._blocking_stdin_file_path = self._stdin_file.blocking_file_path

        return self._daemon

    @property
    def began(self) -> Union[datetime, None]:
        """
        The datetime when the job began running.
        """
        if self.executor is not None:
            began_str = self.executor.get_job_began(self.name)
            if began_str is None:
                return None
            return (
                datetime.fromisoformat(began_str)
                .astimezone(timezone.utc)
                .replace(tzinfo=None)
            )

        began_str = self.daemon.properties.get('process', {}).get('began', None)
        if began_str is None:
            return None

        return datetime.fromisoformat(began_str)

    @property
    def ended(self) -> Union[datetime, None]:
        """
        The datetime when the job stopped running.
        """
        if self.executor is not None:
            ended_str = self.executor.get_job_ended(self.name)
            if ended_str is None:
                return None
            return (
                datetime.fromisoformat(ended_str)
                .astimezone(timezone.utc)
                .replace(tzinfo=None)
            )

        ended_str = self.daemon.properties.get('process', {}).get('ended', None)
        if ended_str is None:
            return None

        return datetime.fromisoformat(ended_str)

    @property
    def paused(self) -> Union[datetime, None]:
        """
        The datetime when the job was suspended while running.
        """
        if self.executor is not None:
            paused_str = self.executor.get_job_paused(self.name)
            if paused_str is None:
                return None
            return (
                datetime.fromisoformat(paused_str)
                .astimezone(timezone.utc)
                .replace(tzinfo=None)
            )

        paused_str = self.daemon.properties.get('process', {}).get('paused', None)
        if paused_str is None:
            return None

        return datetime.fromisoformat(paused_str)

    @property
    def stop_time(self) -> Union[datetime, None]:
        """
        Return the timestamp when the job was manually stopped.
        """
        if self.executor is not None:
            return self.executor.get_job_stop_time(self.name)

        if not self.daemon.stop_path.exists():
            return None

        stop_data = self.daemon._read_stop_file()
        if not stop_data:
            return None

        stop_time_str = stop_data.get('stop_time', None)
        if not stop_time_str:
            warn(f"Could not read stop time for {self}.")
            return None

        return datetime.fromisoformat(stop_time_str)

    @property
    def hidden(self) -> bool:
        """
        Return a bool indicating whether this job should be displayed.
        """
        return (
            self.name.startswith('_')
            or self.name.startswith('.')
            or self._is_externally_managed
        )

    def check_restart(self) -> SuccessTuple:
        """
        If `restart` is `True` and the daemon is not running,
        restart the job.
        Do not restart if the job was manually stopped.
        """
        if self.is_running():
            return True, f"{self} is running."

        if not self.restart:
            return True, f"{self} does not need to be restarted."

        if self.stop_time is not None:
            return True, f"{self} was manually stopped."

        return self.start()

    @property
    def label(self) -> str:
        """
        Return the job's Daemon label (joined sysargs).
        """
        from meerschaum._internal.arguments import compress_pipeline_sysargs
        sysargs = compress_pipeline_sysargs(self.sysargs)
        return shlex.join(sysargs).replace(' + ', '\n+ ').replace(' : ', '\n: ').lstrip().rstrip()

    @property
    def _externally_managed_file(self) -> pathlib.Path:
        """
        Return the path to the externally managed file.
        """
        return self.daemon.path / '.externally-managed'

    def _set_externally_managed(self):
        """
        Set this job as externally managed.
        """
        self._externally_managed = True
        try:
            self._externally_managed_file.parent.mkdir(exist_ok=True, parents=True)
            self._externally_managed_file.touch()
        except Exception as e:
            warn(e)

    @property
    def _is_externally_managed(self) -> bool:
        """
        Return whether this job is externally managed.
        """
        return self.executor_keys in (None, 'local') and (
            self._externally_managed or self._externally_managed_file.exists()
        )

    @property
    def env(self) -> Dict[str, str]:
        """
        Return the environment variables to set for the job's process.
        """
        if '_env' in self.__dict__:
            return self.__dict__['_env']

        _env = self.daemon.properties.get('env', {})
        default_env = {
            'PYTHONUNBUFFERED': '1',
            'LINES': str(get_config('jobs', 'terminal', 'lines')),
            'COLUMNS': str(get_config('jobs', 'terminal', 'columns')),
            STATIC_CONFIG['environment']['noninteractive']: 'true',
        }
        self._env = {**default_env, **_env}
        return self._env

    @property
    def delete_after_completion(self) -> bool:
        """
        Return whether this job is configured to delete itself after completion.
        """
        if '_delete_after_completion' in self.__dict__:
            return self.__dict__.get('_delete_after_completion', False)

        self._delete_after_completion = self.daemon.properties.get('delete_after_completion', False)
        return self._delete_after_completion

    def __str__(self) -> str:
        sysargs = self.sysargs
        sysargs_str = shlex.join(sysargs) if sysargs else ''
        job_str = f'Job("{self.name}"'
        if sysargs_str:
            job_str += f', "{sysargs_str}"'

        job_str += ')'
        return job_str

    def __repr__(self) -> str:
        return str(self)

    def __hash__(self) -> int:
        return hash(self.name)
Manage a meerschaum.utils.daemon.Daemon, locally or remotely via the API.
    def __init__(
        self,
        name: str,
        sysargs: Union[List[str], str, None] = None,
        env: Optional[Dict[str, str]] = None,
        executor_keys: Optional[str] = None,
        delete_after_completion: bool = False,
        refresh_seconds: Union[int, float, None] = None,
        _properties: Optional[Dict[str, Any]] = None,
        _rotating_log=None,
        _stdin_file=None,
        _status_hook: Optional[Callable[[], str]] = None,
        _result_hook: Optional[Callable[[], SuccessTuple]] = None,
        _externally_managed: bool = False,
    ):
        """
        Create a new job to manage a `meerschaum.utils.daemon.Daemon`.

        Parameters
        ----------
        name: str
            The name of the job to be created.
            This will also be used as the Daemon ID.

        sysargs: Union[List[str], str, None], default None
            The sysargs of the command to be executed, e.g. 'start api'.

        env: Optional[Dict[str, str]], default None
            If provided, set these environment variables in the job's process.

        executor_keys: Optional[str], default None
            If provided, execute the job remotely on an API instance, e.g. 'api:main'.

        delete_after_completion: bool, default False
            If `True`, delete this job when it has finished executing.

        refresh_seconds: Union[int, float, None], default None
            The number of seconds to sleep between refreshes.
            Defaults to the configured value `system.cli.refresh_seconds`.

        _properties: Optional[Dict[str, Any]], default None
            If provided, use this to patch the daemon's properties.
        """
        from meerschaum.utils.daemon import Daemon
        for char in BANNED_CHARS:
            if char in name:
                raise ValueError(f"Invalid name: ({char}) is not allowed.")

        if isinstance(sysargs, str):
            sysargs = shlex.split(sysargs)

        and_key = STATIC_CONFIG['system']['arguments']['and_key']
        escaped_and_key = STATIC_CONFIG['system']['arguments']['escaped_and_key']
        if sysargs:
            sysargs = [
                (arg if arg != escaped_and_key else and_key)
                for arg in sysargs
            ]

        ### NOTE: 'local' and 'systemd' executors are being coalesced.
        if executor_keys is None:
            from meerschaum.jobs import get_executor_keys_from_context
            executor_keys = get_executor_keys_from_context()

        self.executor_keys = executor_keys
        self.name = name
        self.refresh_seconds = (
            refresh_seconds
            if refresh_seconds is not None
            else mrsm.get_config('system', 'cli', 'refresh_seconds')
        )
        try:
            self._daemon = (
                Daemon(daemon_id=name)
                if executor_keys == 'local'
                else None
            )
        except Exception:
            self._daemon = None

        ### Handle any injected dependencies.
        if _rotating_log is not None:
            self._rotating_log = _rotating_log
            if self._daemon is not None:
                self._daemon._rotating_log = _rotating_log

        if _stdin_file is not None:
            self._stdin_file = _stdin_file
            if self._daemon is not None:
                self._daemon._stdin_file = _stdin_file
                self._daemon._blocking_stdin_file_path = _stdin_file.blocking_file_path

        if _status_hook is not None:
            self._status_hook = _status_hook

        if _result_hook is not None:
            self._result_hook = _result_hook

        self._externally_managed = _externally_managed
        self._properties_patch = _properties or {}
        if _externally_managed:
            self._properties_patch.update({'externally_managed': _externally_managed})

        if env:
            self._properties_patch.update({'env': env})

        if delete_after_completion:
            self._properties_patch.update({'delete_after_completion': delete_after_completion})

        daemon_sysargs = (
            self._daemon.properties.get('target', {}).get('args', [None])[0]
            if self._daemon is not None
            else None
        )

        if daemon_sysargs and sysargs and daemon_sysargs != sysargs:
            warn("Given sysargs differ from existing sysargs.")

        self._sysargs = [
            arg
            for arg in (daemon_sysargs or sysargs or [])
            if arg not in ('-d', '--daemon')
        ]
        for restart_flag in RESTART_FLAGS:
            if restart_flag in self._sysargs:
                self._properties_patch.update({'restart': True})
                break
Create a new job to manage a meerschaum.utils.daemon.Daemon.
Parameters
- name (str): The name of the job to be created. This will also be used as the Daemon ID.
- sysargs (Union[List[str], str, None], default None): The sysargs of the command to be executed, e.g. 'start api'.
- env (Optional[Dict[str, str]], default None): If provided, set these environment variables in the job's process.
- executor_keys (Optional[str], default None): If provided, execute the job remotely on an API instance, e.g. 'api:main'.
- delete_after_completion (bool, default False): If True, delete this job when it has finished executing.
- refresh_seconds (Union[int, float, None], default None): The number of seconds to sleep between refreshes. Defaults to the configured value system.cli.refresh_seconds.
- _properties (Optional[Dict[str, Any]], default None): If provided, use this to patch the daemon's properties.
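The constructor accepts `sysargs` either as a string or as a list, drops the daemon flags, and marks the job for restart when a restart flag is present. The following is a minimal standalone sketch of that normalization using only the standard library. The helper names `normalize_sysargs` and `implies_restart` are hypothetical, and the values in `RESTART_FLAGS` are assumed placeholders rather than the package's actual constant.

```python
import shlex

# Assumed placeholder values; the real RESTART_FLAGS constant is defined in meerschaum.
RESTART_FLAGS = ['-s', '--restart', '--loop', '--schedule', '--cron']
# These two flags are the ones filtered out in the constructor shown above.
DAEMON_FLAGS = ('-d', '--daemon')


def normalize_sysargs(sysargs):
    """Split a command string into a list and drop the daemon flags (hypothetical helper)."""
    if isinstance(sysargs, str):
        sysargs = shlex.split(sysargs)
    return [arg for arg in (sysargs or []) if arg not in DAEMON_FLAGS]


def implies_restart(sysargs):
    """Return True if any restart flag is present in the sysargs (hypothetical helper)."""
    return any(flag in sysargs for flag in RESTART_FLAGS)


args = normalize_sysargs("sync pipes -d --loop")
print(args)
print(implies_restart(args))
```

Passing a list instead of a string skips the `shlex.split` step, so both forms end up as the same list of arguments.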
    @staticmethod
    def from_pid(pid: int, executor_keys: Optional[str] = None) -> Job:
        """
        Build a `Job` from the PID of a running Meerschaum process.

        Parameters
        ----------
        pid: int
            The PID of the process.

        executor_keys: Optional[str], default None
            The executor keys to assign to the job.
        """
        psutil = mrsm.attempt_import('psutil')
        try:
            process = psutil.Process(pid)
        except psutil.NoSuchProcess as e:
            warn(f"Process with PID {pid} does not exist.", stack=False)
            raise e

        command_args = process.cmdline()
        is_daemon = command_args[1] == '-c'

        if is_daemon:
            daemon_id = command_args[-1].split('daemon_id=')[-1].split(')')[0].replace("'", '')
            root_dir = process.environ().get(STATIC_CONFIG['environment']['root'], None)
            if root_dir is None:
                root_dir = paths.ROOT_DIR_PATH
            else:
                root_dir = pathlib.Path(root_dir)
            jobs_dir = root_dir / paths.DAEMON_RESOURCES_PATH.name
            daemon_dir = jobs_dir / daemon_id
            pid_file = daemon_dir / 'process.pid'

            if pid_file.exists():
                with open(pid_file, 'r', encoding='utf-8') as f:
                    daemon_pid = int(f.read())

                if pid != daemon_pid:
                    raise EnvironmentError(f"Differing PIDs: {pid=}, {daemon_pid=}")
            else:
                raise EnvironmentError(f"Is job '{daemon_id}' running?")

            return Job(daemon_id, executor_keys=executor_keys)

        from meerschaum._internal.arguments._parse_arguments import parse_arguments
        from meerschaum.utils.daemon import get_new_daemon_name

        mrsm_ix = 0
        for i, arg in enumerate(command_args):
            if 'mrsm' in arg or 'meerschaum' in arg.lower():
                mrsm_ix = i
                break

        sysargs = command_args[mrsm_ix+1:]
        kwargs = parse_arguments(sysargs)
        name = kwargs.get('name', get_new_daemon_name())
        return Job(name, sysargs, executor_keys=executor_keys)
Build a Job from the PID of a running Meerschaum process.
Parameters
- pid (int): The PID of the process.
- executor_keys (Optional[str], default None): The executor keys to assign to the job.
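For daemonized processes, the source above recovers the daemon ID by splitting the last command-line argument on `daemon_id=`. The sketch below isolates that string handling. The example command line is a hypothetical illustration of the shape the parsing expects, not the exact command Meerschaum launches, and `parse_daemon_id` is a name introduced here.

```python
def parse_daemon_id(command_args):
    """Extract the daemon ID from the last argument (hypothetical helper mirroring the source)."""
    last_arg = command_args[-1]
    return last_arg.split('daemon_id=')[-1].split(')')[0].replace("'", '')


# Hypothetical command line: only the `-c` flag and the `daemon_id='...'` fragment matter here.
command_args = ['python', '-c', "Daemon(daemon_id='my-job').run()"]

is_daemon = command_args[1] == '-c'
print(is_daemon)
print(parse_daemon_id(command_args))
```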
    def start(self, debug: bool = False) -> SuccessTuple:
        """
        Start the job's daemon.
        """
        if self.executor is not None:
            if not self.exists(debug=debug):
                return self.executor.create_job(
                    self.name,
                    self.sysargs,
                    properties=self.daemon.properties,
                    debug=debug,
                )
            return self.executor.start_job(self.name, debug=debug)

        if self.is_running():
            return True, f"{self} is already running."

        success, msg = self.daemon.run(
            keep_daemon_output=(not self.delete_after_completion),
            allow_dirty_run=True,
        )
        if not success:
            return success, msg

        return success, f"Started {self}."
Start the job's daemon.
    def stop(
        self,
        timeout_seconds: Union[int, float, None] = None,
        debug: bool = False,
    ) -> SuccessTuple:
        """
        Stop the job's daemon.
        """
        if self.executor is not None:
            return self.executor.stop_job(self.name, debug=debug)

        if self.daemon.status == 'stopped':
            if not self.restart:
                return True, f"{self} is not running."
            elif self.stop_time is not None:
                return True, f"{self} will not restart until manually started."

        quit_success, quit_msg = self.daemon.quit(timeout=timeout_seconds)
        if quit_success:
            return quit_success, f"Stopped {self}."

        warn(
            f"Failed to gracefully quit {self}.",
            stack=False,
        )
        kill_success, kill_msg = self.daemon.kill(timeout=timeout_seconds)
        if not kill_success:
            return kill_success, kill_msg

        return kill_success, f"Killed {self}."
Stop the job's daemon.
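For local jobs, `stop()` first asks the daemon to quit and only falls back to killing it when the graceful attempt fails. The sketch below reproduces that fallback with a stand-in daemon so it runs without Meerschaum installed. `StubDaemon` and the free function `stop` are hypothetical names introduced for illustration.

```python
from typing import Optional, Tuple

SuccessTuple = Tuple[bool, str]


class StubDaemon:
    """Stand-in for a daemon; `quits_gracefully` controls the outcome of `quit()`."""

    def __init__(self, quits_gracefully: bool):
        self.quits_gracefully = quits_gracefully

    def quit(self, timeout: Optional[float] = None) -> SuccessTuple:
        if self.quits_gracefully:
            return True, "Success"
        return False, "Timed out waiting for the process to quit."

    def kill(self, timeout: Optional[float] = None) -> SuccessTuple:
        return True, "Success"


def stop(daemon: StubDaemon, name: str, timeout_seconds: Optional[float] = None) -> SuccessTuple:
    """Try a graceful quit first, then fall back to a kill."""
    quit_success, _ = daemon.quit(timeout=timeout_seconds)
    if quit_success:
        return True, f"Stopped {name}."

    kill_success, kill_msg = daemon.kill(timeout=timeout_seconds)
    if not kill_success:
        return kill_success, kill_msg
    return True, f"Killed {name}."


print(stop(StubDaemon(quits_gracefully=True), 'demo'))
print(stop(StubDaemon(quits_gracefully=False), 'demo'))
```

The returned message tells the caller which path was taken, which is why the two branches use different wording.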
    def pause(
        self,
        timeout_seconds: Union[int, float, None] = None,
        debug: bool = False,
    ) -> SuccessTuple:
        """
        Pause the job's daemon.
        """
        if self.executor is not None:
            return self.executor.pause_job(self.name, debug=debug)

        pause_success, pause_msg = self.daemon.pause(timeout=timeout_seconds)
        if not pause_success:
            return pause_success, pause_msg

        return pause_success, f"Paused {self}."
Pause the job's daemon.
    def delete(self, debug: bool = False) -> SuccessTuple:
        """
        Delete the job and its daemon.
        """
        if self.executor is not None:
            return self.executor.delete_job(self.name, debug=debug)

        if self.is_running():
            stop_success, stop_msg = self.stop()
            if not stop_success:
                return stop_success, stop_msg

        cleanup_success, cleanup_msg = self.daemon.cleanup()
        if not cleanup_success:
            return cleanup_success, cleanup_msg

        _ = self.daemon._properties.pop('result', None)
        return cleanup_success, f"Deleted {self}."
Delete the job and its daemon.
    def is_running(self) -> bool:
        """
        Determine whether the job's daemon is running.
        """
        return self.status == 'running'
Determine whether the job's daemon is running.
    def exists(self, debug: bool = False) -> bool:
        """
        Determine whether the job exists.
        """
        if self.executor is not None:
            return self.executor.get_job_exists(self.name, debug=debug)

        return self.daemon.path.exists()
Determine whether the job exists.
    def get_logs(self) -> Union[str, None]:
        """
        Return the output text of the job's daemon.
        """
        if self.executor is not None:
            return self.executor.get_logs(self.name)

        return self.daemon.log_text
Return the output text of the job's daemon.
    def monitor_logs(
        self,
        callback_function: Callable[[str], None] = _default_stdout_callback,
        input_callback_function: Optional[Callable[[], str]] = None,
        stop_callback_function: Optional[Callable[[SuccessTuple], None]] = None,
        stop_event: Optional[asyncio.Event] = None,
        stop_on_exit: bool = False,
        strip_timestamps: bool = False,
        accept_input: bool = True,
        debug: bool = False,
        _logs_path: Optional[pathlib.Path] = None,
        _log=None,
        _stdin_file=None,
        _wait_if_stopped: bool = True,
    ):
        """
        Monitor the job's log files and execute a callback on new lines.

        Parameters
        ----------
        callback_function: Callable[[str], None], default partial(print, end='')
            The callback to execute as new data comes in.
            Defaults to printing the output directly to `stdout`.

        input_callback_function: Optional[Callable[[], str]], default None
            If provided, execute this callback when the daemon is blocking on stdin.
            Defaults to `sys.stdin.readline()`.

        stop_callback_function: Optional[Callable[[SuccessTuple], None]], default None
            If provided, execute this callback when the daemon stops.
            The job's SuccessTuple will be passed to the callback.

        stop_event: Optional[asyncio.Event], default None
            If provided, stop monitoring when this event is set.
            You may instead raise `meerschaum.jobs.StopMonitoringLogs`
            from within `callback_function` to stop monitoring.

        stop_on_exit: bool, default False
            If `True`, stop monitoring when the job stops.

        strip_timestamps: bool, default False
            If `True`, remove leading timestamps from lines.

        accept_input: bool, default True
            If `True`, accept input when the daemon blocks on stdin.
        """
        if self.executor is not None:
            self.executor.monitor_logs(
                self.name,
                callback_function,
                input_callback_function=input_callback_function,
                stop_callback_function=stop_callback_function,
                stop_on_exit=stop_on_exit,
                accept_input=accept_input,
                strip_timestamps=strip_timestamps,
                debug=debug,
            )
            return

        monitor_logs_coroutine = self.monitor_logs_async(
            callback_function=callback_function,
            input_callback_function=input_callback_function,
            stop_callback_function=stop_callback_function,
            stop_event=stop_event,
            stop_on_exit=stop_on_exit,
            strip_timestamps=strip_timestamps,
            accept_input=accept_input,
            debug=debug,
            _logs_path=_logs_path,
            _log=_log,
            _stdin_file=_stdin_file,
            _wait_if_stopped=_wait_if_stopped,
        )
        return asyncio.run(monitor_logs_coroutine)
Monitor the job's log files and execute a callback on new lines.
Parameters
- callback_function (Callable[[str], None], default partial(print, end='')): The callback to execute as new data comes in. Defaults to printing the output directly to stdout.
- input_callback_function (Optional[Callable[[], str]], default None): If provided, execute this callback when the daemon is blocking on stdin. Defaults to sys.stdin.readline().
- stop_callback_function (Optional[Callable[[SuccessTuple], None]], default None): If provided, execute this callback when the daemon stops. The job's SuccessTuple will be passed to the callback.
- stop_event (Optional[asyncio.Event], default None): If provided, stop monitoring when this event is set. You may instead raise meerschaum.jobs.StopMonitoringLogs from within callback_function to stop monitoring.
- stop_on_exit (bool, default False): If True, stop monitoring when the job stops.
- strip_timestamps (bool, default False): If True, remove leading timestamps from lines.
- accept_input (bool, default True): If True, accept input when the daemon blocks on stdin.
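A callback can end monitoring early by raising `meerschaum.jobs.StopMonitoringLogs`. The sketch below imitates that contract with a local stand-in exception and a plain list of lines, so it runs without a live job. `emit_lines` and `stop_on_done` are hypothetical names, and the loop is a simplification of the real emitter.

```python
class StopMonitoringLogs(Exception):
    """Local stand-in for meerschaum.jobs.StopMonitoringLogs."""


def emit_lines(lines, callback_function):
    """Feed lines to the callback until it raises StopMonitoringLogs.

    Returns True if every line was emitted, False if the callback stopped early.
    """
    for line in lines:
        try:
            callback_function(line)
        except StopMonitoringLogs:
            return False
    return True


seen = []


def stop_on_done(line):
    """Collect lines and stop monitoring once a line mentions 'done'."""
    seen.append(line)
    if 'done' in line:
        raise StopMonitoringLogs


finished = emit_lines(['starting\n', 'done\n', 'never shown\n'], stop_on_done)
print(seen)
print(finished)
```

Raising from the callback keeps the stop condition next to the code that inspects each line, whereas `stop_event` suits cases where another task decides when to stop.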
    async def monitor_logs_async(
        self,
        callback_function: Callable[[str], None] = _default_stdout_callback,
        input_callback_function: Optional[Callable[[], str]] = None,
        stop_callback_function: Optional[Callable[[SuccessTuple], None]] = None,
        stop_event: Optional[asyncio.Event] = None,
        stop_on_exit: bool = False,
        strip_timestamps: bool = False,
        accept_input: bool = True,
        debug: bool = False,
        _logs_path: Optional[pathlib.Path] = None,
        _log=None,
        _stdin_file=None,
        _wait_if_stopped: bool = True,
    ):
        """
        Monitor the job's log files and await a callback on new lines.

        Parameters
        ----------
        callback_function: Callable[[str], None], default _default_stdout_callback
            The callback to execute as new data comes in.
            Defaults to printing the output directly to `stdout`.

        input_callback_function: Optional[Callable[[], str]], default None
            If provided, execute this callback when the daemon is blocking on stdin.
            Defaults to `sys.stdin.readline()`.

        stop_callback_function: Optional[Callable[[SuccessTuple], None]], default None
            If provided, execute this callback when the daemon stops.
            The job's SuccessTuple will be passed to the callback.

        stop_event: Optional[asyncio.Event], default None
            If provided, stop monitoring when this event is set.
            You may instead raise `meerschaum.jobs.StopMonitoringLogs`
            from within `callback_function` to stop monitoring.

        stop_on_exit: bool, default False
            If `True`, stop monitoring when the job stops.

        strip_timestamps: bool, default False
            If `True`, remove leading timestamps from lines.

        accept_input: bool, default True
            If `True`, accept input when the daemon blocks on stdin.
        """
        from meerschaum.utils.prompt import prompt

        def default_input_callback_function():
            prompt_kwargs = self.get_prompt_kwargs(debug=debug)
            if prompt_kwargs:
                answer = prompt(**prompt_kwargs)
                return answer + '\n'
            return sys.stdin.readline()

        if input_callback_function is None:
            input_callback_function = default_input_callback_function

        if self.executor is not None:
            await self.executor.monitor_logs_async(
                self.name,
                callback_function,
                input_callback_function=input_callback_function,
                stop_callback_function=stop_callback_function,
                stop_on_exit=stop_on_exit,
                strip_timestamps=strip_timestamps,
                accept_input=accept_input,
                debug=debug,
            )
            return

        from meerschaum.utils.formatting._jobs import strip_timestamp_from_line

        events = {
            'user': stop_event,
            'stopped': asyncio.Event(),
            'stop_token': asyncio.Event(),
            'stop_exception': asyncio.Event(),
            'stopped_timeout': asyncio.Event(),
        }
        combined_event = asyncio.Event()
        emitted_text = False
        stdin_file = _stdin_file if _stdin_file is not None else self.daemon.stdin_file

        async def check_job_status():
            if not stop_on_exit:
                return

            nonlocal emitted_text

            sleep_time = 0.1
            while sleep_time < 0.2:
                if self.status == 'stopped':
                    if not emitted_text and _wait_if_stopped:
                        await asyncio.sleep(sleep_time)
                        sleep_time = round(sleep_time * 1.1, 3)
                        continue

                    if stop_callback_function is not None:
                        try:
                            if asyncio.iscoroutinefunction(stop_callback_function):
                                await stop_callback_function(self.result)
                            else:
                                stop_callback_function(self.result)
                        except asyncio.exceptions.CancelledError:
                            break
                        except Exception:
                            warn(traceback.format_exc())

                    if stop_on_exit:
                        events['stopped'].set()

                    break
                await asyncio.sleep(0.1)

            events['stopped_timeout'].set()

        async def check_blocking_on_input():
            while True:
                if not emitted_text or not self.is_blocking_on_stdin():
                    try:
                        await asyncio.sleep(self.refresh_seconds)
                    except asyncio.exceptions.CancelledError:
                        break
                    continue

                if not self.is_running():
                    break

                await emit_latest_lines()

                try:
                    print('', end='', flush=True)
                    if asyncio.iscoroutinefunction(input_callback_function):
                        data = await input_callback_function()
                    else:
                        loop = asyncio.get_running_loop()
                        data = await loop.run_in_executor(None, input_callback_function)
                except KeyboardInterrupt:
                    break
                # if not data.endswith('\n'):
                #     data += '\n'

                stdin_file.write(data)
                await asyncio.sleep(self.refresh_seconds)

        async def combine_events():
            event_tasks = [
                asyncio.create_task(event.wait())
                for event in events.values()
                if event is not None
            ]
            if not event_tasks:
                return

            try:
                done, pending = await asyncio.wait(
                    event_tasks,
                    return_when=asyncio.FIRST_COMPLETED,
                )
                for task in pending:
                    task.cancel()
            except asyncio.exceptions.CancelledError:
                pass
            finally:
                combined_event.set()

        check_job_status_task = asyncio.create_task(check_job_status())
        check_blocking_on_input_task = asyncio.create_task(check_blocking_on_input())
        combine_events_task = asyncio.create_task(combine_events())

        log = _log if _log is not None else self.daemon.rotating_log
        lines_to_show = (
            self.daemon.properties.get(
                'logs', {}
            ).get(
                'lines_to_show', get_config('jobs', 'logs', 'lines_to_show')
            )
        )

        async def emit_latest_lines():
            nonlocal emitted_text
            nonlocal stop_event
            lines = log.readlines()
            for line in lines[(-1 * lines_to_show):]:
                if stop_event is not None and stop_event.is_set():
                    return

                line_stripped_extra = strip_timestamp_from_line(line.strip())
                line_stripped = strip_timestamp_from_line(line)

                if line_stripped_extra == STOP_TOKEN:
                    events['stop_token'].set()
                    return

                if line_stripped_extra == CLEAR_TOKEN:
                    clear_screen(debug=debug)
                    continue

                if line_stripped_extra == FLUSH_TOKEN.strip():
                    line_stripped = ''
                    line = ''

                if strip_timestamps:
                    line = line_stripped

                try:
                    if asyncio.iscoroutinefunction(callback_function):
                        await callback_function(line)
                    else:
                        callback_function(line)
                    emitted_text = True
                except StopMonitoringLogs:
                    events['stop_exception'].set()
                    return
                except Exception:
                    warn(f"Error in logs callback:\n{traceback.format_exc()}")

        await emit_latest_lines()

        tasks = (
            [check_job_status_task]
            + ([check_blocking_on_input_task] if accept_input else [])
            + [combine_events_task]
        )
        try:
            _ = asyncio.gather(*tasks, return_exceptions=True)
        except asyncio.exceptions.CancelledError:
            raise
        except Exception:
            warn(f"Failed to run async checks:\n{traceback.format_exc()}")

        watchfiles = mrsm.attempt_import('watchfiles', lazy=False)
        dir_path_to_monitor = (
            _logs_path
            or (log.file_path.parent if log else None)
            or paths.LOGS_RESOURCES_PATH
        )
        async for changes in watchfiles.awatch(
            dir_path_to_monitor,
            stop_event=combined_event,
        ):
            for change in changes:
                file_path_str = change[1]
                file_path = pathlib.Path(file_path_str)
                latest_subfile_path = log.get_latest_subfile_path()
                if latest_subfile_path != file_path:
                    continue

                await emit_latest_lines()

        await emit_latest_lines()
Monitor the job's log files and await a callback on new lines.
Parameters
- callback_function (Callable[[str], None], default _default_stdout_callback): The callback to execute as new data comes in. Defaults to printing the output directly to stdout.
- input_callback_function (Optional[Callable[[], str]], default None): If provided, execute this callback when the daemon is blocking on stdin. Defaults to sys.stdin.readline().
- stop_callback_function (Optional[Callable[[SuccessTuple], None]], default None): If provided, execute this callback when the daemon stops. The job's SuccessTuple will be passed to the callback.
- stop_event (Optional[asyncio.Event], default None): If provided, stop monitoring when this event is set. You may instead raise meerschaum.jobs.StopMonitoringLogs from within callback_function to stop monitoring.
- stop_on_exit (bool, default False): If True, stop monitoring when the job stops.
- strip_timestamps (bool, default False): If True, remove leading timestamps from lines.
- accept_input (bool, default True): If True, accept input when the daemon blocks on stdin.
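Internally, the coroutine folds several stop conditions (the user's `stop_event`, the stop token, a raised exception, the job exiting) into a single combined event by waiting for whichever fires first. The sketch below shows that pattern in isolation with `asyncio.wait` and `FIRST_COMPLETED`. The function name `wait_for_any` and the 10 ms timer are assumptions made for the demonstration.

```python
import asyncio


async def wait_for_any(events):
    """Set and return a combined event once any of the given events is set."""
    combined_event = asyncio.Event()
    event_tasks = [
        asyncio.create_task(event.wait())
        for event in events.values()
        if event is not None  # e.g. the caller did not pass a stop_event
    ]
    if not event_tasks:
        return combined_event

    _done, pending = await asyncio.wait(
        event_tasks,
        return_when=asyncio.FIRST_COMPLETED,
    )
    # The remaining waiters are no longer needed.
    for task in pending:
        task.cancel()

    combined_event.set()
    return combined_event


async def main():
    events = {
        'user': None,
        'stopped': asyncio.Event(),
        'stop_token': asyncio.Event(),
    }
    # Simulate the stop token appearing in the logs shortly after monitoring begins.
    asyncio.get_running_loop().call_later(0.01, events['stop_token'].set)
    combined_event = await wait_for_any(events)
    return combined_event.is_set()


result = asyncio.run(main())
print(result)
```

A single combined event is convenient because a file watcher typically accepts only one stop signal, as `watchfiles.awatch` does through its `stop_event` argument in the source above.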
    def is_blocking_on_stdin(self, debug: bool = False) -> bool:
        """
        Return whether a job's daemon is blocking on stdin.
        """
        if self.executor is not None:
            return self.executor.get_job_is_blocking_on_stdin(self.name, debug=debug)

        return self.is_running() and self.daemon.blocking_stdin_file_path.exists()
Return whether a job's daemon is blocking on stdin.
    def get_prompt_kwargs(self, debug: bool = False) -> Dict[str, Any]:
        """
        Return the kwargs to the blocking `prompt()`, if available.
        """
        if self.executor is not None:
            return self.executor.get_job_prompt_kwargs(self.name, debug=debug)

        if not self.daemon.prompt_kwargs_file_path.exists():
            return {}

        try:
            with open(self.daemon.prompt_kwargs_file_path, 'r', encoding='utf-8') as f:
                prompt_kwargs = json.load(f)

            return prompt_kwargs

        except Exception:
            import traceback
            traceback.print_exc()
            return {}
Return the kwargs to the blocking prompt(), if available.
    def write_stdin(self, data):
        """
        Write to a job's daemon's `stdin`.
        """
        self.daemon.stdin_file.write(data)
Write to a job's daemon's stdin.
    @property
    def executor(self) -> Union[Executor, None]:
        """
        If the job is remote, return the connector to the remote API instance.
        """
        return (
            mrsm.get_connector(self.executor_keys)
            if self.executor_keys != 'local'
            else None
        )
If the job is remote, return the connector to the remote API instance.
    @property
    def status(self) -> str:
        """
        Return the running status of the job's daemon.
        """
        if '_status_hook' in self.__dict__:
            return self._status_hook()

        if self.executor is not None:
            return self.executor.get_job_status(self.name)

        return self.daemon.status
Return the running status of the job's daemon.
    @property
    def pid(self) -> Union[int, None]:
        """
        Return the PID of the job's daemon.
        """
        if self.executor is not None:
            return self.executor.get_job_metadata(self.name).get('daemon', {}).get('pid', None)

        return self.daemon.pid
Return the PID of the job's daemon.
    @property
    def restart(self) -> bool:
        """
        Return whether to restart a stopped job.
        """
        if self.executor is not None:
            return self.executor.get_job_metadata(self.name).get('restart', False)

        return self.daemon.properties.get('restart', False)
Return whether to restart a stopped job.
    @property
    def result(self) -> SuccessTuple:
        """
        Return the `SuccessTuple` when the job has terminated.
        """
        if self.is_running():
            return True, f"{self} is running."

        if '_result_hook' in self.__dict__:
            return self._result_hook()

        if self.executor is not None:
            return (
                self.executor.get_job_metadata(self.name)
                .get('result', (False, "No result available."))
            )

        _result = self.daemon.properties.get('result', None)
        if _result is None:
            from meerschaum.utils.daemon.Daemon import _results
            return _results.get(self.daemon.daemon_id, (False, "No result available."))

        return tuple(_result)
Return the SuccessTuple when the job has terminated.
    @property
    def sysargs(self) -> List[str]:
        """
        Return the sysargs to use for the Daemon.
        """
        if self._sysargs:
            return self._sysargs

        if self.executor is not None:
            return self.executor.get_job_metadata(self.name).get('sysargs', [])

        target_args = self.daemon.target_args
        if target_args is None:
            return []
        self._sysargs = target_args[0] if len(target_args) > 0 else []
        return self._sysargs
Return the sysargs to use for the Daemon.
    def get_daemon_properties(self) -> Dict[str, Any]:
        """
        Return the `properties` dictionary for the job's daemon.
        """
        remote_properties = (
            {}
            if self.executor is None
            else self.executor.get_job_properties(self.name)
        )
        return {
            **remote_properties,
            **self._properties_patch
        }
Return the properties dictionary for the job's daemon.
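Because the patch is unpacked last, keys set on the job (for example `restart` or `env` passed to the constructor) override the values fetched from the executor. The short sketch below uses made-up property values to show the precedence, and that the merge is shallow: a nested dictionary in the patch replaces the remote one rather than being combined with it.

```python
# Hypothetical values standing in for executor.get_job_properties() and the job's patch.
remote_properties = {'restart': False, 'env': {'FOO': '1', 'BAR': '2'}}
properties_patch = {'restart': True, 'env': {'FOO': 'patched'}}

# Later unpacking wins, so the patch takes precedence over the remote properties.
merged = {**remote_properties, **properties_patch}
print(merged)
```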
    @property
    def daemon(self) -> 'Daemon':
        """
        Return the daemon which this job manages.
        """
        from meerschaum.utils.daemon import Daemon
        if self._daemon is not None and self.executor is None and self._sysargs:
            return self._daemon

        self._daemon = Daemon(
            target=entry,
            target_args=[self._sysargs],
            target_kw={},
            daemon_id=self.name,
            label=shlex.join(self._sysargs),
            properties=self.get_daemon_properties(),
        )
        if '_rotating_log' in self.__dict__:
            self._daemon._rotating_log = self._rotating_log

        if '_stdin_file' in self.__dict__:
            self._daemon._stdin_file = self._stdin_file
            self._daemon._blocking_stdin_file_path = self._stdin_file.blocking_file_path

        return self._daemon
Return the daemon which this job manages.
    @property
    def began(self) -> Union[datetime, None]:
        """
        The datetime when the job began running.
        """
        if self.executor is not None:
            began_str = self.executor.get_job_began(self.name)
            if began_str is None:
                return None
            return (
                datetime.fromisoformat(began_str)
                .astimezone(timezone.utc)
                .replace(tzinfo=None)
            )

        began_str = self.daemon.properties.get('process', {}).get('began', None)
        if began_str is None:
            return None

        return datetime.fromisoformat(began_str)
The datetime when the job began running.
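When a job runs on a remote executor, the timestamp arrives as an ISO 8601 string and is normalized to a naive UTC datetime. The following is a minimal stand-alone sketch of that conversion using only the standard library; `parse_job_timestamp` is a hypothetical helper name for illustration and is not part of Meerschaum:

```python
from datetime import datetime, timezone
from typing import Optional


def parse_job_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Hypothetical helper mirroring the executor branch of `began`."""
    if value is None:
        return None
    return (
        datetime.fromisoformat(value)   # parse the ISO 8601 string
        .astimezone(timezone.utc)       # shift to UTC
        .replace(tzinfo=None)           # drop the tzinfo to get a naive datetime
    )


began = parse_job_timestamp('2024-01-01T12:00:00+02:00')
print(began)
```

The same pattern applies to the `ended` and `paused` properties, which differ only in the key they read.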
    @property
    def ended(self) -> Union[datetime, None]:
        """
        The datetime when the job stopped running.
        """
        if self.executor is not None:
            ended_str = self.executor.get_job_ended(self.name)
            if ended_str is None:
                return None
            return (
                datetime.fromisoformat(ended_str)
                .astimezone(timezone.utc)
                .replace(tzinfo=None)
            )

        ended_str = self.daemon.properties.get('process', {}).get('ended', None)
        if ended_str is None:
            return None

        return datetime.fromisoformat(ended_str)
The datetime when the job stopped running.
    @property
    def paused(self) -> Union[datetime, None]:
        """
        The datetime when the job was suspended while running.
        """
        if self.executor is not None:
            paused_str = self.executor.get_job_paused(self.name)
            if paused_str is None:
                return None
            return (
                datetime.fromisoformat(paused_str)
                .astimezone(timezone.utc)
                .replace(tzinfo=None)
            )

        paused_str = self.daemon.properties.get('process', {}).get('paused', None)
        if paused_str is None:
            return None

        return datetime.fromisoformat(paused_str)
The datetime when the job was suspended while running.
    @property
    def stop_time(self) -> Union[datetime, None]:
        """
        Return the timestamp when the job was manually stopped.
        """
        if self.executor is not None:
            return self.executor.get_job_stop_time(self.name)

        if not self.daemon.stop_path.exists():
            return None

        stop_data = self.daemon._read_stop_file()
        if not stop_data:
            return None

        stop_time_str = stop_data.get('stop_time', None)
        if not stop_time_str:
            warn(f"Could not read stop time for {self}.")
            return None

        return datetime.fromisoformat(stop_time_str)
Return the timestamp when the job was manually stopped.
    def check_restart(self) -> SuccessTuple:
        """
        If `restart` is `True` and the daemon is not running,
        restart the job.
        Do not restart if the job was manually stopped.
        """
        if self.is_running():
            return True, f"{self} is running."

        if not self.restart:
            return True, f"{self} does not need to be restarted."

        if self.stop_time is not None:
            return True, f"{self} was manually stopped."

        return self.start()
If restart is True and the daemon is not running, restart the job. Do not restart if the job was manually stopped.
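The order of the checks matters: a running job is left alone, then the `restart` flag is consulted, and a manual stop takes precedence over restarting. The sketch below restates that decision as a pure function; `should_restart` and its parameters are hypothetical names for illustration, not part of the Meerschaum API:

```python
from datetime import datetime
from typing import Optional, Tuple


def should_restart(
    is_running: bool,
    restart: bool,
    stop_time: Optional[datetime],
) -> Tuple[bool, str]:
    """Return (needs_start, reason), following the order of checks in `check_restart`."""
    if is_running:
        return False, "Job is running."
    if not restart:
        return False, "Job does not need to be restarted."
    if stop_time is not None:
        return False, "Job was manually stopped."
    return True, "Job should be started."


print(should_restart(False, True, None))
print(should_restart(False, True, datetime(2024, 1, 1)))
```

Only the final branch leads to a start; in the real method that branch returns the result of `self.start()`.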
    @property
    def label(self) -> str:
        """
        Return the job's Daemon label (joined sysargs).
        """
        from meerschaum._internal.arguments import compress_pipeline_sysargs
        sysargs = compress_pipeline_sysargs(self.sysargs)
        return shlex.join(sysargs).replace(' + ', '\n+ ').replace(' : ', '\n: ').lstrip().rstrip()
Return the job's Daemon label (joined sysargs).
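The label is the shell-quoted sysargs joined into one string, with the ` + ` and ` : ` separators moved onto new lines. A minimal sketch of that formatting step follows; it skips the internal `compress_pipeline_sysargs` call, and the sample `sysargs` list is an assumption chosen for illustration:

```python
import shlex

# Assumed example arguments; '+' separates chained actions.
sysargs = ['sync', 'pipes', '+', 'show', 'pipes']

label = (
    shlex.join(sysargs)        # shell-quote and join the arguments
    .replace(' + ', '\n+ ')    # break the line before each '+' separator
    .replace(' : ', '\n: ')    # break the line before each ':' separator
    .strip()
)
print(label)
```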
    @property
    def env(self) -> Dict[str, str]:
        """
        Return the environment variables to set for the job's process.
        """
        if '_env' in self.__dict__:
            return self.__dict__['_env']

        _env = self.daemon.properties.get('env', {})
        default_env = {
            'PYTHONUNBUFFERED': '1',
            'LINES': str(get_config('jobs', 'terminal', 'lines')),
            'COLUMNS': str(get_config('jobs', 'terminal', 'columns')),
            STATIC_CONFIG['environment']['noninteractive']: 'true',
        }
        self._env = {**default_env, **_env}
        return self._env
Return the environment variables to set for the job's process.
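The defaults are merged with the daemon's stored `env` using dictionary unpacking, so values set on the job override the defaults. A small sketch of that merge, where the terminal sizes and the `MY_FLAG` variable are placeholder assumptions rather than Meerschaum's configured values:

```python
# Placeholder defaults; the real values come from the `jobs` configuration.
default_env = {
    'PYTHONUNBUFFERED': '1',
    'LINES': '24',
    'COLUMNS': '80',
}

# Hypothetical variables stored in the daemon's properties.
job_env = {
    'COLUMNS': '120',
    'MY_FLAG': 'true',
}

# Later mappings win, so job-level values override the defaults.
env = {**default_env, **job_env}
print(env)
```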
    @property
    def delete_after_completion(self) -> bool:
        """
        Return whether this job is configured to delete itself after completion.
        """
        if '_delete_after_completion' in self.__dict__:
            return self.__dict__.get('_delete_after_completion', False)

        self._delete_after_completion = self.daemon.properties.get('delete_after_completion', False)
        return self._delete_after_completion
Return whether this job is configured to delete itself after completion.
def pprint(
    *args,
    detect_password: bool = True,
    nopretty: bool = False,
    **kw
) -> None:
    """Pretty print an object according to the configured ANSI and UNICODE settings.
    If detect_password is True (default), search and replace passwords with '*' characters.
    Does not mutate objects.
    """
    import copy
    import json
    from meerschaum.utils.packages import attempt_import, import_rich
    from meerschaum.utils.formatting import ANSI, get_console, print_tuple
    from meerschaum.utils.warnings import error
    from meerschaum.utils.misc import replace_password, dict_from_od, filter_keywords
    from collections import OrderedDict

    if (
        len(args) == 1
        and
        isinstance(args[0], tuple)
        and
        len(args[0]) == 2
        and
        isinstance(args[0][0], bool)
        and
        isinstance(args[0][1], str)
    ):
        return print_tuple(args[0], **filter_keywords(print_tuple, **kw))

    modify = True
    rich_pprint = None
    if ANSI and not nopretty:
        rich = import_rich()
        if rich is not None:
            rich_pretty = attempt_import('rich.pretty')
            if rich_pretty is not None:
                def _rich_pprint(*args, **kw):
                    _console = get_console()
                    _kw = filter_keywords(_console.print, **kw)
                    _console.print(*args, **_kw)
                rich_pprint = _rich_pprint
    elif not nopretty:
        pprintpp = attempt_import('pprintpp', warn=False)
        try:
            _pprint = pprintpp.pprint
        except Exception:
            import pprint as _pprint_module
            _pprint = _pprint_module.pprint

    func = (
        _pprint if rich_pprint is None else rich_pprint
    ) if not nopretty else print

    try:
        args_copy = copy.deepcopy(args)
    except Exception:
        args_copy = args
        modify = False

    _args = []
    for a in args:
        c = a
        ### convert OrderedDict into dict
        if isinstance(a, OrderedDict) or issubclass(type(a), OrderedDict):
            c = dict_from_od(copy.deepcopy(c))
        _args.append(c)
    args = _args

    _args = list(args)
    if detect_password and modify:
        _args = []
        for a in args:
            c = a
            if isinstance(c, dict):
                c = replace_password(copy.deepcopy(c))
            if nopretty:
                try:
                    c = json.dumps(c)
                    is_json = True
                except Exception:
                    is_json = False
                if not is_json:
                    try:
                        c = str(c)
                    except Exception:
                        pass
            _args.append(c)

    ### filter out unsupported keywords
    func_kw = filter_keywords(func, **kw) if not nopretty else {}
    error_msg = None
    try:
        func(*_args, **func_kw)
    except Exception as e:
        error_msg = e
    if error_msg is not None:
        error(error_msg)
Pretty print an object according to the configured ANSI and UNICODE settings. If detect_password is True (default), search and replace passwords with '*' characters. Does not mutate objects.
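The "does not mutate objects" guarantee comes from masking a deep copy rather than the caller's dictionary. The sketch below illustrates that idea with a hypothetical `mask_passwords` helper; it is a stand-in written for this example and makes no claim about how Meerschaum's own `replace_password` decides which keys to mask:

```python
import copy
from typing import Any, Dict


def mask_passwords(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in: return a deep copy with password values masked."""
    masked = copy.deepcopy(doc)
    for key, value in masked.items():
        if isinstance(value, dict):
            # Recurse into nested dictionaries.
            masked[key] = mask_passwords(value)
        elif 'password' in str(key).lower() and isinstance(value, str):
            masked[key] = '*' * len(value)
    return masked


config = {'sql': {'main': {'username': 'foo', 'password': 'hunter2'}}}
masked = mask_passwords(config)
print(masked)
print(config['sql']['main']['password'])  # the original is left untouched
```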
def attempt_import(
    *names: str,
    lazy: bool = True,
    warn: bool = True,
    install: bool = True,
    venv: Optional[str] = 'mrsm',
    precheck: bool = True,
    split: bool = True,
    check_update: bool = False,
    check_pypi: bool = False,
    check_is_installed: bool = True,
    allow_outside_venv: bool = True,
    color: bool = True,
    debug: bool = False
) -> Any:
    """
    Raise a warning if packages are not installed; otherwise import and return modules.
    If `lazy` is `True`, return lazy-imported modules.

    Returns tuple of modules if multiple names are provided, else returns one module.

    Parameters
    ----------
    names: List[str]
        The packages to be imported.

    lazy: bool, default True
        If `True`, lazily load packages.

    warn: bool, default True
        If `True`, raise a warning if a package cannot be imported.

    install: bool, default True
        If `True`, attempt to install a missing package into the designated virtual environment.
        If `check_update` is True, install updates if available.

    venv: Optional[str], default 'mrsm'
        The virtual environment in which to search for packages and to install packages into.

    precheck: bool, default True
        If `True`, attempt to find module before importing (necessary for checking if modules exist
        and retaining lazy imports), otherwise assume lazy is `False`.

    split: bool, default True
        If `True`, split packages' names on `'.'`.

    check_update: bool, default False
        If `True` and `install` is `True`, install updates if the required minimum version
        does not match.

    check_pypi: bool, default False
        If `True` and `check_update` is `True`, check PyPI when determining whether
        an update is required.

    check_is_installed: bool, default True
        If `True`, check if the package is contained in the virtual environment.

    allow_outside_venv: bool, default True
        If `True`, search outside of the specified virtual environment
        if the package cannot be found.
        Setting to `False` will reinstall the package into a virtual environment, even if it
        is installed outside.

    color: bool, default True
        If `False`, do not print ANSI colors.

    Returns
    -------
    The specified modules. If they're not available and `install` is `True`, it will first
    download them into a virtual environment and return the modules.

    Examples
    --------
    >>> pandas, sqlalchemy = attempt_import('pandas', 'sqlalchemy')
    >>> pandas = attempt_import('pandas')

    """

    import importlib.util

    ### to prevent recursion, check if parent Meerschaum package is being imported
    if names == ('meerschaum',):
        return _import_module('meerschaum')

    if venv == 'mrsm' and _import_hook_venv is not None:
        if debug:
            print(f"Import hook for virtual environment '{_import_hook_venv}' is active.")
        venv = _import_hook_venv

    _warnings = _import_module('meerschaum.utils.warnings')
    warn_function = _warnings.warn

    def do_import(_name: str, **kw) -> Union['ModuleType', None]:
        with Venv(venv=venv, debug=debug):
            ### determine the import method (lazy vs normal)
            from meerschaum.utils.misc import filter_keywords
            import_method = (
                _import_module if not lazy
                else lazy_import
            )
            try:
                mod = import_method(_name, **(filter_keywords(import_method, **kw)))
            except Exception as e:
                if warn:
                    import traceback
                    traceback.print_exception(type(e), e, e.__traceback__)
                    warn_function(
                        f"Failed to import module '{_name}'.\nException:\n{e}",
                        ImportWarning,
                        stacklevel = (5 if lazy else 4),
                        color = False,
                    )
                mod = None
            return mod

    modules = []
    for name in names:
        ### Check if package is a declared dependency.
        root_name = name.split('.')[0] if split else name
        install_name = _import_to_install_name(root_name)

        if install_name is None:
            install_name = root_name
            if warn and root_name != 'plugins':
                warn_function(
                    f"Package '{root_name}' is not declared in meerschaum.utils.packages.",
                    ImportWarning,
                    stacklevel = 3,
                    color = False
                )

        ### Determine if the package exists.
        if precheck is False:
            found_module = (
                do_import(
                    name, debug=debug, warn=False, venv=venv, color=color,
                    check_update=False, check_pypi=False, split=split,
                ) is not None
            )
        else:
            if check_is_installed:
                with _locks['_is_installed_first_check']:
                    if not _is_installed_first_check.get(name, False):
                        package_is_installed = is_installed(
                            name,
                            venv = venv,
                            split = split,
                            allow_outside_venv = allow_outside_venv,
                            debug = debug,
                        )
                        _is_installed_first_check[name] = package_is_installed
                    else:
                        package_is_installed = _is_installed_first_check[name]
            else:
                package_is_installed = _is_installed_first_check.get(
                    name,
                    venv_contains_package(name, venv=venv, split=split, debug=debug)
                )
            found_module = package_is_installed

        if not found_module:
            if install:
                if not pip_install(
                    install_name,
                    venv = venv,
                    split = False,
                    check_update = check_update,
                    color = color,
                    debug = debug
                ) and warn:
                    warn_function(
                        f"Failed to install '{install_name}'.",
                        ImportWarning,
                        stacklevel = 3,
                        color = False,
                    )
            elif warn:
                ### Raise a warning if we can't find the package and install = False.
                warn_function(
                    (f"\n\nMissing package '{name}' from virtual environment '{venv}'; "
                     + "some features will not work correctly."
                     + "\n\nSet install=True when calling attempt_import.\n"),
                    ImportWarning,
                    stacklevel = 3,
                    color = False,
                )

        ### Do the import. Will be lazy if lazy=True.
        m = do_import(
            name, debug=debug, warn=warn, venv=venv, color=color,
            check_update=check_update, check_pypi=check_pypi, install=install, split=split,
        )
        modules.append(m)

    modules = tuple(modules)
    if len(modules) == 1:
        return modules[0]
    return modules
Raise a warning if packages are not installed; otherwise import and return modules.
If lazy is True, return lazy-imported modules.
Returns tuple of modules if multiple names are provided, else returns one module.
Parameters
- names (List[str]): The packages to be imported.
- lazy (bool, default True): If True, lazily load packages.
- warn (bool, default True): If True, raise a warning if a package cannot be imported.
- install (bool, default True): If True, attempt to install a missing package into the designated virtual environment. If check_update is True, install updates if available.
- venv (Optional[str], default 'mrsm'): The virtual environment in which to search for packages and to install packages into.
- precheck (bool, default True): If True, attempt to find module before importing (necessary for checking if modules exist and retaining lazy imports), otherwise assume lazy is False.
- split (bool, default True): If True, split packages' names on '.'.
- check_update (bool, default False): If True and install is True, install updates if the required minimum version does not match.
- check_pypi (bool, default False): If True and check_update is True, check PyPI when determining whether an update is required.
- check_is_installed (bool, default True): If True, check if the package is contained in the virtual environment.
- allow_outside_venv (bool, default True): If True, search outside of the specified virtual environment if the package cannot be found. Setting to False will reinstall the package into a virtual environment, even if it is installed outside.
- color (bool, default True): If False, do not print ANSI colors.
Returns
- The specified modules. If they're not available and install is True, it will first download them into a virtual environment and return the modules.
Examples
>>> pandas, sqlalchemy = attempt_import('pandas', 'sqlalchemy')
>>> pandas = attempt_import('pandas')
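The "check first, then import" flow can be illustrated with the standard library alone. The sketch below is not Meerschaum's implementation: `try_import` is a hypothetical helper that omits virtual environments, lazy loading, and installation, and only shows the precheck, the optional warning, and the `None` result for a missing package:

```python
import importlib
import importlib.util
import warnings
from types import ModuleType
from typing import Optional


def try_import(name: str, warn: bool = True) -> Optional[ModuleType]:
    """Hypothetical helper: import `name` if its root package can be found."""
    root_name = name.split('.')[0]
    # Precheck: look for the root package without importing it.
    if importlib.util.find_spec(root_name) is None:
        if warn:
            warnings.warn(f"Missing package '{root_name}'.", ImportWarning)
        return None
    return importlib.import_module(name)


json_module = try_import('json')
missing = try_import('surely_not_a_real_package_xyz', warn=False)
print(json_module.dumps({'a': 1}), missing)
```

Returning `None` instead of raising lets callers degrade gracefully when an optional dependency is absent.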
class Connector(metaclass=abc.ABCMeta):
    """
    The base connector class to hold connection attributes.
    """

    IS_INSTANCE: bool = False

    def __init__(
        self,
        type: Optional[str] = None,
        label: Optional[str] = None,
        **kw: Any
    ):
        """
        Set the given keyword arguments as attributes.

        Parameters
        ----------
        type: str
            The `type` of the connector (e.g. `sql`, `api`, `plugin`).

        label: str
            The `label` for the connector.


        Examples
        --------
        Run `mrsm edit config` and to edit connectors in the YAML file:

        ```yaml
        meerschaum:
            connections:
                {type}:
                    {label}:
                        ### attributes go here
        ```

        """
        self._original_dict = copy.deepcopy(self.__dict__)
        self._set_attributes(type=type, label=label, **kw)

        ### NOTE: Override `REQUIRED_ATTRIBUTES` if `uri` is set.
        self.verify_attributes(
            ['uri']
            if 'uri' in self.__dict__
            else getattr(self, 'REQUIRED_ATTRIBUTES', None)
        )

    def _reset_attributes(self):
        self.__dict__ = self._original_dict

    def _set_attributes(
        self,
        *args,
        inherit_default: bool = True,
        **kw: Any
    ):
        from meerschaum._internal.static import STATIC_CONFIG
        from meerschaum.utils.warnings import error

        self._attributes = {}

        default_label = STATIC_CONFIG['connectors']['default_label']

        ### NOTE: Support the legacy method of explicitly passing the type.
        label = kw.get('label', None)
        if label is None:
            if len(args) == 2:
                label = args[1]
            elif len(args) == 0:
                label = None
            else:
                label = args[0]

        if label == 'default':
            error(
                f"Label cannot be 'default'. Did you mean '{default_label}'?",
                InvalidAttributesError,
            )
        self.__dict__['label'] = label

        from meerschaum.config import get_config
        conn_configs = copy.deepcopy(get_config('meerschaum', 'connectors'))
        connector_config = copy.deepcopy(get_config('system', 'connectors'))

        ### inherit attributes from 'default' if exists
        if inherit_default:
            inherit_from = 'default'
            if self.type in conn_configs and inherit_from in conn_configs[self.type]:
                _inherit_dict = copy.deepcopy(conn_configs[self.type][inherit_from])
                self._attributes.update(_inherit_dict)

        ### load user config into self._attributes
        if self.type in conn_configs and self.label in conn_configs[self.type]:
            self._attributes.update(conn_configs[self.type][self.label] or {})

        ### load system config into self._sys_config
        ### (deep copy so future Connectors don't inherit changes)
        if self.type in connector_config:
            self._sys_config = copy.deepcopy(connector_config[self.type])

        ### add additional arguments or override configuration
        self._attributes.update(kw)

        ### finally, update __dict__ with _attributes.
        self.__dict__.update(self._attributes)

    def verify_attributes(
        self,
        required_attributes: Optional[List[str]] = None,
        debug: bool = False,
    ) -> None:
        """
        Ensure that the required attributes have been met.

        The Connector base class checks the minimum requirements.
        Child classes may enforce additional requirements.

        Parameters
        ----------
        required_attributes: Optional[List[str]], default None
            Attributes to be verified. If `None`, default to `['label']`.

        debug: bool, default False
            Verbosity toggle.

        Returns
        -------
        Don't return anything.

        Raises
        ------
        An error if any of the required attributes are missing.
        """
        from meerschaum.utils.warnings import error
        from meerschaum.utils.misc import items_str
        if required_attributes is None:
            required_attributes = ['type', 'label']

        missing_attributes = set()
        for a in required_attributes:
            if a not in self.__dict__:
                missing_attributes.add(a)
        if len(missing_attributes) > 0:
            error(
                (
                    f"Missing {items_str(list(missing_attributes))} "
                    + f"for connector '{self.type}:{self.label}'."
                ),
                InvalidAttributesError,
                silent=True,
                stack=False
            )


    def __str__(self):
        """
        When cast to a string, return type:label.
        """
        return f"{self.type}:{self.label}"

    def __repr__(self):
        """
        Represent the connector as type:label.
        """
        return str(self)

    @property
    def meta(self) -> Dict[str, Any]:
        """
        Return the keys needed to reconstruct this Connector.
        """
        _meta = {
            key: value
            for key, value in self.__dict__.items()
            if not str(key).startswith('_')
        }
        _meta.update({
            'type': self.type,
            'label': self.label,
        })
        return _meta


    @property
    def type(self) -> str:
        """
        Return the type for this connector.
        """
        _type = self.__dict__.get('type', None)
        if _type is None:
            import re
            is_executor = self.__class__.__name__.lower().endswith('executor')
            suffix_regex = (
                r'connector$'
                if not is_executor
                else r'executor$'
            )
            _type = re.sub(suffix_regex, '', self.__class__.__name__.lower())
            if not _type or _type.lower() == 'instance':
                raise ValueError("No type could be determined for this connector.")
            self.__dict__['type'] = _type
        return _type


    @property
    def label(self) -> str:
        """
        Return the label for this connector.
        """
        _label = self.__dict__.get('label', None)
        if _label is None:
            from meerschaum._internal.static import STATIC_CONFIG
            _label = STATIC_CONFIG['connectors']['default_label']
            self.__dict__['label'] = _label
        return _label
The base connector class to hold connection attributes.
    def __init__(
        self,
        type: Optional[str] = None,
        label: Optional[str] = None,
        **kw: Any
    ):
        """
        Set the given keyword arguments as attributes.

        Parameters
        ----------
        type: str
            The `type` of the connector (e.g. `sql`, `api`, `plugin`).

        label: str
            The `label` for the connector.


        Examples
        --------
        Run `mrsm edit config` and to edit connectors in the YAML file:

        ```yaml
        meerschaum:
            connections:
                {type}:
                    {label}:
                        ### attributes go here
        ```

        """
        self._original_dict = copy.deepcopy(self.__dict__)
        self._set_attributes(type=type, label=label, **kw)

        ### NOTE: Override `REQUIRED_ATTRIBUTES` if `uri` is set.
        self.verify_attributes(
            ['uri']
            if 'uri' in self.__dict__
            else getattr(self, 'REQUIRED_ATTRIBUTES', None)
        )
    def verify_attributes(
        self,
        required_attributes: Optional[List[str]] = None,
        debug: bool = False,
    ) -> None:
        """
        Ensure that the required attributes have been met.

        The Connector base class checks the minimum requirements.
        Child classes may enforce additional requirements.

        Parameters
        ----------
        required_attributes: Optional[List[str]], default None
            Attributes to be verified. If `None`, default to `['label']`.

        debug: bool, default False
            Verbosity toggle.

        Returns
        -------
        Don't return anything.

        Raises
        ------
        An error if any of the required attributes are missing.
        """
        from meerschaum.utils.warnings import error
        from meerschaum.utils.misc import items_str
        if required_attributes is None:
            required_attributes = ['type', 'label']

        missing_attributes = set()
        for a in required_attributes:
            if a not in self.__dict__:
                missing_attributes.add(a)
        if len(missing_attributes) > 0:
            error(
                (
                    f"Missing {items_str(list(missing_attributes))} "
                    + f"for connector '{self.type}:{self.label}'."
                ),
                InvalidAttributesError,
                silent=True,
                stack=False
            )
Ensure that the required attributes have been met.
The Connector base class checks the minimum requirements. Child classes may enforce additional requirements.
Parameters
- required_attributes (Optional[List[str]], default None): Attributes to be verified. If None, default to ['type', 'label'] (the value used in the source above).
- debug (bool, default False): Verbosity toggle.
Returns
- Don't return anything.
Raises
- An error if any of the required attributes are missing.
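The check itself is a membership test of each required name against the connector's attributes. The following stand-alone sketch shows the idea; `find_missing_attributes` and `MissingAttributesError` are hypothetical names used only here and do not reproduce Meerschaum's `InvalidAttributesError` or its `error()` helper:

```python
from typing import Any, Dict, List, Optional


class MissingAttributesError(Exception):
    """Hypothetical error type for this sketch."""


def find_missing_attributes(
    attributes: Dict[str, Any],
    required_attributes: Optional[List[str]] = None,
) -> List[str]:
    """Return the required names that are absent from `attributes`."""
    if required_attributes is None:
        required_attributes = ['type', 'label']
    return sorted(a for a in required_attributes if a not in attributes)


def verify(attributes: Dict[str, Any], required_attributes: Optional[List[str]] = None) -> None:
    missing = find_missing_attributes(attributes, required_attributes)
    if missing:
        raise MissingAttributesError(f"Missing {', '.join(missing)}.")


attrs = {'type': 'foo', 'label': 'bar', 'username': 'foo'}
print(find_missing_attributes(attrs, ['username', 'password']))
```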
    @property
    def meta(self) -> Dict[str, Any]:
        """
        Return the keys needed to reconstruct this Connector.
        """
        _meta = {
            key: value
            for key, value in self.__dict__.items()
            if not str(key).startswith('_')
        }
        _meta.update({
            'type': self.type,
            'label': self.label,
        })
        return _meta
Return the keys needed to reconstruct this Connector.
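In other words, `meta` is the connector's public attributes: every key that does not begin with an underscore, plus `type` and `label`. A small sketch of that filter on a plain dictionary, where `build_meta` and the sample attribute values are assumptions for illustration:

```python
from typing import Any, Dict


def build_meta(attributes: Dict[str, Any], type_: str, label: str) -> Dict[str, Any]:
    """Hypothetical helper: keep public keys, then ensure type and label are set."""
    meta = {
        key: value
        for key, value in attributes.items()
        if not str(key).startswith('_')
    }
    meta.update({'type': type_, 'label': label})
    return meta


attributes = {'username': 'foo', '_attributes': {}, '_sys_config': {}}
meta = build_meta(attributes, 'foo', 'bar')
print(meta)
```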
    @property
    def type(self) -> str:
        """
        Return the type for this connector.
        """
        _type = self.__dict__.get('type', None)
        if _type is None:
            import re
            is_executor = self.__class__.__name__.lower().endswith('executor')
            suffix_regex = (
                r'connector$'
                if not is_executor
                else r'executor$'
            )
            _type = re.sub(suffix_regex, '', self.__class__.__name__.lower())
            if not _type or _type.lower() == 'instance':
                raise ValueError("No type could be determined for this connector.")
            self.__dict__['type'] = _type
        return _type
Return the type for this connector.
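When no explicit type is set, the type is derived from the class name by lowercasing it and stripping a trailing `connector` (or `executor`) suffix. The sketch below reproduces that derivation as a free function; `infer_type` and the class names passed to it are hypothetical examples:

```python
import re


def infer_type(class_name: str) -> str:
    """Derive a connector type from a class name, as the `type` property does."""
    lowered = class_name.lower()
    suffix_regex = r'executor$' if lowered.endswith('executor') else r'connector$'
    _type = re.sub(suffix_regex, '', lowered)
    if not _type or _type == 'instance':
        raise ValueError("No type could be determined for this connector.")
    return _type


print(infer_type('FooConnector'))
print(infer_type('RemoteExecutor'))
```

This is why the `FooConnector` class in the earlier example is addressed with keys such as `foo:bar`.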
    @property
    def label(self) -> str:
        """
        Return the label for this connector.
        """
        _label = self.__dict__.get('label', None)
        if _label is None:
            from meerschaum._internal.static import STATIC_CONFIG
            _label = STATIC_CONFIG['connectors']['default_label']
            self.__dict__['label'] = _label
        return _label
Return the label for this connector.
class InstanceConnector(Connector):
    """
    Instance connectors define the interface for managing pipes and provide methods
    for management of users, plugins, tokens, and other metadata built atop pipes.
    """

    IS_INSTANCE: bool = True
    IS_THREAD_SAFE: bool = False

    from ._users import (
        get_users_pipe,
        register_user,
        get_user_id,
        get_username,
        get_users,
        edit_user,
        delete_user,
        get_user_password_hash,
        get_user_type,
        get_user_attributes,
    )

    from ._plugins import (
        get_plugins_pipe,
        register_plugin,
        get_plugin_user_id,
        delete_plugin,
        get_plugin_id,
        get_plugin_version,
        get_plugins,
        get_plugin_user_id,
        get_plugin_username,
        get_plugin_attributes,
    )

    from ._tokens import (
        get_tokens_pipe,
        register_token,
        edit_token,
        invalidate_token,
        delete_token,
        get_token,
        get_tokens,
        get_token_model,
        get_token_secret_hash,
        token_exists,
        get_token_scopes,
    )

    from ._pipes import (
        register_pipe,
        get_pipe_attributes,
        get_pipe_id,
        edit_pipe,
        delete_pipe,
        fetch_pipes_keys,
        pipe_exists,
        drop_pipe,
        drop_pipe_indices,
        sync_pipe,
        create_pipe_indices,
        clear_pipe,
        get_pipe_data,
        get_pipe_docs,
        get_sync_time,
        get_pipe_columns_types,
        get_pipe_columns_indices,
        get_pipe_size,
        compress_pipe,
        decompress_pipe,
        vacuum_pipe,
        analyze_pipe,
        partition_pipe,
    )
Instance connectors define the interface for managing pipes and provide methods for management of users, plugins, tokens, and other metadata built atop pipes.
def get_users_pipe(self) -> 'mrsm.Pipe':
    """
    Return the pipe used for users registration.
    """
    if '_users_pipe' in self.__dict__:
        return self._users_pipe

    cache_connector = self.__dict__.get('_cache_connector', None)
    self._users_pipe = mrsm.Pipe(
        'mrsm', 'users',
        instance=self,
        target='mrsm_users',
        temporary=True,
        cache=True,
        cache_connector_keys=cache_connector,
        static=True,
        null_indices=False,
        columns={
            'primary': 'user_id',
        },
        dtypes={
            'user_id': 'uuid',
            'username': 'string',
            'password_hash': 'string',
            'email': 'string',
            'user_type': 'string',
            'attributes': 'json',
        },
        indices={
            'unique': 'username',
        },
    )
    return self._users_pipe
Return the pipe used for users registration.
def register_user(
    self,
    user: User,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Register a new user to the users pipe.
    """
    users_pipe = self.get_users_pipe()
    user.user_id = uuid.uuid4()
    sync_success, sync_msg = users_pipe.sync(
        [{
            'user_id': user.user_id,
            'username': user.username,
            'email': user.email,
            'password_hash': user.password_hash,
            'user_type': user.type,
            'attributes': user.attributes,
        }],
        check_existing=False,
        debug=debug,
    )
    if not sync_success:
        return False, f"Failed to register user '{user.username}':\n{sync_msg}"

    return True, "Success"
Register a new user to the users pipe.
def get_user_id(self, user: User, debug: bool = False) -> Union[uuid.UUID, None]:
    """
    Return a user's ID from the username.
    """
    users_pipe = self.get_users_pipe()
    result_df = users_pipe.get_data(['user_id'], params={'username': user.username}, limit=1)
    if result_df is None or len(result_df) == 0:
        return None
    return result_df['user_id'][0]
Return a user's ID from the username.
def get_username(self, user_id: Any, debug: bool = False) -> Any:
    """
    Return the username from the given ID.
    """
    users_pipe = self.get_users_pipe()
    return users_pipe.get_value('username', {'user_id': user_id}, debug=debug)
Return the username from the given ID.
def get_users(
    self,
    debug: bool = False,
    **kw: Any
) -> List[str]:
    """
    Get the registered usernames.
    """
    users_pipe = self.get_users_pipe()
    df = users_pipe.get_data()
    if df is None:
        return []

    return list(df['username'])
Get the registered usernames.
def edit_user(self, user: User, debug: bool = False) -> mrsm.SuccessTuple:
    """
    Edit the attributes for an existing user.
    """
    users_pipe = self.get_users_pipe()
    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)

    doc = {'user_id': user_id}
    if user.email != '':
        doc['email'] = user.email
    if user.password_hash != '':
        doc['password_hash'] = user.password_hash
    if user.type != '':
        doc['user_type'] = user.type
    if user.attributes:
        doc['attributes'] = user.attributes

    sync_success, sync_msg = users_pipe.sync([doc], debug=debug)
    if not sync_success:
        return False, f"Failed to edit user '{user.username}':\n{sync_msg}"

    return True, "Success"
Edit the attributes for an existing user.
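Only non-empty fields are included in the document that gets synced, so an edit never blanks out a value the caller did not supply. The sketch below isolates that document-building step; `build_edit_doc` and its keyword arguments are hypothetical names standing in for the `User` object's fields:

```python
from typing import Any, Dict, Optional


def build_edit_doc(
    user_id: Any,
    email: str = '',
    password_hash: str = '',
    user_type: str = '',
    attributes: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: include only the fields that were actually provided."""
    doc: Dict[str, Any] = {'user_id': user_id}
    if email != '':
        doc['email'] = email
    if password_hash != '':
        doc['password_hash'] = password_hash
    if user_type != '':
        doc['user_type'] = user_type
    if attributes:
        doc['attributes'] = attributes
    return doc


doc = build_edit_doc('1234', email='foo@example.com')
print(doc)
```

Because the users pipe is keyed on `user_id`, syncing such a partial document is how the method targets the existing row.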
def delete_user(self, user: User, debug: bool = False) -> mrsm.SuccessTuple:
    """
    Delete a user from the users table.
    """
    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)
    users_pipe = self.get_users_pipe()
    clear_success, clear_msg = users_pipe.clear(params={'user_id': user_id}, debug=debug)
    if not clear_success:
        return False, f"Failed to delete user '{user}':\n{clear_msg}"
    return True, "Success"
Delete a user from the users table.
def get_user_password_hash(self, user: User, debug: bool = False) -> Union[uuid.UUID, None]:
    """
    Get a user's password hash from the users table.
    """
    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)
    users_pipe = self.get_users_pipe()
    result_df = users_pipe.get_data(['password_hash'], params={'user_id': user_id}, debug=debug)
    if result_df is None or len(result_df) == 0:
        return None

    return result_df['password_hash'][0]
Get a user's password hash from the users table.
def get_user_type(self, user: User, debug: bool = False) -> Union[str, None]:
    """
    Get a user's type from the users table.
    """
    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)
    users_pipe = self.get_users_pipe()
    result_df = users_pipe.get_data(['user_type'], params={'user_id': user_id}, debug=debug)
    if result_df is None or len(result_df) == 0:
        return None

    return result_df['user_type'][0]
Get a user's type from the users table.
def get_user_attributes(self, user: User, debug: bool = False) -> Union[Dict[str, Any], None]:
    """
    Get a user's attributes from the users table.
    """
    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)
    users_pipe = self.get_users_pipe()
    result_df = users_pipe.get_data(['attributes'], params={'user_id': user_id}, debug=debug)
    if result_df is None or len(result_df) == 0:
        return None

    return result_df['attributes'][0]
Get a user's attributes from the users table.
def get_plugins_pipe(self) -> 'mrsm.Pipe':
    """
    Return the internal pipe for syncing plugins metadata.
    """
    if '_plugins_pipe' in self.__dict__:
        return self._plugins_pipe

    cache_connector = self.__dict__.get('_cache_connector', None)
    users_pipe = self.get_users_pipe()
    user_id_dtype = users_pipe.dtypes.get('user_id', 'uuid')

    self._plugins_pipe = mrsm.Pipe(
        'mrsm', 'plugins',
        instance=self,
        target='mrsm_plugins',
        temporary=True,
        cache=True,
        cache_connector_keys=cache_connector,
        static=True,
        null_indices=False,
        columns={
            'primary': 'plugin_name',
            'user_id': 'user_id',
        },
        dtypes={
            'plugin_name': 'string',
            'user_id': user_id_dtype,
            'attributes': 'json',
            'version': 'string',
        },
    )
    return self._plugins_pipe
Return the internal pipe for syncing plugins metadata.
50def register_plugin(self, plugin: Plugin, debug: bool = False) -> mrsm.SuccessTuple: 51 """ 52 Register a new plugin to the plugins table. 53 """ 54 plugins_pipe = self.get_plugins_pipe() 55 users_pipe = self.get_users_pipe() 56 user_id = self.get_plugin_user_id(plugin) 57 if user_id is not None: 58 username = self.get_username(user_id, debug=debug) 59 return False, f"{plugin} is already registered to '{username}'." 60 61 doc = { 62 'plugin_name': plugin.name, 63 'version': plugin.version, 64 'attributes': plugin.attributes, 65 'user_id': plugin.user_id, 66 } 67 68 sync_success, sync_msg = plugins_pipe.sync( 69 [doc], 70 check_existing=False, 71 debug=debug, 72 ) 73 if not sync_success: 74 return False, f"Failed to register {plugin}:\n{sync_msg}" 75 76 return True, "Success"
Register a new plugin to the plugins table.
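The register-or-reject flow above can be sketched without a database. This is only an illustration: the `registry` dictionary stands in for the plugins table, and the plain `(bool, str)` tuple mirrors the shape of `mrsm.SuccessTuple` as used in the listing.

```python
from typing import Dict, Tuple

# Same shape as the (success, message) tuples returned in the listing above.
SuccessTuple = Tuple[bool, str]


def register_plugin(registry: Dict[str, str], plugin_name: str, username: str) -> SuccessTuple:
    """Hypothetical in-memory analogue: refuse to overwrite an existing owner."""
    owner = registry.get(plugin_name)
    if owner is not None:
        return False, f"Plugin '{plugin_name}' is already registered to '{owner}'."

    registry[plugin_name] = username
    return True, "Success"


registry: Dict[str, str] = {}
print(register_plugin(registry, 'noaa', 'alice'))
print(register_plugin(registry, 'noaa', 'bob'))
```

Returning a tuple instead of raising lets callers forward the message to the user unchanged.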
def get_plugin_user_id(self, plugin: Plugin, debug: bool = False) -> Union[uuid.UUID, None]:
    """
    Return the user ID for plugin's owner.
    """
    plugins_pipe = self.get_plugins_pipe()
    return plugins_pipe.get_value('user_id', {'plugin_name': plugin.name}, debug=debug)
Return the user ID for plugin's owner.
def delete_plugin(self, plugin: Plugin, debug: bool = False) -> mrsm.SuccessTuple:
    """
    Delete a plugin's registration.
    """
    plugin_id = self.get_plugin_id(plugin, debug=debug)
    if plugin_id is None:
        return False, f"{plugin} is not registered."

    plugins_pipe = self.get_plugins_pipe()
    clear_success, clear_msg = plugins_pipe.clear(params={'plugin_name': plugin.name}, debug=debug)
    if not clear_success:
        return False, f"Failed to delete {plugin}:\n{clear_msg}"
    return True, "Success"
Delete a plugin's registration.
def get_plugin_id(self, plugin: Plugin, debug: bool = False) -> Union[str, None]:
    """
    Return a plugin's ID.
    """
    user_id = self.get_plugin_user_id(plugin, debug=debug)
    return plugin.name if user_id is not None else None
Return a plugin's ID.
def get_plugin_version(self, plugin: Plugin, debug: bool = False) -> Union[str, None]:
    """
    Return the version for a plugin.
    """
    plugins_pipe = self.get_plugins_pipe()
    return plugins_pipe.get_value('version', {'plugin_name': plugin.name}, debug=debug)
Return the version for a plugin.
def get_plugins(
    self,
    user_id: Optional[int] = None,
    search_term: Optional[str] = None,
    debug: bool = False,
    **kw: Any
) -> List[str]:
    """
    Return a list of plugin names.
    """
    plugins_pipe = self.get_plugins_pipe()
    params = {}
    if user_id:
        params['user_id'] = user_id

    df = plugins_pipe.get_data(['plugin_name'], params=params, debug=debug)
    if df is None:
        return []

    docs = df.to_dict(orient='records')
    return [
        plugin_name
        for doc in docs
        if (plugin_name := doc['plugin_name']).startswith(search_term or '')
    ]
Return a list of plugin names.
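The `search_term` filter in the listing above is a prefix match, and a missing term matches every name. A minimal sketch over plain dictionaries shows the behavior; `filter_plugin_names` is a hypothetical helper and not part of the package.

```python
from typing import Any, Dict, List, Optional


def filter_plugin_names(
    docs: List[Dict[str, Any]],
    search_term: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper mirroring the prefix filter in `get_plugins()`."""
    # Every string starts with '', so `None` effectively disables the filter.
    prefix = search_term or ''
    return [
        plugin_name
        for doc in docs
        if (plugin_name := doc['plugin_name']).startswith(prefix)
    ]


docs = [{'plugin_name': 'noaa'}, {'plugin_name': 'nasa'}, {'plugin_name': 'stress'}]
print(filter_plugin_names(docs, 'n'))
print(filter_plugin_names(docs))
```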
def get_plugin_username(self, plugin: Plugin, debug: bool = False) -> Union[str, None]:
    """
    Return the username for plugin's owner.
    """
    user_id = self.get_plugin_user_id(plugin, debug=debug)
    if user_id is None:
        return None
    return self.get_username(user_id, debug=debug)
Return the username for plugin's owner.
def get_plugin_attributes(self, plugin: Plugin, debug: bool = False) -> Dict[str, Any]:
    """
    Return the attributes for a plugin.
    """
    plugins_pipe = self.get_plugins_pipe()
    return plugins_pipe.get_value('attributes', {'plugin_name': plugin.name}, debug=debug) or {}
Return the attributes for a plugin.
22def get_tokens_pipe(self) -> mrsm.Pipe: 23 """ 24 Return the internal pipe for tokens management. 25 """ 26 if '_tokens_pipe' in self.__dict__: 27 return self._tokens_pipe 28 29 users_pipe = self.get_users_pipe() 30 user_id_dtype = ( 31 users_pipe._attributes.get('parameters', {}).get('dtypes', {}).get('user_id', 'uuid') 32 ) 33 34 cache_connector = self.__dict__.get('_cache_connector', None) 35 36 self._tokens_pipe = mrsm.Pipe( 37 'mrsm', 'tokens', 38 instance=self, 39 target='mrsm_tokens', 40 temporary=True, 41 cache=True, 42 cache_connector_keys=cache_connector, 43 static=True, 44 autotime=True, 45 null_indices=False, 46 columns={ 47 'datetime': 'creation', 48 'primary': 'id', 49 }, 50 indices={ 51 'unique': 'label', 52 'user_id': 'user_id', 53 }, 54 dtypes={ 55 'id': 'uuid', 56 'creation': 'datetime', 57 'expiration': 'datetime', 58 'is_valid': 'bool', 59 'label': 'string', 60 'user_id': user_id_dtype, 61 'scopes': 'json', 62 'secret_hash': 'string', 63 }, 64 ) 65 return self._tokens_pipe
Return the internal pipe for tokens management.
68def register_token( 69 self, 70 token: Token, 71 debug: bool = False, 72) -> mrsm.SuccessTuple: 73 """ 74 Register the new token to the tokens table. 75 """ 76 token_id, token_secret = token.generate_credentials() 77 tokens_pipe = self.get_tokens_pipe() 78 user_id = self.get_user_id(token.user) if token.user is not None else None 79 if user_id is None: 80 return False, "Cannot register a token without a user." 81 82 doc = { 83 'id': token_id, 84 'user_id': user_id, 85 'creation': datetime.now(timezone.utc), 86 'expiration': token.expiration, 87 'label': token.label, 88 'is_valid': token.is_valid, 89 'scopes': list(token.scopes) if token.scopes else [], 90 'secret_hash': hash_password( 91 str(token_secret), 92 rounds=STATIC_CONFIG['tokens']['hash_rounds'] 93 ), 94 } 95 sync_success, sync_msg = tokens_pipe.sync([doc], check_existing=False, debug=debug) 96 if not sync_success: 97 return False, f"Failed to register token:\n{sync_msg}" 98 return True, "Success"
Register the new token to the tokens table.
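The registration above stores only a hash of the token's secret, never the secret itself. The sketch below illustrates that idea with the standard library's PBKDF2 as a swapped-in stand-in: it is not the package's `hash_password()`, and `HASH_ROUNDS`, `generate_credentials()`, `hash_secret()`, and `verify_secret()` are all hypothetical names.

```python
import hashlib
import hmac
import secrets
import uuid
from typing import Tuple

# Assumed value for illustration; the listing reads its count from STATIC_CONFIG.
HASH_ROUNDS = 100_000


def generate_credentials() -> Tuple[uuid.UUID, str]:
    """Return a random token ID and a random secret (hypothetical stand-in)."""
    return uuid.uuid4(), secrets.token_hex(32)


def hash_secret(secret: str, salt: bytes, rounds: int = HASH_ROUNDS) -> str:
    """Derive a hex digest with PBKDF2-HMAC-SHA256 from the standard library."""
    digest = hashlib.pbkdf2_hmac('sha256', secret.encode('utf-8'), salt, rounds)
    return digest.hex()


def verify_secret(secret: str, salt: bytes, secret_hash: str, rounds: int = HASH_ROUNDS) -> bool:
    """Re-derive the hash and compare it in constant time."""
    return hmac.compare_digest(hash_secret(secret, salt, rounds), secret_hash)


token_id, token_secret = generate_credentials()
salt = secrets.token_bytes(16)
# Only the derived hash is placed in the document to be synced.
doc = {'id': token_id, 'secret_hash': hash_secret(token_secret, salt)}
print(verify_secret(token_secret, salt, doc['secret_hash']))
```

Because the secret cannot be recovered from the stored hash, it would have to be shown to the user once at creation time.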
101def edit_token(self, token: Token, debug: bool = False) -> mrsm.SuccessTuple: 102 """ 103 Persist the token's in-memory state to the tokens pipe. 104 """ 105 if not token.id: 106 return False, "Token ID is not set." 107 108 if not token.exists(debug=debug): 109 return False, f"Token {token.id} does not exist." 110 111 if not token.creation: 112 token_model = self.get_token_model(token.id) 113 token.creation = token_model.creation 114 115 tokens_pipe = self.get_tokens_pipe() 116 doc = { 117 'id': token.id, 118 'creation': token.creation, 119 'expiration': token.expiration, 120 'label': token.label, 121 'is_valid': token.is_valid, 122 'scopes': list(token.scopes) if token.scopes else [], 123 } 124 sync_success, sync_msg = tokens_pipe.sync([doc], debug=debug) 125 if not sync_success: 126 return False, f"Failed to edit token '{token.id}':\n{sync_msg}" 127 128 return True, "Success"
Persist the token's in-memory state to the tokens pipe.
131def invalidate_token(self, token: Token, debug: bool = False) -> mrsm.SuccessTuple: 132 """ 133 Set `is_valid` to `False` for the given token. 134 """ 135 if not token.id: 136 return False, "Token ID is not set." 137 138 if not token.exists(debug=debug): 139 return False, f"Token {token.id} does not exist." 140 141 if not token.creation: 142 token_model = self.get_token_model(token.id) 143 token.creation = token_model.creation 144 145 token.is_valid = False 146 tokens_pipe = self.get_tokens_pipe() 147 doc = { 148 'id': token.id, 149 'creation': token.creation, 150 'is_valid': False, 151 } 152 sync_success, sync_msg = tokens_pipe.sync([doc], debug=debug) 153 if not sync_success: 154 return False, f"Failed to invalidate token '{token.id}':\n{sync_msg}" 155 156 return True, "Success"
Set is_valid to False for the given token.
159def delete_token(self, token: Token, debug: bool = False) -> mrsm.SuccessTuple: 160 """ 161 Delete the given token from the tokens table. 162 """ 163 if not token.id: 164 return False, "Token ID is not set." 165 166 if not token.exists(debug=debug): 167 return False, f"Token {token.id} does not exist." 168 169 if not token.creation: 170 token_model = self.get_token_model(token.id) 171 token.creation = token_model.creation 172 173 token.is_valid = False 174 tokens_pipe = self.get_tokens_pipe() 175 clear_success, clear_msg = tokens_pipe.clear(params={'id': token.id}, debug=debug) 176 if not clear_success: 177 return False, f"Failed to delete token '{token.id}':\n{clear_msg}" 178 179 return True, "Success"
Delete the given token from the tokens table.
235def get_token(self, token_id: Union[uuid.UUID, str], debug: bool = False) -> Union[Token, None]: 236 """ 237 Return the `Token` from its ID. 238 """ 239 from meerschaum.utils.misc import is_uuid 240 if isinstance(token_id, str): 241 if is_uuid(token_id): 242 token_id = uuid.UUID(token_id) 243 else: 244 raise ValueError("Invalid token ID.") 245 token_model = self.get_token_model(token_id) 246 if token_model is None: 247 return None 248 return Token(**dict(token_model))
Return the Token from its ID.
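The string-to-UUID coercion at the top of `get_token()` can be isolated as a small function. This sketch substitutes a `try`/`except` around `uuid.UUID()` for the package's `is_uuid()` check; `normalize_token_id` is a hypothetical name.

```python
import uuid
from typing import Union


def normalize_token_id(token_id: Union[uuid.UUID, str]) -> uuid.UUID:
    """Hypothetical helper: coerce a string ID to `uuid.UUID` or raise `ValueError`."""
    if isinstance(token_id, uuid.UUID):
        return token_id
    try:
        return uuid.UUID(token_id)
    except (ValueError, AttributeError, TypeError) as e:
        # Malformed strings and non-string inputs are all reported the same way.
        raise ValueError("Invalid token ID.") from e


print(normalize_token_id('12345678-1234-5678-1234-567812345678'))
```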
182def get_tokens( 183 self, 184 user: Optional[User] = None, 185 labels: Optional[List[str]] = None, 186 ids: Optional[List[uuid.UUID]] = None, 187 debug: bool = False, 188) -> List[Token]: 189 """ 190 Return a list of `Token` objects. 191 """ 192 tokens_pipe = self.get_tokens_pipe() 193 user_id = ( 194 self.get_user_id(user, debug=debug) 195 if user is not None 196 else None 197 ) 198 user_type = self.get_user_type(user, debug=debug) if user is not None else None 199 params = ( 200 { 201 'user_id': ( 202 user_id 203 if user_type != 'admin' 204 else [user_id, None] 205 ) 206 } 207 if user_id is not None 208 else {} 209 ) 210 if labels: 211 params['label'] = labels 212 if ids: 213 params['id'] = ids 214 215 if debug: 216 dprint(f"Getting tokens with {user_id=}, {params=}") 217 218 tokens_df = tokens_pipe.get_data(params=params, debug=debug) 219 if tokens_df is None: 220 return [] 221 222 if debug: 223 dprint(f"Retrieved tokens dataframe:\n{tokens_df}") 224 225 tokens_docs = tokens_df.to_dict(orient='records') 226 return [ 227 Token( 228 instance=self, 229 **token_doc 230 ) 231 for token_doc in reversed(tokens_docs) 232 ]
Return a list of Token objects.
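The `params` construction in `get_tokens()` is easier to follow as a pure function. In this sketch, `build_token_params` is a hypothetical helper; it reproduces the listing's rule that an admin's filter also includes `None` alongside the admin's own `user_id`.

```python
from typing import Any, Dict, List, Optional


def build_token_params(
    user_id: Optional[Any] = None,
    user_type: Optional[str] = None,
    labels: Optional[List[str]] = None,
    ids: Optional[List[Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper mirroring the filter built in `get_tokens()`."""
    params: Dict[str, Any] = {}
    if user_id is not None:
        # Non-admins filter on their own ID; admins filter on [own ID, None].
        params['user_id'] = [user_id, None] if user_type == 'admin' else user_id
    if labels:
        params['label'] = labels
    if ids:
        params['id'] = ids
    return params


print(build_token_params(user_id=1, user_type='admin', labels=['ci']))
print(build_token_params(user_id=2, user_type='user'))
print(build_token_params())
```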
def get_token_model(self, token_id: Union[uuid.UUID, Token], debug: bool = False) -> 'Union[TokenModel, None]':
    """
    Return a token's model from the instance.
    """
    from meerschaum.models import TokenModel
    if isinstance(token_id, Token):
        # Read the ID from the instance that was passed in, not from the class.
        token_id = token_id.id
    if not token_id:
        raise ValueError("Invalid token ID.")
    tokens_pipe = self.get_tokens_pipe()
    doc = tokens_pipe.get_doc(
        params={'id': token_id},
        debug=debug,
    )
    if doc is None:
        return None
    return TokenModel(**doc)
Return a token's model from the instance.
def get_token_secret_hash(self, token_id: Union[uuid.UUID, Token], debug: bool = False) -> Union[str, None]:
    """
    Return the secret hash for a given token.
    """
    if isinstance(token_id, Token):
        token_id = token_id.id
    if not token_id:
        raise ValueError("Invalid token ID.")
    tokens_pipe = self.get_tokens_pipe()
    return tokens_pipe.get_value('secret_hash', params={'id': token_id}, debug=debug)
Return the secret hash for a given token.
def token_exists(self, token_id: Union[uuid.UUID, Token], debug: bool = False) -> bool:
    """
    Return `True` if a token exists in the tokens pipe.
    """
    if isinstance(token_id, Token):
        token_id = token_id.id
    if not token_id:
        raise ValueError("Invalid token ID.")

    tokens_pipe = self.get_tokens_pipe()
    return tokens_pipe.get_value('creation', params={'id': token_id}, debug=debug) is not None
Return True if a token exists in the tokens pipe.
def get_token_scopes(self, token_id: Union[uuid.UUID, Token], debug: bool = False) -> List[str]:
    """
    Return the scopes for a token.
    """
    if isinstance(token_id, Token):
        token_id = token_id.id
    if not token_id:
        raise ValueError("Invalid token ID.")

    tokens_pipe = self.get_tokens_pipe()
    return tokens_pipe.get_value('scopes', params={'id': token_id}, debug=debug) or []
Return the scopes for a token.
17@abc.abstractmethod 18def register_pipe( 19 self, 20 pipe: mrsm.Pipe, 21 debug: bool = False, 22 **kwargs: Any 23) -> mrsm.SuccessTuple: 24 """ 25 Insert the pipe's attributes into the internal `pipes` table. 26 27 Parameters 28 ---------- 29 pipe: mrsm.Pipe 30 The pipe to be registered. 31 32 Returns 33 ------- 34 A `SuccessTuple` of the result. 35 """
Insert the pipe's attributes into the internal pipes table.
Parameters
- pipe (mrsm.Pipe): The pipe to be registered.
Returns
- A SuccessTuple of the result.
37@abc.abstractmethod 38def get_pipe_attributes( 39 self, 40 pipe: mrsm.Pipe, 41 debug: bool = False, 42 **kwargs: Any 43) -> Dict[str, Any]: 44 """ 45 Return the pipe's document from the internal `pipes` table. 46 47 Parameters 48 ---------- 49 pipe: mrsm.Pipe 50 The pipe whose attributes should be retrieved. 51 52 Returns 53 ------- 54 The document that matches the keys of the pipe. 55 """
Return the pipe's document from the internal pipes table.
Parameters
- pipe (mrsm.Pipe): The pipe whose attributes should be retrieved.
Returns
- The document that matches the keys of the pipe.
57@abc.abstractmethod 58def get_pipe_id( 59 self, 60 pipe: mrsm.Pipe, 61 debug: bool = False, 62 **kwargs: Any 63) -> Union[str, int, None]: 64 """ 65 Return the `id` for the pipe if it exists. 66 67 Parameters 68 ---------- 69 pipe: mrsm.Pipe 70 The pipe whose `id` to fetch. 71 72 Returns 73 ------- 74 The `id` for the pipe's document or `None`. 75 """
Return the id for the pipe if it exists.
Parameters
- pipe (mrsm.Pipe): The pipe whose id to fetch.
Returns
- The id for the pipe's document or None.
77def edit_pipe( 78 self, 79 pipe: mrsm.Pipe, 80 debug: bool = False, 81 **kwargs: Any 82) -> mrsm.SuccessTuple: 83 """ 84 Edit the attributes of the pipe. 85 86 Parameters 87 ---------- 88 pipe: mrsm.Pipe 89 The pipe whose in-memory parameters must be persisted. 90 91 Returns 92 ------- 93 A `SuccessTuple` indicating success. 94 """ 95 raise NotImplementedError
Edit the attributes of the pipe.
Parameters
- pipe (mrsm.Pipe): The pipe whose in-memory parameters must be persisted.
Returns
- A SuccessTuple indicating success.
97def delete_pipe( 98 self, 99 pipe: mrsm.Pipe, 100 debug: bool = False, 101 **kwargs: Any 102) -> mrsm.SuccessTuple: 103 """ 104 Delete a pipe's registration from the `pipes` collection. 105 106 Parameters 107 ---------- 108 pipe: mrsm.Pipe 109 The pipe to be deleted. 110 111 Returns 112 ------- 113 A `SuccessTuple` indicating success. 114 """ 115 raise NotImplementedError
Delete a pipe's registration from the pipes collection.
Parameters
- pipe (mrsm.Pipe): The pipe to be deleted.
Returns
- A SuccessTuple indicating success.
117@abc.abstractmethod 118def fetch_pipes_keys( 119 self, 120 connector_keys: Optional[List[str]] = None, 121 metric_keys: Optional[List[str]] = None, 122 location_keys: Optional[List[str]] = None, 123 tags: Optional[List[str]] = None, 124 debug: bool = False, 125 **kwargs: Any 126) -> Union[ 127 List[Tuple[str, str, str]], 128 List[Tuple[str, str, str, Union[Dict[str, Any], List[str]]]], 129 Dict[Union[int, str], Tuple[str, str, str]], 130 Dict[Union[int, str], Tuple[str, str, str, Union[Dict[str, Any], List[str]]]], 131]: 132 """ 133 Return registered pipes' keys according to the provided filters. 134 135 May return either a list of key tuples or a dictionary mapping pipe IDs to key tuples. 136 When returning a dictionary, the key is the pipe's unique ID (int or str). 137 Tuples may be length 3 `(connector_keys, metric_key, location_key)` or length 4 138 with parameters or tags appended as the fourth element. 139 140 Parameters 141 ---------- 142 connector_keys: list[str] | None, default None 143 The keys passed via `-c`. 144 145 metric_keys: list[str] | None, default None 146 The keys passed via `-m`. 147 148 location_keys: list[str] | None, default None 149 The keys passed via `-l`. 150 151 tags: List[str] | None, default None 152 Tags passed via `--tags` which are stored under `parameters:tags`. 153 154 Returns 155 ------- 156 A list of tuples or a dictionary mapping pipe IDs to tuples. 157 You may return the string `"None"` for location keys in place of nulls. 
158 159 Examples 160 -------- 161 >>> import meerschaum as mrsm 162 >>> conn = mrsm.get_connector('example:demo') 163 >>> 164 >>> pipe_a = mrsm.Pipe('a', 'demo', tags=['foo'], instance=conn) 165 >>> pipe_b = mrsm.Pipe('b', 'demo', tags=['bar'], instance=conn) 166 >>> pipe_a.register() 167 >>> pipe_b.register() 168 >>> 169 >>> conn.fetch_pipes_keys(['a', 'b']) 170 [('a', 'demo', 'None'), ('b', 'demo', 'None')] 171 >>> conn.fetch_pipes_keys(metric_keys=['demo']) 172 [('a', 'demo', 'None'), ('b', 'demo', 'None')] 173 >>> conn.fetch_pipes_keys(tags=['foo']) 174 [('a', 'demo', 'None')] 175 >>> conn.fetch_pipes_keys(location_keys=[None]) 176 [('a', 'demo', 'None'), ('b', 'demo', 'None')] 177 """
Return registered pipes' keys according to the provided filters.
May return either a list of key tuples or a dictionary mapping pipe IDs to key tuples.
When returning a dictionary, the key is the pipe's unique ID (int or str).
Tuples may be length 3 (connector_keys, metric_key, location_key) or length 4
with parameters or tags appended as the fourth element.
Parameters
- connector_keys (list[str] | None, default None): The keys passed via -c.
- metric_keys (list[str] | None, default None): The keys passed via -m.
- location_keys (list[str] | None, default None): The keys passed via -l.
- tags (List[str] | None, default None): Tags passed via --tags which are stored under parameters:tags.
Returns
- A list of tuples or a dictionary mapping pipe IDs to tuples. You may return the string "None" for location keys in place of nulls.
Examples
>>> import meerschaum as mrsm
>>> conn = mrsm.get_connector('example:demo')
>>>
>>> pipe_a = mrsm.Pipe('a', 'demo', tags=['foo'], instance=conn)
>>> pipe_b = mrsm.Pipe('b', 'demo', tags=['bar'], instance=conn)
>>> pipe_a.register()
>>> pipe_b.register()
>>>
>>> conn.fetch_pipes_keys(['a', 'b'])
[('a', 'demo', 'None'), ('b', 'demo', 'None')]
>>> conn.fetch_pipes_keys(metric_keys=['demo'])
[('a', 'demo', 'None'), ('b', 'demo', 'None')]
>>> conn.fetch_pipes_keys(tags=['foo'])
[('a', 'demo', 'None')]
>>> conn.fetch_pipes_keys(location_keys=[None])
[('a', 'demo', 'None'), ('b', 'demo', 'None')]
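To make the filtering contract concrete, here is a minimal in-memory sketch that reproduces the doctest results above. It only illustrates the list-of-tuples return shape; the `REGISTRY` dictionary and the standalone function are hypothetical and do not cover any richer matching a real instance connector may support.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical registry: (connector_keys, metric_key, location_key) -> tags.
REGISTRY: Dict[Tuple[str, str, Optional[str]], List[str]] = {
    ('a', 'demo', None): ['foo'],
    ('b', 'demo', None): ['bar'],
}


def fetch_pipes_keys(
    connector_keys: Optional[List[str]] = None,
    metric_keys: Optional[List[str]] = None,
    location_keys: Optional[List[Optional[str]]] = None,
    tags: Optional[List[str]] = None,
) -> List[Tuple[str, str, str]]:
    """Return key tuples that satisfy every provided filter."""

    def matches(value, allowed) -> bool:
        # A missing or empty filter matches everything.
        return not allowed or value in allowed

    return [
        # Null location keys are reported as the string "None".
        (ck, mk, str(lk))
        for (ck, mk, lk), pipe_tags in REGISTRY.items()
        if matches(ck, connector_keys)
        and matches(mk, metric_keys)
        and matches(lk, location_keys)
        and (not tags or any(tag in pipe_tags for tag in tags))
    ]


print(fetch_pipes_keys(['a', 'b']))
print(fetch_pipes_keys(tags=['foo']))
print(fetch_pipes_keys(location_keys=[None]))
```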
179@abc.abstractmethod 180def pipe_exists( 181 self, 182 pipe: mrsm.Pipe, 183 debug: bool = False, 184 **kwargs: Any 185) -> bool: 186 """ 187 Check whether a pipe's target table exists. 188 189 Parameters 190 ---------- 191 pipe: mrsm.Pipe 192 The pipe to check whether its table exists. 193 194 Returns 195 ------- 196 A `bool` indicating the table exists. 197 """
Check whether a pipe's target table exists.
Parameters
- pipe (mrsm.Pipe): The pipe to check whether its table exists.
Returns
- A bool indicating whether the table exists.
199@abc.abstractmethod 200def drop_pipe( 201 self, 202 pipe: mrsm.Pipe, 203 debug: bool = False, 204 **kwargs: Any 205) -> mrsm.SuccessTuple: 206 """ 207 Drop a pipe's collection if it exists. 208 209 Parameters 210 ---------- 211 pipe: mrsm.Pipe 212 The pipe to be dropped. 213 214 Returns 215 ------- 216 A `SuccessTuple` indicating success. 217 """ 218 raise NotImplementedError
Drop a pipe's collection if it exists.
Parameters
- pipe (mrsm.Pipe): The pipe to be dropped.
Returns
- A SuccessTuple indicating success.
220def drop_pipe_indices( 221 self, 222 pipe: mrsm.Pipe, 223 debug: bool = False, 224 **kwargs: Any 225) -> mrsm.SuccessTuple: 226 """ 227 Drop a pipe's indices. 228 229 Parameters 230 ---------- 231 pipe: mrsm.Pipe 232 The pipe whose indices need to be dropped. 233 234 Returns 235 ------- 236 A `SuccessTuple` indicating success. 237 """ 238 return False, f"Cannot drop indices for instance connectors of type '{self.type}'."
Drop a pipe's indices.
Parameters
- pipe (mrsm.Pipe): The pipe whose indices need to be dropped.
Returns
- A SuccessTuple indicating success.
240@abc.abstractmethod 241def sync_pipe( 242 self, 243 pipe: mrsm.Pipe, 244 df: 'pd.DataFrame' = None, 245 begin: Union[datetime, int, None] = None, 246 end: Union[datetime, int, None] = None, 247 chunksize: Optional[int] = -1, 248 check_existing: bool = True, 249 debug: bool = False, 250 **kwargs: Any 251) -> mrsm.SuccessTuple: 252 """ 253 Sync a pipe using a database connection. 254 255 Parameters 256 ---------- 257 pipe: mrsm.Pipe 258 The Meerschaum Pipe instance into which to sync the data. 259 260 df: Optional[pd.DataFrame] 261 An optional DataFrame or equivalent to sync into the pipe. 262 Defaults to `None`. 263 264 begin: Union[datetime, int, None], default None 265 Optionally specify the earliest datetime to search for data. 266 Defaults to `None`. 267 268 end: Union[datetime, int, None], default None 269 Optionally specify the latest datetime to search for data. 270 Defaults to `None`. 271 272 chunksize: Optional[int], default -1 273 Specify the number of rows to sync per chunk. 274 If `-1`, resort to system configuration (default is `900`). 275 A `chunksize` of `None` will sync all rows in one transaction. 276 Defaults to `-1`. 277 278 check_existing: bool, default True 279 If `True`, pull and diff with existing data from the pipe. Defaults to `True`. 280 281 debug: bool, default False 282 Verbosity toggle. Defaults to False. 283 284 Returns 285 ------- 286 A `SuccessTuple` of success (`bool`) and message (`str`). 287 """
Sync a pipe using a database connection.
Parameters
- pipe (mrsm.Pipe): The Meerschaum Pipe instance into which to sync the data.
- df (Optional[pd.DataFrame]): An optional DataFrame or equivalent to sync into the pipe. Defaults to None.
- begin (Union[datetime, int, None], default None): Optionally specify the earliest datetime to search for data. Defaults to None.
- end (Union[datetime, int, None], default None): Optionally specify the latest datetime to search for data. Defaults to None.
- chunksize (Optional[int], default -1): Specify the number of rows to sync per chunk. If -1, resort to system configuration (default is 900). A chunksize of None will sync all rows in one transaction. Defaults to -1.
- check_existing (bool, default True): If True, pull and diff with existing data from the pipe. Defaults to True.
- debug (bool, default False): Verbosity toggle. Defaults to False.
Returns
- A SuccessTuple of success (bool) and message (str).
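The three `chunksize` cases described above (a positive integer, `-1`, and `None`) can be sketched as a generator. `iter_chunks` and `DEFAULT_CHUNKSIZE` are hypothetical names; the value 900 is taken from the documented default, and how a real connector reads its configuration is not shown here.

```python
from typing import Any, Iterator, List, Optional

# Assumed to mirror the documented system default.
DEFAULT_CHUNKSIZE = 900


def iter_chunks(rows: List[Any], chunksize: Optional[int] = -1) -> Iterator[List[Any]]:
    """Yield the rows in chunks according to the documented `chunksize` rules."""
    if chunksize is None:
        # All rows in a single chunk, i.e. one transaction.
        yield rows
        return

    size = DEFAULT_CHUNKSIZE if chunksize == -1 else chunksize
    if size <= 0:
        raise ValueError("`chunksize` must be a positive integer, -1, or None.")

    for start in range(0, len(rows), size):
        yield rows[start:start + size]


rows = list(range(2000))
print([len(chunk) for chunk in iter_chunks(rows)])
print([len(chunk) for chunk in iter_chunks(rows, chunksize=None)])
```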
289def create_pipe_indices( 290 self, 291 pipe: mrsm.Pipe, 292 debug: bool = False, 293 **kwargs: Any 294) -> mrsm.SuccessTuple: 295 """ 296 Create a pipe's indices. 297 298 Parameters 299 ---------- 300 pipe: mrsm.Pipe 301 The pipe whose indices need to be created. 302 303 Returns 304 ------- 305 A `SuccessTuple` indicating success. 306 """ 307 return False, f"Cannot create indices for instance connectors of type '{self.type}'."
Create a pipe's indices.
Parameters
- pipe (mrsm.Pipe): The pipe whose indices need to be created.
Returns
- A SuccessTuple indicating success.
309def clear_pipe( 310 self, 311 pipe: mrsm.Pipe, 312 begin: Union[datetime, int, None] = None, 313 end: Union[datetime, int, None] = None, 314 params: Optional[Dict[str, Any]] = None, 315 debug: bool = False, 316 **kwargs: Any 317) -> mrsm.SuccessTuple: 318 """ 319 Delete rows within `begin`, `end`, and `params`. 320 321 Parameters 322 ---------- 323 pipe: mrsm.Pipe 324 The pipe whose rows to clear. 325 326 begin: datetime | int | None, default None 327 If provided, remove rows >= `begin`. 328 329 end: datetime | int | None, default None 330 If provided, remove rows < `end`. 331 332 params: dict[str, Any] | None, default None 333 If provided, only remove rows which match the `params` filter. 334 335 Returns 336 ------- 337 A `SuccessTuple` indicating success. 338 """ 339 raise NotImplementedError
Delete rows within begin, end, and params.
Parameters
- pipe (mrsm.Pipe): The pipe whose rows to clear.
- begin (datetime | int | None, default None): If provided, remove rows >= begin.
- end (datetime | int | None, default None): If provided, remove rows < end.
- params (dict[str, Any] | None, default None): If provided, only remove rows which match the params filter.
Returns
- A SuccessTuple indicating success.
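The bounds described above form a half-open interval: `begin` is inclusive and `end` is exclusive. A minimal sketch over a list of documents follows; `clear_docs` is a hypothetical helper, and it treats `params` as simple equality filters, which is an assumption about the simplest case only.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional, Union

Bound = Union[datetime, int, None]


def clear_docs(
    docs: List[Dict[str, Any]],
    dt_col: str,
    begin: Bound = None,
    end: Bound = None,
    params: Optional[Dict[str, Any]] = None,
) -> List[Dict[str, Any]]:
    """Return the documents that remain after clearing."""

    def should_remove(doc: Dict[str, Any]) -> bool:
        value = doc[dt_col]
        if begin is not None and value < begin:
            return False
        # `end` is exclusive: a row exactly at `end` is kept.
        if end is not None and value >= end:
            return False
        return all(doc.get(key) == val for key, val in (params or {}).items())

    return [doc for doc in docs if not should_remove(doc)]


docs = [{'ts': i, 'id': i % 2} for i in range(1, 6)]
print([doc['ts'] for doc in clear_docs(docs, 'ts', begin=2, end=4)])
```

With no bounds and no `params`, every row matches and the result is empty.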
def get_pipe_data(
    self,
    pipe: mrsm.Pipe,
    select_columns: Optional[List[str]] = None,
    omit_columns: Optional[List[str]] = None,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    debug: bool = False,
    **kwargs: Any
) -> Union['pd.DataFrame', None]:
    """
    Query a pipe's target table and return the DataFrame.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe with the target table from which to read.

    select_columns: list[str] | None, default None
        If provided, only select these given columns.
        Otherwise select all available columns (i.e. `SELECT *`).

    omit_columns: list[str] | None, default None
        If provided, remove these columns from the selection.

    begin: datetime | int | None, default None
        The earliest `datetime` value to search from (inclusive).

    end: datetime | int | None, default None
        The latest `datetime` value to search to (exclusive).

    params: dict[str, Any] | None, default None
        Additional filters to apply to the query.

    Returns
    -------
    The target table's data as a DataFrame.
    """
    # Without an overridden `get_pipe_docs()`, the two defaults would call each other forever.
    if type(self).get_pipe_docs is get_pipe_docs:
        raise NotImplementedError(
            f"Missing `get_pipe_data()` or `get_pipe_docs()` for {type(self)}."
        )

    docs = self.get_pipe_docs(
        pipe=pipe,
        select_columns=select_columns,
        omit_columns=omit_columns,
        begin=begin,
        end=end,
        params=params,
        debug=debug,
        **kwargs
    )
    if not docs:
        return None

    pd = mrsm.attempt_import('pandas')
    try:
        return pd.DataFrame(docs)
    except Exception as e:
        from meerschaum.utils.warnings import warn
        warn(f"Cannot build DataFrame from pipe docs:\n{e}")

    return None
Query a pipe's target table and return the DataFrame.
Parameters
- pipe (mrsm.Pipe): The pipe with the target table from which to read.
- select_columns (list[str] | None, default None): If provided, only select these given columns. Otherwise select all available columns (i.e. SELECT *).
- omit_columns (list[str] | None, default None): If provided, remove these columns from the selection.
- begin (datetime | int | None, default None): The earliest datetime value to search from (inclusive).
- end (datetime | int | None, default None): The latest datetime value to search to (exclusive).
- params (dict[str, Any] | None, default None): Additional filters to apply to the query.
Returns
- The target table's data as a DataFrame.
def get_pipe_docs(
    self,
    pipe: mrsm.Pipe,
    select_columns: Optional[List[str]] = None,
    omit_columns: Optional[List[str]] = None,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    debug: bool = False,
    **kwargs: Any
) -> list[dict[str, Any]]:
    """
    Return a pipe's data as a list of documents.
    Defaults to `get_pipe_data().to_dict(orient='records')`.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe with the target table from which to read.

    select_columns: list[str] | None, default None
        If provided, only select these given columns.
        Otherwise select all available columns (i.e. `SELECT *`).

    omit_columns: list[str] | None, default None
        If provided, remove these columns from the selection.

    begin: datetime | int | None, default None
        The earliest `datetime` value to search from (inclusive).

    end: datetime | int | None, default None
        The latest `datetime` value to search to (exclusive).

    params: dict[str, Any] | None, default None
        Additional filters to apply to the query.

    Returns
    -------
    The target table's data as a list of dictionaries.
    """
    df = self.get_pipe_data(
        pipe=pipe,
        select_columns=select_columns,
        omit_columns=omit_columns,
        begin=begin,
        end=end,
        params=params,
        debug=debug,
        **kwargs
    )
    if df is None or df.empty:
        return []
    return df.to_dict(orient='records')
Return a pipe's data as a list of documents.
Defaults to get_pipe_data().to_dict(orient='records').
Parameters
- pipe (mrsm.Pipe): The pipe with the target table from which to read.
- select_columns (list[str] | None, default None): If provided, only select these given columns. Otherwise select all available columns (i.e. SELECT *).
- omit_columns (list[str] | None, default None): If provided, remove these columns from the selection.
- begin (datetime | int | None, default None): The earliest datetime value to search from (inclusive).
- end (datetime | int | None, default None): The latest datetime value to search to (exclusive).
- params (dict[str, Any] | None, default None): Additional filters to apply to the query.
Returns
- The target table's data as a list of dictionaries.
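Because `get_pipe_data()` and `get_pipe_docs()` each default to the other, a subclass only needs to implement one of them, and the base class has to detect when neither is overridden. The sketch below shows that pattern with the standard library only: a dictionary of column lists stands in for a DataFrame, and `BaseInstance`, `DocsOnly`, and `DataOnly` are hypothetical classes.

```python
from typing import Any, Dict, List, Optional

Table = Dict[str, List[Any]]  # Stand-in for a DataFrame: column name -> values.


class BaseInstance:
    def get_pipe_data(self, pipe: str) -> Optional[Table]:
        # If `get_pipe_docs()` was not overridden, calling it would recurse forever.
        if type(self).get_pipe_docs is BaseInstance.get_pipe_docs:
            raise NotImplementedError(
                f"Missing `get_pipe_data()` or `get_pipe_docs()` for {type(self)}."
            )
        docs = self.get_pipe_docs(pipe)
        if not docs:
            return None
        return {col: [doc.get(col) for doc in docs] for col in docs[0]}

    def get_pipe_docs(self, pipe: str) -> List[Dict[str, Any]]:
        table = self.get_pipe_data(pipe)
        if not table:
            return []
        columns = list(table)
        num_rows = len(table[columns[0]])
        return [{col: table[col][i] for col in columns} for i in range(num_rows)]


class DocsOnly(BaseInstance):
    def get_pipe_docs(self, pipe: str) -> List[Dict[str, Any]]:
        return [{'id': 1, 'vl': 10}, {'id': 2, 'vl': 20}]


class DataOnly(BaseInstance):
    def get_pipe_data(self, pipe: str) -> Optional[Table]:
        return {'id': [1, 2]}


print(DocsOnly().get_pipe_data('demo'))
print(DataOnly().get_pipe_docs('demo'))
```

A subclass that overrides neither method gets a `NotImplementedError` from either call instead of a `RecursionError`.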
461@abc.abstractmethod 462def get_sync_time( 463 self, 464 pipe: mrsm.Pipe, 465 params: Optional[Dict[str, Any]] = None, 466 newest: bool = True, 467 debug: bool = False, 468 **kwargs: Any 469) -> datetime | int | None: 470 """ 471 Return the most recent value for the `datetime` axis. 472 473 Parameters 474 ---------- 475 pipe: mrsm.Pipe 476 The pipe whose collection contains documents. 477 478 params: dict[str, Any] | None, default None 479 Filter certain parameters when determining the sync time. 480 481 newest: bool, default True 482 If `True`, return the maximum value for the column. 483 484 Returns 485 ------- 486 The largest `datetime` or `int` value of the `datetime` axis. 487 """
Return the most recent value for the datetime axis.
Parameters
- pipe (mrsm.Pipe): The pipe whose collection contains documents.
- params (dict[str, Any] | None, default None): Filter certain parameters when determining the sync time.
- newest (bool, default True): If True, return the maximum value for the column.
Returns
- The largest datetime or int value of the datetime axis.
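As a rough illustration of the contract above, the sync time of an in-memory collection is the maximum of the datetime axis after applying `params`. `latest_sync_time` is a hypothetical helper; returning the minimum when `newest=False` is an assumption, since the docstring only describes the `True` case.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional, Union


def latest_sync_time(
    docs: List[Dict[str, Any]],
    dt_col: str,
    params: Optional[Dict[str, Any]] = None,
    newest: bool = True,
) -> Union[datetime, int, None]:
    """Return the max (or, by assumption, min) value of `dt_col` among matching docs."""
    values = [
        doc[dt_col]
        for doc in docs
        if doc.get(dt_col) is not None
        and all(doc.get(key) == val for key, val in (params or {}).items())
    ]
    if not values:
        # No matching rows means there is no sync time yet.
        return None
    return max(values) if newest else min(values)


docs = [{'ts': 3, 'id': 1}, {'ts': 7, 'id': 2}, {'ts': 5, 'id': 1}]
print(latest_sync_time(docs, 'ts'))
print(latest_sync_time(docs, 'ts', params={'id': 1}))
```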
489@abc.abstractmethod 490def get_pipe_columns_types( 491 self, 492 pipe: mrsm.Pipe, 493 debug: bool = False, 494 **kwargs: Any 495) -> Dict[str, str]: 496 """ 497 Return the data types for the columns in the target table for data type enforcement. 498 499 Parameters 500 ---------- 501 pipe: mrsm.Pipe 502 The pipe whose target table contains columns and data types. 503 504 Returns 505 ------- 506 A dictionary mapping columns to data types. 507 """
Return the data types for the columns in the target table for data type enforcement.
Parameters
- pipe (mrsm.Pipe): The pipe whose target table contains columns and data types.
Returns
- A dictionary mapping columns to data types.
def get_pipe_columns_indices(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
) -> Dict[str, List[Dict[str, str]]]:
    """
    Return a dictionary mapping columns to metadata about related indices.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose target table has related indices.

    Returns
    -------
    A dictionary mapping each column to a list of dictionaries with the keys "type" and "name".

    Examples
    --------
    >>> pipe = mrsm.Pipe('demo', 'shirts', columns={'primary': 'id'}, indices={'size_color': ['color', 'size']})
    >>> pipe.sync([{'color': 'red', 'size': 'M'}])
    >>> pipe.get_columns_indices()
    {'id': [{'name': 'demo_shirts_pkey', 'type': 'PRIMARY KEY'}], 'color': [{'name': 'IX_demo_shirts_color_size', 'type': 'INDEX'}], 'size': [{'name': 'IX_demo_shirts_color_size', 'type': 'INDEX'}]}
    """
    return {}
Return a dictionary mapping columns to metadata about related indices.
Parameters
- pipe (mrsm.Pipe): The pipe whose target table has related indices.
Returns
- A dictionary mapping each column to a list of dictionaries with the keys "type" and "name".
Examples
>>> pipe = mrsm.Pipe('demo', 'shirts', columns={'primary': 'id'}, indices={'size_color': ['color', 'size']})
>>> pipe.sync([{'color': 'red', 'size': 'M'}])
>>> pipe.get_columns_indices()
{'id': [{'name': 'demo_shirts_pkey', 'type': 'PRIMARY KEY'}], 'color': [{'name': 'IX_demo_shirts_color_size', 'type': 'INDEX'}], 'size': [{'name': 'IX_demo_shirts_color_size', 'type': 'INDEX'}]}
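The return shape in the example above is keyed by column, so a multi-column index appears once under each of its columns. A short sketch of that inversion follows; the `index_defs` input format and the helper name are hypothetical.

```python
from typing import Any, Dict, List


def columns_indices_from_definitions(
    index_defs: List[Dict[str, Any]],
) -> Dict[str, List[Dict[str, str]]]:
    """Invert index definitions into a column -> [index metadata] mapping."""
    columns_indices: Dict[str, List[Dict[str, str]]] = {}
    for index in index_defs:
        for column in index['columns']:
            # Each column covered by the index gets its own copy of the metadata.
            columns_indices.setdefault(column, []).append(
                {'name': index['name'], 'type': index['type']}
            )
    return columns_indices


index_defs = [
    {'name': 'demo_shirts_pkey', 'type': 'PRIMARY KEY', 'columns': ['id']},
    {'name': 'IX_demo_shirts_color_size', 'type': 'INDEX', 'columns': ['color', 'size']},
]
print(columns_indices_from_definitions(index_defs))
```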
def get_pipe_size(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> Union[int, None]:
    """
    Return the on-disk size of a pipe's target table in bytes.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose target table size to measure.

    Returns
    -------
    An `int` of the number of bytes occupied by the target table,
    or `None` if the size cannot be determined.
    """
    raise NotImplementedError(
        f"`get_pipe_size()` is not implemented for instance connectors of type '{self.type}'."
    )
Return the on-disk size of a pipe's target table in bytes.
Parameters
- pipe (mrsm.Pipe): The pipe whose target table size to measure.
Returns
- An int of the number of bytes occupied by the target table, or None if the size cannot be determined.
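Because the size may come back as None, and the base implementation raises NotImplementedError, callers may want to handle both cases. The sketch below shows one possible approach; safe_size and describe_size are hypothetical helpers, not part of Meerschaum.

```python
from typing import Callable, Optional


def safe_size(get_size: Callable[[], Optional[int]]) -> Optional[int]:
    """Call a size getter, treating an unimplemented getter as an unknown size."""
    try:
        return get_size()
    except NotImplementedError:
        return None


def describe_size(num_bytes: Optional[int]) -> str:
    """Format a byte count with binary units, or report that it is unknown."""
    if num_bytes is None:
        return 'unknown'
    size = float(num_bytes)
    for unit in ('B', 'KiB', 'MiB', 'GiB'):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"


def unsupported_getter() -> Optional[int]:
    # Stand-in for a connector that does not implement size reporting.
    raise NotImplementedError("size reporting is not implemented")


print(describe_size(safe_size(lambda: 8192)))
print(describe_size(safe_size(unsupported_getter)))
```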
def compress_pipe(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Compress a pipe's target table to reduce disk usage.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose target table to compress.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    return False, (
        f"Compression is not supported for instance connectors of type '{self.type}'."
    )
Compress a pipe's target table to reduce disk usage.
Parameters
- pipe (mrsm.Pipe): The pipe whose target table to compress.
Returns
- A SuccessTuple indicating success.
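The base implementation reports failure instead of raising, so subclasses opt in by overriding the method. The following sketch mirrors that pattern with plain classes. BaseStore, ArchiveStore, and the local SuccessTuple alias are hypothetical stand-ins for illustration, not Meerschaum classes.

```python
from typing import Tuple

# Stand-in alias for this sketch: a (success, message) pair.
SuccessTuple = Tuple[bool, str]


class BaseStore:
    """Hypothetical base class whose maintenance method is unsupported by default."""
    type = 'base'

    def compress(self, table: str) -> SuccessTuple:
        return False, f"Compression is not supported for stores of type '{self.type}'."


class ArchiveStore(BaseStore):
    """Hypothetical subclass that opts in by overriding the method."""
    type = 'archive'

    def compress(self, table: str) -> SuccessTuple:
        # A real implementation would issue the backend's compression command here.
        return True, f"Compressed '{table}'."


for store in (BaseStore(), ArchiveStore()):
    success, msg = store.compress('demo_shirts')
    print(success, msg)
```

Returning a failure tuple lets callers iterate over mixed connector types without wrapping every call in a try block.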
def decompress_pipe(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Decompress a pipe's target table, the inverse of `compress_pipe()`.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose target table to decompress.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    return False, (
        f"Decompression is not supported for instance connectors of type '{self.type}'."
    )
Decompress a pipe's target table, the inverse of compress_pipe().
Parameters
- pipe (mrsm.Pipe): The pipe whose target table to decompress.
Returns
- A SuccessTuple indicating success.
def vacuum_pipe(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Reclaim disk space from a pipe's target table.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose target table to vacuum.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    return False, (
        f"Vacuuming is not supported for instance connectors of type '{self.type}'."
    )
Reclaim disk space from a pipe's target table.
Parameters
- pipe (mrsm.Pipe): The pipe whose target table to vacuum.
Returns
- A SuccessTuple indicating success.
def analyze_pipe(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Refresh the planner statistics for a pipe's target table.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose target table to analyze.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    return False, (
        f"Analyzing is not supported for instance connectors of type '{self.type}'."
    )
Refresh the planner statistics for a pipe's target table.
Parameters
- pipe (mrsm.Pipe): The pipe whose target table to analyze.
Returns
- A SuccessTuple indicating success.
def partition_pipe(
    self,
    pipe: mrsm.Pipe,
    chunk_minutes: Optional[int] = None,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Rebuild a pipe's target table to a new partition (chunk) width.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The partitioned pipe whose target table to repartition.

    chunk_minutes: Optional[int], default None
        The new partition width in minutes. Defaults to the pipe's `verify.chunk_minutes`.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    return False, (
        f"Repartitioning is not supported for instance connectors of type '{self.type}'."
    )
Rebuild a pipe's target table to a new partition (chunk) width.
Parameters
- pipe (mrsm.Pipe): The partitioned pipe whose target table to repartition.
- chunk_minutes (Optional[int], default None): The new partition width in minutes. Defaults to the pipe's verify.chunk_minutes.
Returns
- A SuccessTuple indicating success.
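The fallback described for chunk_minutes can be sketched as follows. This assumes the pipe's parameters are available as a nested dictionary with a 'verify' section containing 'chunk_minutes'; the helper name resolve_chunk_minutes is hypothetical and is not part of Meerschaum.

```python
from typing import Any, Dict, Optional


def resolve_chunk_minutes(
    parameters: Dict[str, Any],
    chunk_minutes: Optional[int] = None,
) -> Optional[int]:
    """Prefer the explicit argument, else fall back to parameters['verify']['chunk_minutes']."""
    if chunk_minutes is not None:
        return chunk_minutes
    # Tolerate a missing or null 'verify' section.
    return (parameters.get('verify') or {}).get('chunk_minutes')


parameters = {'verify': {'chunk_minutes': 1440}}
print(resolve_chunk_minutes(parameters))
print(resolve_chunk_minutes(parameters, chunk_minutes=60))
```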
def make_connector(cls, _is_executor: bool = False):
    """
    Register a class as a `Connector`.
    The `type` will be the lower case of the class name, without the suffix `connector`.

    Parameters
    ----------
    instance: bool, default False
        If `True`, make this connector type an instance connector.
        This requires implementing the various pipes functions and lots of testing.

    Examples
    --------
    >>> import meerschaum as mrsm
    >>> from meerschaum.connectors import make_connector, Connector
    >>>
    >>> @make_connector
    >>> class FooConnector(Connector):
    ...     REQUIRED_ATTRIBUTES: list[str] = ['username', 'password']
    ...
    >>> conn = mrsm.get_connector('foo:bar', username='dog', password='cat')
    >>> print(conn.username, conn.password)
    dog cat
    >>>
    """
    import re
    from meerschaum.plugins import _get_parent_plugin
    suffix_regex = (
        r'connector$'
        if not _is_executor
        else r'executor$'
    )
    plugin_name = _get_parent_plugin(2)
    typ = re.sub(suffix_regex, '', cls.__name__.lower())
    with _locks['types']:
        types[typ] = cls
    with _locks['custom_types']:
        custom_types.add(typ)
    if plugin_name:
        with _locks['plugins_types']:
            if plugin_name not in plugins_types:
                plugins_types[plugin_name] = []
            plugins_types[plugin_name].append(typ)
    with _locks['connectors']:
        if typ not in connectors:
            connectors[typ] = {}
    if getattr(cls, 'IS_INSTANCE', False):
        with _locks['instance_types']:
            if typ not in instance_types:
                instance_types.append(typ)

    return cls
Register a class as a Connector.
The type is the lowercased class name with the trailing suffix connector removed (for example, FooConnector becomes foo).
Parameters
- instance (bool, default False): If True, make this connector type an instance connector. This requires implementing the various pipes functions and lots of testing.
Examples
>>> import meerschaum as mrsm
>>> from meerschaum.connectors import make_connector, Connector
>>>
>>> @make_connector
... class FooConnector(Connector):
...     REQUIRED_ATTRIBUTES: list[str] = ['username', 'password']
...
>>> conn = mrsm.get_connector('foo:bar', username='dog', password='cat')
>>> print(conn.username, conn.password)
dog cat
>>>
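The naming rule can be reproduced in isolation. The sketch below applies the same regular-expression substitution that appears in the source listing to derive a type from a class name. The function connector_type_from_class and the sample classes are hypothetical and exist only for this illustration.

```python
import re


def connector_type_from_class(cls: type, is_executor: bool = False) -> str:
    """Lowercase the class name and strip a trailing 'connector' (or 'executor') suffix."""
    suffix_regex = r'executor$' if is_executor else r'connector$'
    return re.sub(suffix_regex, '', cls.__name__.lower())


class FooConnector:
    """Placeholder class; only its name matters here."""


class LocalExecutor:
    """Placeholder class; only its name matters here."""


class Bar:
    """A class without the suffix keeps its full lowercased name."""


print(connector_type_from_class(FooConnector))
print(connector_type_from_class(LocalExecutor, is_executor=True))
print(connector_type_from_class(Bar))
```

This is why the FooConnector class in the example above is reachable through the keys 'foo:bar': the part before the colon is the derived type.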
def entry(
    sysargs: Union[List[str], str, None] = None,
    _patch_args: Optional[Dict[str, Any]] = None,
    _use_cli_daemon: bool = True,
    _session_id: Optional[str] = None,
) -> SuccessTuple:
    """
    Parse arguments and launch a Meerschaum action.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    start = time.perf_counter()
    from meerschaum.config.environment import get_daemon_env_vars
    sysargs_list = shlex.split(sysargs) if isinstance(sysargs, str) else sysargs
    if (
        not _use_cli_daemon
        or (not sysargs or (sysargs[0] and sysargs[0].startswith('-')))
        or '--no-daemon' in sysargs_list
        or '--daemon' in sysargs_list
        or '-d' in sysargs_list
        or get_daemon_env_vars()
        or not mrsm.get_config('system', 'experimental', 'cli_daemon')
    ):
        success, msg = entry_without_daemon(sysargs, _patch_args=_patch_args)
        end = time.perf_counter()
        if '--debug' in sysargs_list:
            print(f"Duration without daemon: {round(end - start, 3)}")
        return success, msg

    from meerschaum._internal.cli.entry import entry_with_daemon
    success, msg = entry_with_daemon(sysargs, _patch_args=_patch_args)
    end = time.perf_counter()
    if '--debug' in sysargs_list:
        print(f"Duration with daemon: {round(end - start, 3)}")
    return success, msg