meerschaum

Meerschaum Python API

Welcome to the Meerschaum Python API technical documentation! Here you can find information about the classes and functions provided by the meerschaum package. Visit meerschaum.io for general usage documentation.

Root Module

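For your convenience, the following classes and functions may be imported from the root meerschaum namespace:

  • Classes and types: Pipe, Connector, InstanceConnector, Plugin, Venv, Job, SuccessTuple
  • Functions: get_pipes, get_connector, get_config, make_connector, attempt_import, entry, pprint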

Examples

Build a Connector

Get existing connectors or build a new one in-memory with the meerschaum.get_connector() factory function:

import meerschaum as mrsm

sql_conn = mrsm.get_connector(
    'sql:temp',
    flavor='sqlite',
    database='/tmp/tmp.db',
)
df = sql_conn.read("SELECT 1 AS foo")
print(df)
#    foo
# 0    1

sql_conn.to_sql(df, 'foo')
print(sql_conn.read('foo'))
#    foo
# 0    1

Create a Custom Connector Class

Decorate your connector class with meerschaum.make_connector() to designate it as a custom connector:

from datetime import datetime, timezone
from random import randint
import meerschaum as mrsm
from meerschaum.utils.dtypes import round_time

@mrsm.make_connector
class FooConnector(mrsm.Connector):
    REQUIRED_ATTRIBUTES = ['username', 'password']

    def fetch(
        self,
        begin: datetime | None = None,
        end: datetime | None = None,
    ):
        now = begin or round_time(datetime.now(timezone.utc))
        return [
            {'ts': now, 'id': 1, 'vl': randint(1, 100)},
            {'ts': now, 'id': 2, 'vl': randint(1, 100)},
            {'ts': now, 'id': 3, 'vl': randint(1, 100)},
        ]

foo_conn = mrsm.get_connector(
    'foo:bar',
    username='foo',
    password='bar',
)
docs = foo_conn.fetch()

Build a Pipe

Build a meerschaum.Pipe in-memory:

from datetime import datetime
import meerschaum as mrsm

pipe = mrsm.Pipe(
    foo_conn, 'demo',
    instance=sql_conn,
    columns={'datetime': 'ts', 'id': 'id'},
    tags=['production'],
)
pipe.sync(begin=datetime(2024, 1, 1))
df = pipe.get_data()
print(df)
#           ts  id  vl
# 0 2024-01-01   1  97
# 1 2024-01-01   2  18
# 2 2024-01-01   3  96

Add temporary=True to skip registering the pipe in the pipes table.

Get Registered Pipes

The meerschaum.get_pipes() function returns a dictionary hierarchy of pipes by connector, metric, and location:

import meerschaum as mrsm

pipes = mrsm.get_pipes(instance='sql:temp')
pipe = pipes['foo:bar']['demo'][None]

Add as_list=True to flatten the hierarchy:

import meerschaum as mrsm

pipes = mrsm.get_pipes(
    tags=['production'],
    instance=sql_conn,
    as_list=True,
)
print(pipes)
# [Pipe('foo:bar', 'demo', instance='sql:temp')]
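
As a rough illustration of what flattening means here, the sketch below walks a connector → metric → location dictionary and collects the leaf values into a list. Plain strings stand in for Pipe objects, and flatten_hierarchy is a hypothetical helper written for this example, not the library's own implementation.

```python
from typing import Any, Dict, List, Optional

# Plain strings stand in for Pipe objects in this sketch.
Hierarchy = Dict[str, Dict[str, Dict[Optional[str], Any]]]


def flatten_hierarchy(pipes: Hierarchy) -> List[Any]:
    """Collect the leaves of a connector -> metric -> location dictionary."""
    return [
        pipe
        for metrics in pipes.values()
        for locations in metrics.values()
        for pipe in locations.values()
    ]


pipes = {
    'foo:bar': {
        'demo': {None: "Pipe('foo:bar', 'demo')"},
        'weather': {'us': "Pipe('foo:bar', 'weather', 'us')"},
    },
}
flat = flatten_hierarchy(pipes)
print(flat)
```

A location key of None is kept as an ordinary dictionary key, which is why pipes['foo:bar']['demo'][None] works in the example above.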

Filter by the dtype of the datetime index column with datetime_dtypes. Accepted values are 'datetime', 'int', and 'None'; prefix with '_' to negate:

import meerschaum as mrsm

### Only pipes with a timestamp datetime index:
timestamp_pipes = mrsm.get_pipes(datetime_dtypes=['datetime'], as_list=True)

### Only pipes with an integer datetime index:
int_pipes = mrsm.get_pipes(datetime_dtypes=['int'], as_list=True)

### Exclude pipes without a datetime index:
datetime_pipes = mrsm.get_pipes(datetime_dtypes=['_None'], as_list=True)
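
The include/exclude behaviour can be sketched in plain Python, loosely following the filtering logic in the get_pipes source listed further down. The helper names (split_negated, dtype_matches, keep_pipe) are hypothetical, and the null check is simplified to the strings 'none' and 'null', which is an assumption for this example.

```python
from typing import List, Optional, Tuple

NEGATION_PREFIX = '_'  # assumed to match the '_' prefix described above


def split_negated(values: List[str]) -> Tuple[List[str], List[str]]:
    """Split values into (included, excluded), stripping the negation prefix."""
    included = [v for v in values if not v.startswith(NEGATION_PREFIX)]
    excluded = [v[len(NEGATION_PREFIX):] for v in values if v.startswith(NEGATION_PREFIX)]
    return included, excluded


def dtype_matches(requested: str, dt_col: Optional[str], dt_typ: Optional[str]) -> bool:
    """Check one requested value against a pipe's datetime column and its dtype."""
    requested = requested.lower()
    if not dt_col:
        # No datetime index column: only a null-like request matches.
        return requested in ('none', 'null')
    is_int = 'int' in str(dt_typ).lower()
    return (requested == 'int' and is_int) or (requested == 'datetime' and not is_int)


def keep_pipe(datetime_dtypes: List[str], dt_col: Optional[str], dt_typ: Optional[str]) -> bool:
    """Keep a pipe if it matches any included value and no excluded value."""
    included, excluded = split_negated(datetime_dtypes)
    in_match = not included or any(dtype_matches(d, dt_col, dt_typ) for d in included)
    ex_match = any(dtype_matches(d, dt_col, dt_typ) for d in excluded)
    return in_match and not ex_match


print(keep_pipe(['datetime'], 'ts', 'datetime64[ns]'))  # True
print(keep_pipe(['int'], 'id', 'int64'))                # True
print(keep_pipe(['_None'], None, None))                 # False
```

In this sketch a pipe with a datetime column but no explicit dtype counts as 'datetime', since its dtype string does not contain 'int'.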

Import Plugins

You can import a plugin's module through meerschaum.Plugin.module:

import meerschaum as mrsm

plugin = mrsm.Plugin('noaa')
with mrsm.Venv(plugin):
    noaa = plugin.module

If your plugin has submodules, use meerschaum.plugins.from_plugin_import:

from meerschaum.plugins import from_plugin_import
get_defined_pipes = from_plugin_import('compose.utils.pipes', 'get_defined_pipes')

Import multiple plugins with meerschaum.plugins.import_plugins:

from meerschaum.plugins import import_plugins
noaa, compose = import_plugins('noaa', 'compose')

Create a Job

Create a meerschaum.Job with name and sysargs:

import meerschaum as mrsm

job = mrsm.Job('syncing-engine', 'sync pipes --loop')
success, msg = job.start()

Pass the connector keys of an API instance as executor_keys to create a remote job:

import meerschaum as mrsm

job = mrsm.Job(
    'foo',
    'sync pipes -s daily',
    executor_keys='api:main',
)

Import from a Virtual Environment

Use the meerschaum.Venv context manager to activate a virtual environment:

import meerschaum as mrsm

with mrsm.Venv('noaa'):
    import requests

print(requests.__file__)
# /home/bmeares/.config/meerschaum/venvs/noaa/lib/python3.12/site-packages/requests/__init__.py

To import packages which may not be installed, use meerschaum.attempt_import():

import meerschaum as mrsm

requests = mrsm.attempt_import('requests', venv='noaa')
print(requests.__file__)
# /home/bmeares/.config/meerschaum/venvs/noaa/lib/python3.12/site-packages/requests/__init__.py
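
The general "import it if it is available" pattern can be sketched with the standard library alone. This is not how meerschaum.attempt_import() is implemented (it also handles virtual environments, as the venv argument above suggests); try_import is a hypothetical helper that only shows the fallback idea.

```python
import importlib
from types import ModuleType
from typing import Optional


def try_import(name: str) -> Optional[ModuleType]:
    """Return the named module, or None if it cannot be imported."""
    try:
        return importlib.import_module(name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError.
        return None


json_module = try_import('json')
missing = try_import('no_such_package_for_this_example')
print(json_module is not None, missing is None)
# True True
```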

Run Actions

Run sysargs with meerschaum.entry():

import meerschaum as mrsm

success, msg = mrsm.entry('show pipes + show version : x2')

Use meerschaum.actions.get_action() to access an action function directly:

from meerschaum.actions import get_action

show_pipes = get_action(['show', 'pipes'])
success, msg = show_pipes(connector_keys=['plugin:noaa'])

Get a dictionary of available subactions with meerschaum.actions.get_subactions():

from meerschaum.actions import get_subactions

subactions = get_subactions('show')
success, msg = subactions['pipes']()

Create a Plugin

Run bootstrap plugin to create a new plugin:

mrsm bootstrap plugin example

This will create example.py in your plugins directory (default ~/.config/meerschaum/plugins/, Windows: %APPDATA%\Meerschaum\plugins). You may paste the example code from the "Create a Custom Action" example below.

Open your plugin with edit plugin:

mrsm edit plugin example

Then paste the example code below into the plugin file to try out these features.

See the writing plugins guide for more in-depth documentation.

Create a Custom Action

Decorate a function with meerschaum.actions.make_action to designate it as an action. Subaction functions (such as sing_tune and sing_song below) are detected automatically and do not need to be decorated:

from meerschaum.actions import make_action

@make_action
def sing():
    print('What would you like me to sing?')
    return True, "Success"

def sing_tune():
    return False, "I don't know that song!"

def sing_song():
    print('Hello, World!')
    return True, "Success"
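
One way to picture name-based subaction detection is a lookup over functions that share the action's name as a prefix. The sketch below is only an illustration under that assumption; find_subactions is a hypothetical helper and is not how Meerschaum discovers subactions internally.

```python
from typing import Callable, Dict, Tuple

SuccessTuple = Tuple[bool, str]


def sing() -> SuccessTuple:
    return True, "Success"


def sing_tune() -> SuccessTuple:
    return False, "I don't know that song!"


def sing_song() -> SuccessTuple:
    return True, "Success"


def find_subactions(
    action: str,
    namespace: Dict[str, object],
) -> Dict[str, Callable[..., SuccessTuple]]:
    """Map subaction names to callables named '<action>_<subaction>'."""
    prefix = action + '_'
    return {
        name[len(prefix):]: obj
        for name, obj in namespace.items()
        if name.startswith(prefix) and callable(obj)
    }


namespace = {'sing': sing, 'sing_tune': sing_tune, 'sing_song': sing_song}
subactions = find_subactions('sing', namespace)
print(sorted(subactions))
# ['song', 'tune']
```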

Use meerschaum.plugins.add_plugin_argument() to create new parameters for your action:

from meerschaum.plugins import make_action, add_plugin_argument

add_plugin_argument(
    '--song', type=str, help='What song to sing.',
)

@make_action
def sing_melody(action=None, song=None):
    to_sing = action[0] if action else song
    if not to_sing:
        return False, "Please tell me what to sing!"

    return True, f'~I am singing {to_sing}~'

Pass the song as a positional argument or with the new flag:

mrsm sing melody lalala

mrsm sing melody --song do-re-mi
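
To see how the two invocations reach the same function, here is a sketch using the standard argparse module in place of Meerschaum's own parser. It assumes that the words after the action and subaction names arrive as the action list, which is consistent with the action[0] lookup above but is not confirmed here; run is a hypothetical helper.

```python
import argparse

# A stand-in parser: positional action words plus the custom --song flag.
parser = argparse.ArgumentParser(prog='mrsm')
parser.add_argument('action', nargs='*', help='Action words, e.g. sing melody lalala.')
parser.add_argument('--song', type=str, help='What song to sing.')


def sing_melody(action=None, song=None):
    to_sing = action[0] if action else song
    if not to_sing:
        return False, "Please tell me what to sing!"
    return True, f'~I am singing {to_sing}~'


def run(argv):
    """Parse argv and call sing_melody with the words after 'sing melody'."""
    args = parser.parse_args(argv)
    remaining = args.action[2:]  # assumption: drop the action and subaction names
    return sing_melody(action=remaining, song=args.song)


print(run(['sing', 'melody', 'lalala']))
# (True, '~I am singing lalala~')
print(run(['sing', 'melody', '--song', 'do-re-mi']))
# (True, '~I am singing do-re-mi~')
```

With neither a positional word nor --song, the function returns its failure tuple instead of raising.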

Add a Page to the Web Dashboard

Use the decorators meerschaum.plugins.dash_plugin() and meerschaum.plugins.web_page() to add new pages to the web dashboard:

from meerschaum.plugins import dash_plugin, web_page

@dash_plugin
def init_dash(dash_app):

    import dash.html as html
    import dash_bootstrap_components as dbc
    from dash import Input, Output, no_update

    ### Routes to '/dash/my-page'
    @web_page('/my-page', login_required=False)
    def my_page():
        return dbc.Container([
            html.H1("Hello, World!"),
            dbc.Button("Click me", id='my-button'),
            html.Div(id="my-output-div"),
        ])

    @dash_app.callback(
        Output('my-output-div', 'children'),
        Input('my-button', 'n_clicks'),
    )
    def my_button_click(n_clicks):
        if not n_clicks:
            return no_update
        return html.P(f'You clicked {n_clicks} times!')

Submodules

meerschaum.actions
Access functions for actions and subactions.

meerschaum.config
Read and write the Meerschaum configuration registry.

meerschaum.connectors
Build connectors to interact with databases and fetch data.

meerschaum.jobs
Start background jobs.

meerschaum.plugins
Access plugin modules and other API utilities.

meerschaum.utils
Utility functions are available in several submodules:

#! /usr/bin/env python
# -*- coding: utf-8 -*-
# vim:fenc=utf-8

"""
Copyright 2020–2026 Bennett Meares

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""

import atexit

from meerschaum.utils.typing import SuccessTuple
from meerschaum.utils.packages import attempt_import
from meerschaum.core.Pipe import Pipe
from meerschaum.plugins import Plugin
from meerschaum.utils.venv import Venv
from meerschaum.jobs import Job, make_executor
from meerschaum.connectors import get_connector, Connector, InstanceConnector, make_connector
from meerschaum.utils import get_pipes
from meerschaum.utils.formatting import pprint
from meerschaum._internal.docs import index as __doc__
from meerschaum.config import __version__, get_config
from meerschaum._internal.entry import entry
from meerschaum.__main__ import _close_pools

atexit.register(_close_pools)

__pdoc__ = {'gui': False, 'api': False, 'core': False, '_internal': False}
__all__ = (
    "get_pipes",
    "get_connector",
    "get_config",
    "Pipe",
    "Plugin",
    "SuccessTuple",
    "Venv",
    "Plugin",
    "Job",
    "pprint",
    "attempt_import",
    "actions",
    "config",
    "connectors",
    "jobs",
    "plugins",
    "utils",
    "SuccessTuple",
    "Connector",
    "InstanceConnector",
    "make_connector",
    "entry",
)
def get_pipes( connector_keys: Union[str, List[str], NoneType] = None, metric_keys: Union[str, List[str], NoneType] = None, location_keys: Union[str, List[str], NoneType] = None, tags: Optional[List[str]] = None, datetime_dtypes: Optional[List[str]] = None, params: Optional[Dict[str, Any]] = None, mrsm_instance: Union[str, InstanceConnector, NoneType] = None, instance: Union[str, InstanceConnector, NoneType] = None, as_list: bool = False, as_tags_dict: bool = False, method: str = 'registered', workers: Optional[int] = None, debug: bool = False, _cache_parameters: bool = True, **kw: Any) -> Union[Dict[str, Dict[str, Dict[Optional[str], Pipe]]], List[Pipe], Dict[str, Pipe]]:
def get_pipes(
    connector_keys: Union[str, List[str], None] = None,
    metric_keys: Union[str, List[str], None] = None,
    location_keys: Union[str, List[str], None] = None,
    tags: Optional[List[str]] = None,
    datetime_dtypes: Optional[List[str]] = None,
    params: Optional[Dict[str, Any]] = None,
    mrsm_instance: Union[str, InstanceConnector, None] = None,
    instance: Union[str, InstanceConnector, None] = None,
    as_list: bool = False,
    as_tags_dict: bool = False,
    method: str = 'registered',
    workers: Optional[int] = None,
    debug: bool = False,
    _cache_parameters: bool = True,
    **kw: Any
) -> Union[PipesDict, List[mrsm.Pipe], Dict[str, mrsm.Pipe]]:
    """
    Return a dictionary or list of `meerschaum.Pipe` objects.

    Parameters
    ----------
    connector_keys: Union[str, List[str], None], default None
        String or list of connector keys.
        If omitted or is `'*'`, fetch all possible keys.
        If a string begins with `'_'`, select keys that do NOT match the string.

    metric_keys: Union[str, List[str], None], default None
        String or list of metric keys. See `connector_keys` for formatting.

    location_keys: Union[str, List[str], None], default None
        String or list of location keys. See `connector_keys` for formatting.

    tags: Optional[List[str]], default None
        If provided, only include pipes with these tags.

    datetime_dtypes: Optional[List[str]], default None
        If provided, only include pipes with the corresponding `datetime` axis dtypes.
        Accepted values are `datetime`, `int`, `None` (or `null`, etc.).
        May be negated by `_`.

    params: Optional[Dict[str, Any]], default None
        Dictionary of additional parameters to search by.
        Params are parsed into a SQL WHERE clause.
        E.g. `{'a': 1, 'b': 2}` equates to `'WHERE a = 1 AND b = 2'`

    mrsm_instance: Union[str, InstanceConnector, None], default None
        Connector keys for the Meerschaum instance of the pipes.
        Must be a `meerschaum.connectors.sql.SQLConnector.SQLConnector` or
        `meerschaum.connectors.api.APIConnector.APIConnector`.

    as_list: bool, default False
        If `True`, return pipes in a list instead of a hierarchical dictionary.
        `False` : `{connector_keys: {metric_key: {location_key: Pipe}}}`
        `True`  : `[Pipe]`

    as_tags_dict: bool, default False
        If `True`, return a dictionary mapping tags to pipes.
        Pipes with multiple tags will be repeated.

    method: str, default 'registered'
        Available options: `['registered', 'explicit', 'all']`
        If `'registered'` (default), create pipes based on registered keys in the connector's pipes table
        (API or SQL connector, depends on mrsm_instance).
        If `'explicit'`, create pipes from provided connector_keys, metric_keys, and location_keys
        instead of consulting the pipes table. Useful for creating non-existent pipes.
        If `'all'`, create pipes from predefined metrics and locations. Required `connector_keys`.
        **NOTE:** Method `'all'` is not implemented!

    workers: Optional[int], default None
        If provided (and `as_tags_dict` is `True`), set the number of workers for the pool
        to fetch tags.
        Only takes effect if the instance connector supports multi-threading

    **kw: Any:
        Keyword arguments to pass to the `meerschaum.Pipe` constructor.

    Returns
    -------
    A dictionary of dictionaries and `meerschaum.Pipe` objects
    in the connector, metric, location hierarchy.
    If `as_list` is `True`, return a list of `meerschaum.Pipe` objects.
    If `as_tags_dict` is `True`, return a dictionary mapping tags to pipes.

    Examples
    --------
    ```
    >>> ### Manual definition:
    >>> pipes = {
    ...     <connector_keys>: {
    ...         <metric_key>: {
    ...             <location_key>: Pipe(
    ...                 <connector_keys>,
    ...                 <metric_key>,
    ...                 <location_key>,
    ...             ),
    ...         },
    ...     },
    ... },
    >>> ### Accessing a single pipe:
    >>> pipes['sql:main']['weather'][None]
    >>> ### Return a list instead:
    >>> get_pipes(as_list=True)
    [Pipe('sql:main', 'weather')]
    >>> get_pipes(as_tags_dict=True)
    {'gvl': Pipe('sql:main', 'weather')}
    ```
    """
    import json
    from collections import defaultdict
    from meerschaum.config import get_config
    from meerschaum.config.static import STATIC_CONFIG
    from meerschaum.utils.warnings import error
    from meerschaum.utils.misc import filter_keywords, separate_negation_values
    from meerschaum.utils.pool import get_pool
    from meerschaum.utils.pipes import replace_pipes_syntax
    from meerschaum.utils.debug import dprint
    from meerschaum.utils.dtypes import value_is_null, get_current_timestamp
    from meerschaum import Pipe

    negation_prefix = STATIC_CONFIG['system']['fetch_pipes_keys']['negation_prefix']
    if datetime_dtypes:
        if isinstance(datetime_dtypes, str):
            datetime_dtypes = [datetime_dtypes]
        for _dt in datetime_dtypes:
            _clean = str(_dt).lstrip(negation_prefix).lower()
            if _clean not in ('datetime', 'int') and not value_is_null(_clean):
                error(f"Invalid datetime dtype '{_dt}'.")

    if connector_keys is None:
        connector_keys = []
    if metric_keys is None:
        metric_keys = []
    if location_keys is None:
        location_keys = []
    if params is None:
        params = {}
    if tags is None:
        tags = []

    if isinstance(connector_keys, str):
        connector_keys = [connector_keys]
    if isinstance(metric_keys, str):
        metric_keys = [metric_keys]
    if isinstance(location_keys, str):
        location_keys = [location_keys]

    ### Get SQL or API connector (keys come from `connector.fetch_pipes_keys()`).
    if mrsm_instance is None:
        mrsm_instance = instance
    if mrsm_instance is None:
        mrsm_instance = get_config('meerschaum', 'instance', patch=True)
    if isinstance(mrsm_instance, str):
        from meerschaum.connectors.parse import parse_instance_keys
        connector = parse_instance_keys(keys=mrsm_instance, debug=debug)
    else:
        from meerschaum.connectors import instance_types
        valid_connector = False
        if hasattr(mrsm_instance, 'type'):
            if mrsm_instance.type in instance_types:
                valid_connector = True
        if not valid_connector:
            error(f"Invalid instance connector: {mrsm_instance}")
        connector = mrsm_instance
    if debug:
        dprint(f"Using instance connector: {connector}")
    if not connector:
        error(f"Could not create connector from keys: '{mrsm_instance}'")

    ### Get a list of tuples for the keys needed to build pipes.
    result = fetch_pipes_keys(
        method,
        connector,
        connector_keys = connector_keys,
        metric_keys = metric_keys,
        location_keys = location_keys,
        tags = tags,
        params = params,
        workers = workers,
        debug = debug
    )
    if result is None:
        error("Unable to build pipes!")
    result_items: List[Tuple] = (
        list(result.items())
        if isinstance(result, dict)
        else [(None, keys_tuple) for keys_tuple in result]
    )

    ### Populate the `pipes` dictionary with Pipes based on the keys
    ### obtained from the chosen `method`.
    in_dtypes, ex_dtypes = separate_negation_values(datetime_dtypes or [])
    pipes: PipesDict = {}
    for pipe_id, keys_tuple in result_items:
        ck, mk, lk = keys_tuple[0], keys_tuple[1], keys_tuple[2]
        pipe_tags_or_parameters = keys_tuple[3] if len(keys_tuple) == 4 else None
        pipe_parameters = (
            pipe_tags_or_parameters
            if isinstance(pipe_tags_or_parameters, (dict, str))
            else None
        )
        if isinstance(pipe_parameters, str):
            pipe_parameters = json.loads(pipe_parameters)
        pipe_tags = (
            pipe_tags_or_parameters
            if isinstance(pipe_tags_or_parameters, list)
            else (
                pipe_tags_or_parameters.get('tags', [])
                if isinstance(pipe_tags_or_parameters, dict)
                else None
            )
        )

        pipe = Pipe(
            ck, mk, lk,
            mrsm_instance = connector,
            parameters = pipe_parameters,
            tags = pipe_tags,
            debug = debug,
            **filter_keywords(Pipe, **kw)
        )
        pipe.__dict__['_tags'] = pipe_tags
        if pipe_id is not None:
            pipe._cache_value('_id', pipe_id, memory_only=True, debug=debug)
        if pipe_parameters is not None:
            now = get_current_timestamp('ms', as_int=True) / 1000
            full_attributes = {
                'connector_keys': ck,
                'metric_key': mk,
                'location_key': lk,
                'parameters': pipe_parameters,
            }
            if pipe_id is not None:
                full_attributes['pipe_id'] = pipe_id
            pipe._cache_value('attributes', full_attributes, memory_only=True, debug=debug)
            pipe._cache_value('_attributes_sync_time', now, memory_only=True, debug=debug)
        if datetime_dtypes:
            if pipe_parameters is None:
                pipe_parameters = pipe.get_parameters(debug=debug)
            columns_val = (pipe_parameters or {}).get('columns', {}) or {}
            if isinstance(columns_val, str) and 'Pipe(' in columns_val:
                columns_val = replace_pipes_syntax(columns_val)

            dt_col = columns_val.get('datetime', None)
            dt_typ = (
                ((pipe_parameters or {}).get('dtypes', None) or {}).get(dt_col, None)
                if dt_col
                else None
            )

            def _dtype_matches(clean_d):
                if not dt_col:
                    return value_is_null(clean_d)
                return (
                    (clean_d == 'int' and 'int' in str(dt_typ).lower())
                    or
                    (clean_d == 'datetime' and 'int' not in str(dt_typ).lower())
                )

            in_match = not in_dtypes or any(_dtype_matches(d) for d in in_dtypes)
            ex_match = bool(ex_dtypes and any(_dtype_matches(d) for d in ex_dtypes))
            keep_pipe = in_match and not ex_match

            if not keep_pipe:
                continue

        if ck not in pipes:
            pipes[ck] = {}

        if mk not in pipes[ck]:
            pipes[ck][mk] = {}


        pipes[ck][mk][lk] = pipe

    if not as_list and not as_tags_dict:
        return pipes

    from meerschaum.utils.pipes import flatten_pipes_dict
    pipes_list = flatten_pipes_dict(pipes)
    if as_list:
        return pipes_list

    pool = get_pool(workers=(workers if connector.IS_THREAD_SAFE else 1))
    def gather_pipe_tags(pipe: mrsm.Pipe) -> Tuple[mrsm.Pipe, List[str]]:
        _tags = pipe.__dict__.get('_tags', None)
        gathered_tags = _tags if _tags is not None else pipe.tags
        return pipe, (gathered_tags or [])

    tags_pipes = defaultdict(lambda: [])
    pipes_tags = dict(pool.map(gather_pipe_tags, pipes_list))
    for pipe, tags in pipes_tags.items():
        for tag in (tags or []):
            tags_pipes[tag].append(pipe)

    return dict(tags_pipes)

Return a dictionary or list of meerschaum.Pipe objects.

Parameters
  • connector_keys (Union[str, List[str], None], default None): String or list of connector keys. If omitted or is '*', fetch all possible keys. If a string begins with '_', select keys that do NOT match the string.
  • metric_keys (Union[str, List[str], None], default None): String or list of metric keys. See connector_keys for formatting.
  • location_keys (Union[str, List[str], None], default None): String or list of location keys. See connector_keys for formatting.
  • tags (Optional[List[str]], default None): If provided, only include pipes with these tags.
  • datetime_dtypes (Optional[List[str]], default None): If provided, only include pipes with the corresponding datetime axis dtypes. Accepted values are datetime, int, None (or null, etc.). May be negated by _.
  • params (Optional[Dict[str, Any]], default None): Dictionary of additional parameters to search by. Params are parsed into a SQL WHERE clause. E.g. {'a': 1, 'b': 2} equates to 'WHERE a = 1 AND b = 2'
  • mrsm_instance (Union[str, InstanceConnector, None], default None): Connector keys for the Meerschaum instance of the pipes. Must be a meerschaum.connectors.sql.SQLConnector.SQLConnector or meerschaum.connectors.api.APIConnector.APIConnector.
  • as_list (bool, default False): If True, return pipes in a list instead of a hierarchical dictionary. False : {connector_keys: {metric_key: {location_key: Pipe}}} True : [Pipe]
  • as_tags_dict (bool, default False): If True, return a dictionary mapping tags to pipes. Pipes with multiple tags will be repeated.
  • method (str, default 'registered'): Available options: ['registered', 'explicit', 'all']. If 'registered' (default), create pipes based on registered keys in the connector's pipes table (API or SQL connector, depends on mrsm_instance). If 'explicit', create pipes from provided connector_keys, metric_keys, and location_keys instead of consulting the pipes table. Useful for creating non-existent pipes. If 'all', create pipes from predefined metrics and locations. Requires connector_keys. NOTE: Method 'all' is not implemented!
  • workers (Optional[int], default None): If provided (and as_tags_dict is True), set the number of workers for the pool to fetch tags. Only takes effect if the instance connector supports multi-threading.
  • **kw (Any): Keyword arguments to pass to the meerschaum.Pipe constructor.
Returns
  • A dictionary of dictionaries and meerschaum.Pipe objects in the connector, metric, location hierarchy.
  • If as_list is True, return a list of meerschaum.Pipe objects.
  • If as_tags_dict is True, return a dictionary mapping tags to pipes.
Examples
>>> ### Manual definition:
>>> pipes = {
...     <connector_keys>: {
...         <metric_key>: {
...             <location_key>: Pipe(
...                 <connector_keys>,
...                 <metric_key>,
...                 <location_key>,
...             ),
...         },
...     },
... }
>>> ### Accessing a single pipe:
>>> pipes['sql:main']['weather'][None]
>>> ### Return a list instead:
>>> get_pipes(as_list=True)
[Pipe('sql:main', 'weather')]
>>> get_pipes(as_tags_dict=True)
{'gvl': [Pipe('sql:main', 'weather')]}
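
The tag grouping behind as_tags_dict can be sketched with a defaultdict, loosely following the loop at the end of the source above. Strings stand in for Pipe objects, and group_by_tag is a hypothetical name used only in this example.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_tag(pipes_tags: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert a pipe -> tags mapping into a tag -> pipes mapping."""
    tags_pipes = defaultdict(list)
    for pipe, tags in pipes_tags.items():
        for tag in tags or []:
            tags_pipes[tag].append(pipe)
    return dict(tags_pipes)


pipes_tags = {
    "Pipe('sql:main', 'weather')": ['gvl', 'production'],
    "Pipe('sql:main', 'power')": ['production'],
    "Pipe('sql:main', 'untagged')": [],
}
grouped = group_by_tag(pipes_tags)
print(grouped)
```

A pipe with several tags appears once under each tag, and a pipe with no tags does not appear at all.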
def get_connector( type: str = None, label: str = None, refresh: bool = False, debug: bool = False, _load_plugins: bool = True, **kw: Any) -> Connector:
def get_connector(
    type: str = None,
    label: str = None,
    refresh: bool = False,
    debug: bool = False,
    _load_plugins: bool = True,
    **kw: Any
) -> Connector:
    """
    Return existing connector or create new connection and store for reuse.

    You can create new connectors if enough parameters are provided for the given type and flavor.

    Parameters
    ----------
    type: Optional[str], default None
        Connector type (sql, api, etc.).
        Defaults to the type of the configured `instance_connector`.

    label: Optional[str], default None
        Connector label (e.g. main). Defaults to `'main'`.

    refresh: bool, default False
        Refresh the Connector instance / construct new object. Defaults to `False`.

    kw: Any
        Other arguments to pass to the Connector constructor.
        If the Connector has already been constructed and new arguments are provided,
        `refresh` is set to `True` and the old Connector is replaced.

    Returns
    -------
    A new Meerschaum connector (e.g. `meerschaum.connectors.api.APIConnector`,
    `meerschaum.connectors.sql.SQLConnector`).

    Examples
    --------
    The following parameters would create a new
    `meerschaum.connectors.sql.SQLConnector` that isn't in the configuration file.

    ```
    >>> conn = get_connector(
    ...     type = 'sql',
    ...     label = 'newlabel',
    ...     flavor = 'sqlite',
    ...     database = '/file/path/to/database.db'
    ... )
    >>>
    ```

    """
    from meerschaum.connectors.parse import parse_instance_keys
    from meerschaum.config import get_config
    from meerschaum._internal.static import STATIC_CONFIG
    from meerschaum.utils.warnings import warn
    global _loaded_plugin_connectors
    if isinstance(type, str) and not label and ':' in type:
        type, label = type.split(':', maxsplit=1)

    if _load_plugins:
        with _locks['_loaded_plugin_connectors']:
            if not _loaded_plugin_connectors:
                load_plugin_connectors()
                _load_builtin_custom_connectors()
                _loaded_plugin_connectors = True

    if type is None and label is None:
        default_instance_keys = get_config('meerschaum', 'instance', patch=True)
        ### recursive call to get_connector
        return parse_instance_keys(default_instance_keys)

    ### NOTE: the default instance connector may not be main.
    ### Only fall back to 'main' if the type is provided by the label is omitted.
    label = label if label is not None else STATIC_CONFIG['connectors']['default_label']

    ### type might actually be a label. Check if so and raise a warning.
    if type not in connectors:
        possibilities, poss_msg = [], ""
        for _type in get_config('meerschaum', 'connectors'):
            if type in get_config('meerschaum', 'connectors', _type):
                possibilities.append(f"{_type}:{type}")
        if len(possibilities) > 0:
            poss_msg = " Did you mean"
            for poss in possibilities[:-1]:
                poss_msg += f" '{poss}',"
            if poss_msg.endswith(','):
                poss_msg = poss_msg[:-1]
            if len(possibilities) > 1:
                poss_msg += " or"
            poss_msg += f" '{possibilities[-1]}'?"

        warn(f"Cannot create Connector of type '{type}'." + poss_msg, stack=False)
        return None

    if 'sql' not in types:
        from meerschaum.connectors.plugin import PluginConnector
        from meerschaum.connectors.valkey import ValkeyConnector
        with _locks['types']:
            types.update({
                'api': APIConnector,
                'sql': SQLConnector,
                'plugin': PluginConnector,
                'valkey': ValkeyConnector,
            })

    ### determine if we need to call the constructor
    if not refresh:
        ### see if any user-supplied arguments differ from the existing instance
        if label in connectors[type]:
            warning_message = None
            for attribute, value in kw.items():
                if attribute not in connectors[type][label].meta:
                    import inspect
                    cls = connectors[type][label].__class__
                    cls_init_signature = inspect.signature(cls)
                    cls_init_params = cls_init_signature.parameters
                    if attribute not in cls_init_params:
                        warning_message = (
                            f"Received new attribute '{attribute}' not present in connector " +
                            f"{connectors[type][label]}.\n"
                        )
                elif connectors[type][label].__dict__[attribute] != value:
                    warning_message = (
                        f"Mismatched values for attribute '{attribute}' in connector "
                        + f"'{connectors[type][label]}'.\n" +
                        f"  - Keyword value: '{value}'\n" +
                        f"  - Existing value: '{connectors[type][label].__dict__[attribute]}'\n"
                    )
            if warning_message is not None:
                warning_message += (
                    "\nSetting `refresh` to True and recreating connector with type:"
                    + f" '{type}' and label '{label}'."
                )
                refresh = True
                warn(warning_message)
        else: ### connector doesn't yet exist
            refresh = True

    ### only create an object if refresh is True
    ### (can be manually specified, otherwise determined above)
    if refresh:
        with _locks['connectors']:
            try:
                ### will raise an error if configuration is incorrect / missing
                conn = types[type](label=label, **kw)
                connectors[type][label] = conn
            except InvalidAttributesError as ie:
                warn(
                    f"Incorrect attributes for connector '{type}:{label}'.\n"
                    + str(ie),
                    stack = False,
                )
                conn = None
            except Exception as e:
                from meerschaum.utils.formatting import get_console
                console = get_console()
                if console:
                    console.print_exception()
                warn(
                    f"Exception when creating connector '{type}:{label}'.\n" + str(e),
                    stack = False,
                )
                conn = None
        if conn is None:
            return None

    return connectors[type][label]

Return an existing connector, or create a new connector and store it for reuse.

You can create new connectors if enough parameters are provided for the given type and flavor.

Parameters
  • type (Optional[str], default None): Connector type (sql, api, etc.). Defaults to the type of the configured instance_connector.
  • label (Optional[str], default None): Connector label (e.g. main). Defaults to 'main'.
  • refresh (bool, default False): If True, construct a new Connector object even if one is already stored under the same type and label. Defaults to False.
  • kw (Any): Other arguments to pass to the Connector constructor. If the Connector has already been constructed and new arguments are provided, refresh is set to True and the old Connector is replaced.
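Returns
  • The connector stored under the given type and label, or None if the connector could not be constructed (a warning is issued rather than an exception being raised).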
Examples

The following parameters would create a new meerschaum.connectors.sql.SQLConnector that isn't in the configuration file.

>>> conn = get_connector(
...     type = 'sql',
...     label = 'newlabel',
...     flavor = 'sqlite',
...     database = '/file/path/to/database.db'
... )
>>>
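
The reuse-or-rebuild behavior described above can be sketched without importing Meerschaum. This is a simplified illustration, not the library's implementation: `DemoConnector`, `_registry`, and `get_demo_connector` are hypothetical names, and the sketch rebuilds on any new or changed keyword argument, whereas the source shown above also checks the constructor signature before treating a new attribute as a mismatch.

```python
from typing import Any, Dict, Optional


class DemoConnector:
    """Hypothetical stand-in for a connector class (not part of meerschaum)."""

    def __init__(self, label: str, **kw: Any):
        self.label = label
        # Keep the constructor arguments so later calls can be compared against them.
        self.meta = {'label': label, **kw}


# Registry keyed by connector type, then label (hypothetical, for illustration).
_registry: Dict[str, Dict[str, DemoConnector]] = {'demo': {}}


def get_demo_connector(
    conn_type: str = 'demo',
    label: str = 'main',
    refresh: bool = False,
    **kw: Any,
) -> Optional[DemoConnector]:
    """Return the stored connector, rebuilding it if the keyword arguments differ."""
    existing = _registry[conn_type].get(label)
    if existing is None:
        # Nothing stored yet, so the constructor must be called.
        refresh = True
    elif not refresh:
        for attribute, value in kw.items():
            if attribute not in existing.meta or existing.meta[attribute] != value:
                refresh = True
                break
    if refresh:
        _registry[conn_type][label] = DemoConnector(label=label, **kw)
    return _registry[conn_type][label]


first = get_demo_connector(label='newlabel', flavor='sqlite', database='/tmp/a.db')
same = get_demo_connector(label='newlabel')
rebuilt = get_demo_connector(label='newlabel', database='/tmp/b.db')
print(first is same)     # True
print(first is rebuilt)  # False
```

Calling the factory again with no extra arguments returns the stored object, while a changed `database` value replaces it.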
def get_config( *keys: str, patch: bool = True, substitute: bool = True, sync_files: bool = True, write_missing: bool = True, as_tuple: bool = False, warn: bool = True, debug: bool = False) -> Any:
112def get_config(
113    *keys: str,
114    patch: bool = True,
115    substitute: bool = True,
116    sync_files: bool = True,
117    write_missing: bool = True,
118    as_tuple: bool = False,
119    warn: bool = True,
120    debug: bool = False
121) -> Any:
122    """
123    Return the Meerschaum configuration dictionary.
124    If positional arguments are provided, index by the keys.
125    Raises a warning if invalid keys are provided.
126
127    Parameters
128    ----------
129    *keys: str
130        Keys with which to index the configuration dictionary.
131
132    patch: bool, default True
133        If `True`, patch missing default keys into the config directory.
134        Defaults to `True`.
135
136    sync_files: bool, default True
137        If `True`, sync files if needed.
138        Defaults to `True`.
139
140    write_missing: bool, default True
141        If `True`, write default values when the main config files are missing.
142        Defaults to `True`.
143
144    substitute: bool, default True
145        If `True`, substitute 'MRSM{}' values.
146        Defaults to `True`.
147
148    as_tuple: bool, default False
149        If `True`, return a tuple of type (success, value).
150        Defaults to `False`.
151        
152    Returns
153    -------
154    The value in the configuration dictionary, indexed by the provided keys.
155
156    Examples
157    --------
158    >>> get_config('meerschaum', 'instance')
159    'sql:main'
160    >>> get_config('does', 'not', 'exist')
161    UserWarning: Invalid keys in config: ('does', 'not', 'exist')
162    """
163    import json
164
165    symlinks_key = STATIC_CONFIG['config']['symlinks_key']
166    if debug:
167        from meerschaum.utils.debug import dprint
168        dprint(f"Indexing keys: {keys}", color=False)
169
170    if len(keys) == 0:
171        _rc = _config(
172            substitute=substitute,
173            sync_files=sync_files,
174            write_missing=(write_missing and _allow_write_missing),
175        )
176        if as_tuple:
177            return True, _rc 
178        return _rc
179    
180    ### Weird threading issues, only import if substitute is True.
181    if substitute:
182        from meerschaum.config._read_config import search_and_substitute_config
183    ### Invalidate the cache if it was read before with substitute=False
184    ### but there still exist substitutions.
185    if (
186        config is not None and substitute and keys[0] != symlinks_key
187        and 'MRSM{' in json.dumps(config.get(keys[0]))
188    ):
189        try:
190            _subbed = search_and_substitute_config({keys[0]: config[keys[0]]})
191        except Exception:
192            import traceback
193            traceback.print_exc()
194            _subbed = {keys[0]: config[keys[0]]}
195
196        config[keys[0]] = _subbed[keys[0]]
197        if symlinks_key in _subbed:
198            if symlinks_key not in config:
199                config[symlinks_key] = {}
200            config[symlinks_key] = apply_patch_to_config(
201                _subbed.get(symlinks_key, {}),
202                config.get(symlinks_key, {}),
203            )
204
205    from meerschaum.config._sync import sync_files as _sync_files
206    if config is None:
207        _config(*keys, sync_files=sync_files)
208
209    invalid_keys = False
210    if keys[0] not in config and keys[0] != symlinks_key:
211        single_key_config = read_config(
212            keys=[keys[0]], substitute=substitute, write_missing=write_missing
213        )
214        if keys[0] not in single_key_config:
215            invalid_keys = True
216        else:
217            config[keys[0]] = single_key_config.get(keys[0], None)
218            if symlinks_key in single_key_config and keys[0] in single_key_config[symlinks_key]:
219                if symlinks_key not in config:
220                    config[symlinks_key] = {}
221                config[symlinks_key][keys[0]] = single_key_config[symlinks_key][keys[0]]
222
223            if sync_files:
224                _sync_files(keys=[keys[0]])
225
226    c = config
227    if len(keys) > 0:
228        for k in keys:
229            try:
230                c = c[k]
231            except Exception:
232                invalid_keys = True
233                break
234        if invalid_keys:
235            ### Check if the keys are in the default configuration.
236            from meerschaum.config._default import default_config
237            in_default = True
238            patched_default_config = (
239                search_and_substitute_config(default_config)
240                if substitute else copy.deepcopy(default_config)
241            )
242            _c = patched_default_config
243            for k in keys:
244                try:
245                    _c = _c[k]
246                except Exception:
247                    in_default = False
248            if in_default:
249                c = _c
250                invalid_keys = False
251            warning_msg = f"Invalid keys in config: {keys}"
252            if not in_default:
253                try:
254                    if warn:
255                        from meerschaum.utils.warnings import warn as _warn
256                        _warn(warning_msg, stacklevel=3, color=False)
257                except Exception:
258                    if warn:
259                        print(warning_msg)
260                if as_tuple:
261                    return False, None
262                return None
263
264            ### Don't write keys that we haven't yet loaded into memory.
265            not_loaded_keys = [k for k in patched_default_config if k not in config]
266            for k in not_loaded_keys:
267                patched_default_config.pop(k, None)
268
269            set_config(
270                apply_patch_to_config(
271                    patched_default_config,
272                    config,
273                )
274            )
275            if patch and keys[0] != symlinks_key:
276                if write_missing:
277                    write_config(config, debug=debug)
278
279    if as_tuple:
280        return (not invalid_keys), c
281    return c

Return the Meerschaum configuration dictionary. If positional arguments are provided, index by the keys. Raises a warning if invalid keys are provided.

Parameters
  • keys (str): Keys with which to index the configuration dictionary.
  • patch (bool, default True): If True, patch missing default keys into the config directory. Defaults to True.
  • sync_files (bool, default True): If True, sync files if needed. Defaults to True.
  • write_missing (bool, default True): If True, write default values when the main config files are missing. Defaults to True.
  • substitute (bool, default True): If True, substitute 'MRSM{}' values. Defaults to True.
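  • as_tuple (bool, default False): If True, return a tuple of type (success, value). Defaults to False.
  • warn (bool, default True): If True, issue a warning when the provided keys are invalid. Defaults to True.
  • debug (bool, default False): If True, print the keys being indexed. Defaults to False.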
Returns
  • The value in the configuration dictionary, indexed by the provided keys.
Examples
>>> get_config('meerschaum', 'instance')
'sql:main'
>>> get_config('does', 'not', 'exist')
UserWarning: Invalid keys in config: ('does', 'not', 'exist')
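
The key-indexing behavior described above (walk the dictionary by each key, fall back to the defaults, and optionally report success as a tuple) can be sketched as follows. This is a minimal illustration under stated assumptions: `CONFIG`, `DEFAULTS`, `_index`, and `lookup` are hypothetical names with made-up contents, and the sketch omits the file syncing, substitution, and warnings that `get_config()` performs.

```python
from typing import Any, Dict, Tuple, Union

# Hypothetical in-memory configuration and defaults, for illustration only.
CONFIG: Dict[str, Any] = {'meerschaum': {'instance': 'sql:main'}}
DEFAULTS: Dict[str, Any] = {'pipes': {'cache': {'enabled': True}}}


def _index(mapping: Any, keys: Tuple[str, ...]) -> Tuple[bool, Any]:
    """Walk `mapping` by `keys` and return (found, value)."""
    value = mapping
    for key in keys:
        try:
            value = value[key]
        except (KeyError, TypeError, IndexError):
            return False, None
    return True, value


def lookup(*keys: str, as_tuple: bool = False) -> Union[Any, Tuple[bool, Any]]:
    """Index CONFIG by `keys`, falling back to DEFAULTS before giving up."""
    found, value = _index(CONFIG, keys)
    if not found:
        found, value = _index(DEFAULTS, keys)
    return (found, value) if as_tuple else value


print(lookup('meerschaum', 'instance'))               # sql:main
print(lookup('pipes', 'cache', 'enabled'))            # True
print(lookup('does', 'not', 'exist', as_tuple=True))  # (False, None)
```

Returning a `(success, value)` tuple lets a caller tell a missing key apart from a key whose stored value is `None`.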
class Pipe:
 66class Pipe:
 67    """
 68    Access Meerschaum pipes via Pipe objects.
 69    
 70    Pipes are identified by the following:
 71
 72    1. Connector keys (e.g. `'sql:main'`)
 73    2. Metric key (e.g. `'weather'`)
 74    3. Location (optional; e.g. `None`)
 75    
 76    A pipe's connector keys correspond to a data source, and when the pipe is synced,
 77    its `fetch` definition is evaluated and executed to produce new data.
 78    
 79    Alternatively, new data may be directly synced via `pipe.sync()`:
 80    
 81    ```
 82    >>> from meerschaum import Pipe
 83    >>> pipe = Pipe('csv', 'weather')
 84    >>>
 85    >>> import pandas as pd
 86    >>> df = pd.read_csv('weather.csv')
 87    >>> pipe.sync(df)
 88    ```
 89    """
 90
 91    from ._fetch import (
 92        fetch,
 93        get_backtrack_interval,
 94    )
 95    from ._data import (
 96        get_data,
 97        get_backtrack_data,
 98        get_rowcount,
 99        get_data,
100        get_doc,
101        get_docs,
102        get_value,
103        _get_data_as_iterator,
104        get_chunk_interval,
105        get_chunk_bounds,
106        get_chunk_bounds_batches,
107        parse_date_bounds,
108    )
109    from ._register import register
110    from ._attributes import (
111        attributes,
112        parameters,
113        columns,
114        indices,
115        indexes,
116        dtypes,
117        autoincrement,
118        autotime,
119        upsert,
120        static,
121        tzinfo,
122        enforce,
123        null_indices,
124        mixed_numerics,
125        get_columns,
126        get_columns_types,
127        get_columns_indices,
128        get_indices,
129        get_parameters,
130        get_dtypes,
131        update_parameters,
132        tags,
133        get_id,
134        id,
135        get_val_column,
136        parents,
137        parent,
138        children,
139        child,
140        reference,
141        references,
142        target,
143        _target_legacy,
144        guess_datetime,
145        precision,
146        get_precision,
147    )
148    from ._cache import (
149        _get_cache_connector,
150        _cache_value,
151        _get_cached_value,
152        _invalidate_cache,
153        _get_cache_dir_path,
154        _write_cache_key,
155        _write_cache_file,
156        _write_cache_conn_key,
157        _read_cache_key,
158        _read_cache_file,
159        _read_cache_conn_key,
160        _load_cache_keys,
161        _load_cache_files,
162        _load_cache_conn_keys,
163        _get_cache_keys,
164        _get_cache_file_keys,
165        _get_cache_conn_keys,
166        _clear_cache_key,
167        _clear_cache_file,
168        _clear_cache_conn_key,
169    )
170    from ._show import show
171    from ._edit import edit, edit_definition, update
172    from ._sync import (
173        sync,
174        get_sync_time,
175        exists,
176        filter_existing,
177        _get_chunk_label,
178        get_num_workers,
179        _persist_new_special_columns,
180    )
181    from ._verify import (
182        verify,
183        get_bound_interval,
184        get_bound_time,
185    )
186    from ._delete import delete
187    from ._drop import drop, drop_indices
188    from ._index import create_indices
189    from ._clear import clear
190    from ._deduplicate import deduplicate
191    from ._bootstrap import bootstrap
192    from ._dtypes import enforce_dtypes, infer_dtypes
193    from ._copy import copy_to
194
195    def __init__(
196        self,
197        connector: str = '',
198        metric: str = '',
199        location: Optional[str] = None,
200        parameters: Optional[Dict[str, Any]] = None,
201        columns: Union[Dict[str, str], List[str], None] = None,
202        indices: Optional[Dict[str, Union[str, List[str]]]] = None,
203        tags: Optional[List[str]] = None,
204        target: Optional[str] = None,
205        dtypes: Optional[Dict[str, str]] = None,
206        instance: Optional[Union[str, InstanceConnector]] = None,
207        upsert: Optional[bool] = None,
208        autoincrement: Optional[bool] = None,
209        autotime: Optional[bool] = None,
210        precision: Union[str, Dict[str, Union[str, int]], None] = None,
211        static: Optional[bool] = None,
212        enforce: Optional[bool] = None,
213        null_indices: Optional[bool] = None,
214        mixed_numerics: Optional[bool] = None,
215        temporary: bool = False,
216        cache: Optional[bool] = None,
217        cache_connector_keys: Optional[str] = None,
218        references: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]] = None,
219        reference: Union[str, Dict[str, Any], mrsm.Pipe, None] = None,
220        parents: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]] = None,
221        parent: Union[str, Dict[str, Any], mrsm.Pipe, None] = None,
222        children: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]] = None,
223        child: Union[str, Dict[str, Any], mrsm.Pipe, None] = None,
224        mrsm_instance: Optional[Union[str, InstanceConnector]] = None,
225        connector_keys: Optional[str] = None,
226        metric_key: Optional[str] = None,
227        location_key: Optional[str] = None,
228        instance_keys: Optional[str] = None,
229        indexes: Union[Dict[str, str], List[str], None] = None,
230        debug: bool = False,
231    ):
232        """
233        Parameters
234        ----------
235        connector: str
236            Keys for the pipe's source connector, e.g. `'sql:main'`.
237
238        metric: str
239            Label for the pipe's contents, e.g. `'weather'`.
240
241        location: str, default None
242            Label for the pipe's location. Defaults to `None`.
243
244        parameters: Optional[Dict[str, Any]], default None
245            Optionally set a pipe's parameters from the constructor,
246            e.g. columns and other attributes.
247            You can edit these parameters with `edit pipes`.
248
249        columns: Union[Dict[str, str], List[str], None], default None
250            Set the `columns` dictionary of `parameters`.
251            If `parameters` is also provided, this dictionary is added under the `'columns'` key.
252
253        indices: Optional[Dict[str, Union[str, List[str]]]], default None
254            Set the `indices` dictionary of `parameters`.
255            If `parameters` is also provided, this dictionary is added under the `'indices'` key.
256
257        tags: Optional[List[str]], default None
258            A list of strings to be added under the `'tags'` key of `parameters`.
259            You can select pipes with certain tags using `--tags`.
260
261        dtypes: Optional[Dict[str, str]], default None
262            Set the `dtypes` dictionary of `parameters`.
263            If `parameters` is also provided, this dictionary is added under the `'dtypes'` key.
264
265        mrsm_instance: Optional[Union[str, InstanceConnector]], default None
266            Connector for the Meerschaum instance where the pipe resides.
267            Defaults to the preconfigured default instance (`'sql:main'`).
268
269        instance: Optional[Union[str, InstanceConnector]], default None
270            Alias for `mrsm_instance`. If `mrsm_instance` is supplied, this value is ignored.
271
272        upsert: Optional[bool], default None
273            If `True`, set `upsert` to `True` in the parameters.
274
275        autoincrement: Optional[bool], default None
276            If `True`, set `autoincrement` in the parameters.
277
278        autotime: Optional[bool], default None
279            If `True`, set `autotime` in the parameters.
280
281        precision: Union[str, Dict[str, Union[str, int]], None], default None
282            If provided, set `precision` in the parameters.
283            This may be either a string (the precision unit) or a dictionary in the form
284            `{'unit': <unit>, 'interval': <interval>}`.
285            Default is determined by the `datetime` column dtype
286            (e.g. `datetime64[us]` is `microsecond` precision).
287
288        static: Optional[bool], default None
289            If `True`, set `static` in the parameters.
290
291        enforce: Optional[bool], default None
292            If `False`, skip data type enforcement.
293            Default behavior is `True`.
294
295        null_indices: Optional[bool], default None
296            Set to `False` if there will be no null values in the index columns.
297            Defaults to `True`.
298
299        mixed_numerics: Optional[bool], default None
300            If `True`, integer columns will be converted to `numeric` when floats are synced.
301            Set to `False` to disable this behavior.
302            Defaults to `True`.
303
304        temporary: bool, default False
305            If `True`, prevent instance tables (pipes, users, plugins) from being created.
306
307        cache: Optional[bool], default None
308            If `True`, cache the pipe's metadata to disk (in addition to in-memory caching).
309            If `cache` is `None`, it is `False` when `temporary` is `True`;
310            otherwise it follows `get_config('pipes', 'cache', 'enabled')` (`True` by default).
311
312        cache_connector_keys: Optional[str], default None
313            If provided, use the Valkey connector with these keys (e.g. `valkey:main`) for caching.
314
315        references: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None
316            If provided, inherit the parameters of the reference Pipe(s).
317            May be a string containing a Pipe constructor call, a dictionary of constructor keys,
318            a Pipe itself, or a list of any of these values.
319
320        parents: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None
321            Set references for parent pipes. See `references` for values.
322
323        children: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None
324            Set references for child pipes. See `references` for values.
325
326        """
327        from meerschaum.utils.warnings import error, warn
328        if (not connector and not connector_keys) or (not metric and not metric_key):
329            error(
330                "Please provide strings for the connector and metric\n    "
331                + "(first two positional arguments)."
332            )
333
334        ### Fall back to legacy `location_key` just in case.
335        if not location:
336            location = location_key
337
338        if not connector:
339            connector = connector_keys
340
341        if not metric:
342            metric = metric_key
343
344        if location in ('[None]', 'None'):
345            location = None
346
347        from meerschaum._internal.static import STATIC_CONFIG
348        negation_prefix = STATIC_CONFIG['system']['fetch_pipes_keys']['negation_prefix']
349        for k in (connector, metric, location, *(tags or [])):
350            if str(k).startswith(negation_prefix):
351                error(f"A pipe's keys and tags cannot start with the prefix '{negation_prefix}'.")
352
353        self._connector_keys = str(connector)
354        self._connector_key = self.connector_keys ### Alias
355        self._metric_key = metric
356        self._location_key = location
357        self.temporary = temporary
358        self.cache = (
359            cache
360            if cache is not None
361            else ((not temporary) and get_config('pipes', 'cache', 'enabled', warn=False))
362        )
363        self.cache_connector_keys = (
364            str(cache_connector_keys)
365            if cache_connector_keys is not None
366            else None
367        )
368        self.debug = debug
369
370        self._attributes: Dict[str, Any] = {
371            'connector_keys': self._connector_keys,
372            'metric_key': self._metric_key,
373            'location_key': self._location_key,
374            'parameters': {},
375        }
376
377        ### only set parameters if values are provided
378        if isinstance(parameters, dict):
379            self._attributes['parameters'] = parameters
380        else:
381            if parameters is not None:
382                warn(f"The provided parameters are of invalid type '{type(parameters)}'.")
383            self._attributes['parameters'] = {}
384
385        columns = columns or self._attributes.get('parameters', {}).get('columns', None)
386        if isinstance(columns, (list, tuple)):
387            columns = {str(col): str(col) for col in columns}
388        if isinstance(columns, dict):
389            self._attributes['parameters']['columns'] = columns
390        elif isinstance(columns, str) and 'Pipe(' in columns:
391            pass
392        elif columns is not None:
393            warn(f"The provided columns are of invalid type '{type(columns)}'.")
394
395        indices = (
396            indices
397            or indexes
398            or self._attributes.get('parameters', {}).get('indices', None)
399            or self._attributes.get('parameters', {}).get('indexes', None)
400        )
401        if isinstance(indices, dict):
402            indices_key = (
403                'indexes'
404                if 'indexes' in self._attributes['parameters']
405                else 'indices'
406            )
407            self._attributes['parameters'][indices_key] = indices
408
409        if isinstance(tags, (list, tuple)):
410            self._attributes['parameters']['tags'] = tags
411        elif tags is not None:
412            warn(f"The provided tags are of invalid type '{type(tags)}'.")
413
414        if isinstance(target, str):
415            self._attributes['parameters']['target'] = target
416        elif target is not None:
417            warn(f"The provided target is of invalid type '{type(target)}'.")
418
419        if isinstance(dtypes, dict):
420            self._attributes['parameters']['dtypes'] = dtypes
421        elif dtypes is not None:
422            warn(f"The provided dtypes are of invalid type '{type(dtypes)}'.")
423
424        if isinstance(upsert, bool):
425            self._attributes['parameters']['upsert'] = upsert
426
427        if isinstance(autoincrement, bool):
428            self._attributes['parameters']['autoincrement'] = autoincrement
429
430        if isinstance(autotime, bool):
431            self._attributes['parameters']['autotime'] = autotime
432
433        if isinstance(precision, dict):
434            self._attributes['parameters']['precision'] = precision
435        elif isinstance(precision, str):
436            self._attributes['parameters']['precision'] = {'unit': precision}
437
438        if isinstance(static, bool):
439            self._attributes['parameters']['static'] = static
440            self._static = static
441
442        if isinstance(enforce, bool):
443            self._attributes['parameters']['enforce'] = enforce
444
445        if isinstance(null_indices, bool):
446            self._attributes['parameters']['null_indices'] = null_indices
447
448        if isinstance(mixed_numerics, bool):
449            self._attributes['parameters']['mixed_numerics'] = mixed_numerics
450
451        ### NOTE: The parameters dictionary is {} by default.
452        ###       A Pipe may be registered without parameters, then edited,
453        ###       or a Pipe may be registered with parameters set in-memory first.
454        _mrsm_instance = mrsm_instance if mrsm_instance is not None else (instance or instance_keys)
455        if _mrsm_instance is None:
456            _mrsm_instance = get_config('meerschaum', 'instance', patch=True)
457
458        if not isinstance(_mrsm_instance, str):
459            self._instance_connector = _mrsm_instance
460            self._instance_keys = str(_mrsm_instance)
461        else:
462            self._instance_keys = _mrsm_instance
463
464        if self._instance_keys == 'sql:memory':
465            self.cache = False
466
467        self._cache_locks = collections.defaultdict(lambda: threading.RLock())
468
469        if references is not None or reference is not None:
470            reference_vals = references if references is not None else reference
471            self.references = reference_vals
472
473        if parents is not None or parent is not None:
474            parent_vals = parents if parents is not None else parent
475            self.parents = parent_vals
476
477        if children is not None or child is not None:
478            children_vals = children if children is not None else child
479            self.children = children_vals
480
481    @property
482    def metric_key(self) -> str:
483        """
484        Return the pipe's metric key.
485        """
486        return self._metric_key
487
488    @property
489    def metric(self) -> str:
490        """
491        Return the pipe's metric key.
492        """
493        return self._metric_key
494
495    @property
496    def location_key(self) -> Union[str, None]:
497        """
498        Return the pipe's location key.
499        """
500        return self._location_key
501
502    @property
503    def location(self) -> Union[str, None]:
504        """
505        Return the pipe's location key.
506        """
507        return self._location_key
508
509    @property
510    def meta(self):
511        """
512        Return the four keys needed to reconstruct this pipe.
513        """
514        return {
515            'connector_keys': self.connector_keys,
516            'metric_key': self.metric_key,
517            'location_key': self.location_key,
518            'instance_keys': self.instance_keys,
519        }
520
521    def keys(self) -> Dict[str, Union[str, None]]:
522        """
523        Return the ordered keys for this pipe.
524        """
525        return {
526            key: val
527            for key, val in self.meta.items()
528            if key != 'instance'
529        }
530
531    @property
532    def instance_keys(self) -> str:
533        """
534        Return the pipe's instance keys.
535        """
536        return self._instance_keys
537
538    @property
539    def instance(self) -> Union[InstanceConnector, str]:
540        """
541        Return the pipe's instance connector or keys.
542        """
543        conn = self.instance_connector
544        if conn is None:
545            return self.instance_keys
546        return conn
547
548    @property
549    def instance_connector(self) -> Union[InstanceConnector, None]:
550        """
551        The instance connector on which this pipe resides.
552        """
553        if '_instance_connector' not in self.__dict__:
554            from meerschaum.connectors.parse import parse_instance_keys
555            conn = parse_instance_keys(self.instance_keys)
556            if conn:
557                self._instance_connector = conn
558            else:
559                return None
560        return self._instance_connector
561
562    @property
563    def connector_keys(self) -> str:
564        """
565        Return the pipe's connector keys.
566        """
567        return self._connector_keys
568
569    @property
570    def connector_key(self) -> str:
571        """
572        Legacy: use `Pipe.connector_keys` instead.
573        """
574        return self.connector_keys
575
576    @property
577    def connector(self) -> Union['Connector', str]:
578        """
579        The connector to the data source.
580        """
581        if '_connector' not in self.__dict__:
582            from meerschaum.connectors.parse import parse_instance_keys
583            import warnings
584            with warnings.catch_warnings():
585                warnings.simplefilter('ignore')
586                try:
587                    conn = parse_instance_keys(self.connector_keys)
588                except Exception:
589                    conn = None
590            if conn:
591                self._connector = conn
592            else:
593                return self._connector_keys
594        return self._connector
595
596    def __str__(self, ansi: bool=False):
597        return pipe_repr(self, ansi=ansi)
598
599    def __eq__(self, other):
600        try:
601            return (
602                isinstance(self, type(other))
603                and self.connector_keys == other.connector_keys
604                and self.metric_key == other.metric_key
605                and self.location_key == other.location_key
606                and self.instance_keys == other.instance_keys
607            )
608        except Exception:
609            return False
610
611    def __hash__(self):
612        ### Using an esoteric separator to avoid collisions.
613        sep = "[\"']"
614        return hash(
615            str(self.connector_keys) + sep
616            + str(self.metric_key) + sep
617            + str(self.location_key) + sep
618            + str(self.instance_keys) + sep
619        )
620
621    def __repr__(self, ansi: bool=True, **kw) -> str:
622        if not hasattr(sys, 'ps1'):
623            ansi = False
624
625        return pipe_repr(self, ansi=ansi, **kw)
626
627    def __pt_repr__(self):
628        from meerschaum.utils.packages import attempt_import
629        prompt_toolkit_formatted_text = attempt_import('prompt_toolkit.formatted_text', lazy=False)
630        return prompt_toolkit_formatted_text.ANSI(pipe_repr(self, ansi=True))
631
632    def __getstate__(self) -> Dict[str, Any]:
633        """
634        Define the state dictionary (pickling).
635        """
636        return {
637            'connector_keys': self.connector_keys,
638            'metric_key': self.metric_key,
639            'location_key': self.location_key,
640            'parameters': self._attributes.get('parameters', None),
641            'instance_keys': self.instance_keys,
642        }
643
644    def __setstate__(self, _state: Dict[str, Any]):
645        """
646        Read the state (unpickling).
647        """
648        self.__init__(**_state)
649
650    def __getitem__(self, key: str) -> Any:
651        """
652        Index the pipe's attributes.
653        If the `key` cannot be found, return `None`.
654        """
655        if key in self.attributes:
656            return self.attributes.get(key, None)
657
658        aliases = {
659            'connector': 'connector_keys',
660            'connector_key': 'connector_keys',
661            'metric': 'metric_key',
662            'location': 'location_key',
663        }
664        aliased_key = aliases.get(key, None)
665        if aliased_key is not None:
666            return self.attributes.get(aliased_key, None)
667
668        property_aliases = {
669            'instance': 'instance_keys',
670            'instance_key': 'instance_keys',
671        }
672        aliased_key = property_aliases.get(key, None)
673        if aliased_key is not None:
674            key = aliased_key
675        return getattr(self, key, None)
676
677    def __copy__(self):
678        """
679        Return a shallow copy of the current pipe.
680        """
681        return mrsm.Pipe(
682            self.connector_keys, self.metric_key, self.location_key,
683            instance=self.instance_keys,
684            parameters=self._attributes.get('parameters', None),
685        )
686
687    def __deepcopy__(self, memo):
688        """
689        Return a deep copy of the current pipe.
690        """
691        return self.__copy__()
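
Because `__eq__` and `__hash__` are both derived from the same four keys, pipes built from identical keys should behave as a single entry in sets and dictionaries. The sketch below uses a hypothetical stand-in class, `MiniPipe` (not part of Meerschaum), that copies only the comparison and hashing logic from the listing above; the `'sql:main'` default is an assumption made for the example.

```python
class MiniPipe:
    """Hypothetical stand-in that mirrors only the identity logic of Pipe."""

    def __init__(self, connector_keys, metric_key, location_key=None, instance_keys='sql:main'):
        self.connector_keys = connector_keys
        self.metric_key = metric_key
        self.location_key = location_key
        self.instance_keys = instance_keys

    def __eq__(self, other):
        # Same four-key comparison as in the listing; any failure means "not equal".
        try:
            return (
                isinstance(self, type(other))
                and self.connector_keys == other.connector_keys
                and self.metric_key == other.metric_key
                and self.location_key == other.location_key
                and self.instance_keys == other.instance_keys
            )
        except Exception:
            return False

    def __hash__(self):
        # Join the keys with an unusual separator so that adjacent keys do not run together.
        sep = "[\"']"
        return hash(
            str(self.connector_keys) + sep
            + str(self.metric_key) + sep
            + str(self.location_key) + sep
            + str(self.instance_keys) + sep
        )


a = MiniPipe('sql:main', 'weather')
b = MiniPipe('sql:main', 'weather')
c = MiniPipe('sql:main', 'weather', 'atlanta')

# `a` and `b` collapse into one set member; `c` differs by location.
unique_pipes = {a, b, c}
```

Note that `parameters` takes no part in either method, so two objects with the same keys but different in-memory parameters still compare equal.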

Access Meerschaum pipes via Pipe objects.

Pipes are identified by the following:

  1. Connector keys (e.g. 'sql:main')
  2. Metric key (e.g. 'weather')
  3. Location (optional; e.g. None)

A pipe's connector keys correspond to a data source, and when the pipe is synced, its fetch definition is evaluated and executed to produce new data.

Alternatively, new data may be directly synced via pipe.sync():

>>> from meerschaum import Pipe
>>> pipe = Pipe('csv', 'weather')
>>>
>>> import pandas as pd
>>> df = pd.read_csv('weather.csv')
>>> pipe.sync(df)
Pipe( connector: str = '', metric: str = '', location: Optional[str] = None, parameters: Optional[Dict[str, Any]] = None, columns: Union[Dict[str, str], List[str], NoneType] = None, indices: Optional[Dict[str, Union[str, List[str]]]] = None, tags: Optional[List[str]] = None, target: Optional[str] = None, dtypes: Optional[Dict[str, str]] = None, instance: Union[str, InstanceConnector, NoneType] = None, upsert: Optional[bool] = None, autoincrement: Optional[bool] = None, autotime: Optional[bool] = None, precision: Union[str, Dict[str, Union[str, int]], NoneType] = None, static: Optional[bool] = None, enforce: Optional[bool] = None, null_indices: Optional[bool] = None, mixed_numerics: Optional[bool] = None, temporary: bool = False, cache: Optional[bool] = None, cache_connector_keys: Optional[str] = None, references: Optional[List[Union[str, Dict[str, Any], Pipe, NoneType]]] = None, reference: Union[str, Dict[str, Any], Pipe, NoneType] = None, parents: Optional[List[Union[str, Dict[str, Any], Pipe, NoneType]]] = None, parent: Union[str, Dict[str, Any], Pipe, NoneType] = None, children: Optional[List[Union[str, Dict[str, Any], Pipe, NoneType]]] = None, child: Union[str, Dict[str, Any], Pipe, NoneType] = None, mrsm_instance: Union[str, InstanceConnector, NoneType] = None, connector_keys: Optional[str] = None, metric_key: Optional[str] = None, location_key: Optional[str] = None, instance_keys: Optional[str] = None, indexes: Union[Dict[str, str], List[str], NoneType] = None, debug: bool = False)
    def __init__(
        self,
        connector: str = '',
        metric: str = '',
        location: Optional[str] = None,
        parameters: Optional[Dict[str, Any]] = None,
        columns: Union[Dict[str, str], List[str], None] = None,
        indices: Optional[Dict[str, Union[str, List[str]]]] = None,
        tags: Optional[List[str]] = None,
        target: Optional[str] = None,
        dtypes: Optional[Dict[str, str]] = None,
        instance: Optional[Union[str, InstanceConnector]] = None,
        upsert: Optional[bool] = None,
        autoincrement: Optional[bool] = None,
        autotime: Optional[bool] = None,
        precision: Union[str, Dict[str, Union[str, int]], None] = None,
        static: Optional[bool] = None,
        enforce: Optional[bool] = None,
        null_indices: Optional[bool] = None,
        mixed_numerics: Optional[bool] = None,
        temporary: bool = False,
        cache: Optional[bool] = None,
        cache_connector_keys: Optional[str] = None,
        references: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]] = None,
        reference: Union[str, Dict[str, Any], mrsm.Pipe, None] = None,
        parents: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]] = None,
        parent: Union[str, Dict[str, Any], mrsm.Pipe, None] = None,
        children: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]] = None,
        child: Union[str, Dict[str, Any], mrsm.Pipe, None] = None,
        mrsm_instance: Optional[Union[str, InstanceConnector]] = None,
        connector_keys: Optional[str] = None,
        metric_key: Optional[str] = None,
        location_key: Optional[str] = None,
        instance_keys: Optional[str] = None,
        indexes: Union[Dict[str, str], List[str], None] = None,
        debug: bool = False,
    ):
        """
        Parameters
        ----------
        connector: str
            Keys for the pipe's source connector, e.g. `'sql:main'`.

        metric: str
            Label for the pipe's contents, e.g. `'weather'`.

        location: Optional[str], default None
            Label for the pipe's location. Defaults to `None`.

        parameters: Optional[Dict[str, Any]], default None
            Optionally set a pipe's parameters from the constructor,
            e.g. columns and other attributes.
            You can edit these parameters with `edit pipes`.

        columns: Union[Dict[str, str], List[str], None], default None
            Set the `columns` dictionary of `parameters`.
            If `parameters` is also provided, this dictionary is added under the `'columns'` key.

        indices: Optional[Dict[str, Union[str, List[str]]]], default None
            Set the `indices` dictionary of `parameters`.
            If `parameters` is also provided, this dictionary is added under the `'indices'` key.

        tags: Optional[List[str]], default None
            A list of strings to be added under the `'tags'` key of `parameters`.
            You can select pipes with certain tags using `--tags`.

        dtypes: Optional[Dict[str, str]], default None
            Set the `dtypes` dictionary of `parameters`.
            If `parameters` is also provided, this dictionary is added under the `'dtypes'` key.

        mrsm_instance: Optional[Union[str, InstanceConnector]], default None
            Connector for the Meerschaum instance where the pipe resides.
            Defaults to the preconfigured default instance (`'sql:main'`).

        instance: Optional[Union[str, InstanceConnector]], default None
            Alias for `mrsm_instance`. If `mrsm_instance` is supplied, this value is ignored.

        upsert: Optional[bool], default None
            If `True`, set `upsert` to `True` in the parameters.

        autoincrement: Optional[bool], default None
            If `True`, set `autoincrement` in the parameters.

        autotime: Optional[bool], default None
            If `True`, set `autotime` in the parameters.

        precision: Union[str, Dict[str, Union[str, int]], None], default None
            If provided, set `precision` in the parameters.
            This may be either a string (the precision unit) or a dictionary in the form
            `{'unit': <unit>, 'interval': <interval>}`.
            Default is determined by the `datetime` column dtype
            (e.g. `datetime64[us]` is `microsecond` precision).

        static: Optional[bool], default None
            If `True`, set `static` in the parameters.

        enforce: Optional[bool], default None
            If `False`, skip data type enforcement.
            Default behavior is `True`.

        null_indices: Optional[bool], default None
            Set to `False` if there will be no null values in the index columns.
            Defaults to `True`.

        mixed_numerics: Optional[bool], default None
            If `True`, integer columns will be converted to `numeric` when floats are synced.
            Set to `False` to disable this behavior.
            Defaults to `True`.

        temporary: bool, default False
            If `True`, prevent instance tables (pipes, users, plugins) from being created.

        cache: Optional[bool], default None
            If `True`, cache the pipe's metadata to disk (in addition to in-memory caching).
            If `cache` is `None`, caching is disabled when `temporary` is `True`
            and otherwise follows the `pipes:cache:enabled` configuration value (`True` by default).
            Caching is always disabled for the `'sql:memory'` instance.

        cache_connector_keys: Optional[str], default None
            If provided, use the keys to a Valkey connector (e.g. `valkey:main`).

        references: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None
            If provided, inherit the parameters of the reference Pipe(s).
            May be equal to a string of the Pipe constructor, a dictionary of constructor keys,
            a Pipe itself, or a list of any of these values.

        parents: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None
            Set references for parent pipes. See `references` for values.

        children: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None
            Set references for child pipes. See `references` for values.

        """
        from meerschaum.utils.warnings import error, warn
        if (not connector and not connector_keys) or (not metric and not metric_key):
            error(
                "Please provide strings for the connector and metric\n    "
                + "(first two positional arguments)."
            )

        ### Fall back to legacy `location_key` just in case.
        if not location:
            location = location_key

        if not connector:
            connector = connector_keys

        if not metric:
            metric = metric_key

        if location in ('[None]', 'None'):
            location = None

        from meerschaum._internal.static import STATIC_CONFIG
        negation_prefix = STATIC_CONFIG['system']['fetch_pipes_keys']['negation_prefix']
        for k in (connector, metric, location, *(tags or [])):
            if str(k).startswith(negation_prefix):
                error(f"A pipe's keys and tags cannot start with the prefix '{negation_prefix}'.")

        self._connector_keys = str(connector)
        self._connector_key = self.connector_keys ### Alias
        self._metric_key = metric
        self._location_key = location
        self.temporary = temporary
        self.cache = (
            cache
            if cache is not None
            else ((not temporary) and get_config('pipes', 'cache', 'enabled', warn=False))
        )
        self.cache_connector_keys = (
            str(cache_connector_keys)
            if cache_connector_keys is not None
            else None
        )
        self.debug = debug

        self._attributes: Dict[str, Any] = {
            'connector_keys': self._connector_keys,
            'metric_key': self._metric_key,
            'location_key': self._location_key,
            'parameters': {},
        }

        ### only set parameters if values are provided
        if isinstance(parameters, dict):
            self._attributes['parameters'] = parameters
        else:
            if parameters is not None:
                warn(f"The provided parameters are of invalid type '{type(parameters)}'.")
            self._attributes['parameters'] = {}

        columns = columns or self._attributes.get('parameters', {}).get('columns', None)
        if isinstance(columns, (list, tuple)):
            columns = {str(col): str(col) for col in columns}
        if isinstance(columns, dict):
            self._attributes['parameters']['columns'] = columns
        elif isinstance(columns, str) and 'Pipe(' in columns:
            pass
        elif columns is not None:
            warn(f"The provided columns are of invalid type '{type(columns)}'.")

        indices = (
            indices
            or indexes
            or self._attributes.get('parameters', {}).get('indices', None)
            or self._attributes.get('parameters', {}).get('indexes', None)
        )
        if isinstance(indices, dict):
            indices_key = (
                'indexes'
                if 'indexes' in self._attributes['parameters']
                else 'indices'
            )
            self._attributes['parameters'][indices_key] = indices

        if isinstance(tags, (list, tuple)):
            self._attributes['parameters']['tags'] = tags
        elif tags is not None:
            warn(f"The provided tags are of invalid type '{type(tags)}'.")

        if isinstance(target, str):
            self._attributes['parameters']['target'] = target
        elif target is not None:
            warn(f"The provided target is of invalid type '{type(target)}'.")

        if isinstance(dtypes, dict):
            self._attributes['parameters']['dtypes'] = dtypes
        elif dtypes is not None:
            warn(f"The provided dtypes are of invalid type '{type(dtypes)}'.")

        if isinstance(upsert, bool):
            self._attributes['parameters']['upsert'] = upsert

        if isinstance(autoincrement, bool):
            self._attributes['parameters']['autoincrement'] = autoincrement

        if isinstance(autotime, bool):
            self._attributes['parameters']['autotime'] = autotime

        if isinstance(precision, dict):
            self._attributes['parameters']['precision'] = precision
        elif isinstance(precision, str):
            self._attributes['parameters']['precision'] = {'unit': precision}

        if isinstance(static, bool):
            self._attributes['parameters']['static'] = static
            self._static = static

        if isinstance(enforce, bool):
            self._attributes['parameters']['enforce'] = enforce

        if isinstance(null_indices, bool):
            self._attributes['parameters']['null_indices'] = null_indices

        if isinstance(mixed_numerics, bool):
            self._attributes['parameters']['mixed_numerics'] = mixed_numerics

        ### NOTE: The parameters dictionary is {} by default.
        ###       A Pipe may be registered without parameters, then edited,
        ###       or a Pipe may be registered with parameters set in-memory first.
        _mrsm_instance = mrsm_instance if mrsm_instance is not None else (instance or instance_keys)
        if _mrsm_instance is None:
            _mrsm_instance = get_config('meerschaum', 'instance', patch=True)

        if not isinstance(_mrsm_instance, str):
            self._instance_connector = _mrsm_instance
            self._instance_keys = str(_mrsm_instance)
        else:
            self._instance_keys = _mrsm_instance

        if self._instance_keys == 'sql:memory':
            self.cache = False

        self._cache_locks = collections.defaultdict(lambda: threading.RLock())

        if references is not None or reference is not None:
            reference_vals = references if references is not None else reference
            self.references = reference_vals

        if parents is not None or parent is not None:
            parent_vals = parents if parents is not None else parent
            self.parents = parent_vals

        if children is not None or child is not None:
            children_vals = children if children is not None else child
            self.children = children_vals
Parameters
  • connector (str): Keys for the pipe's source connector, e.g. 'sql:main'.
  • metric (str): Label for the pipe's contents, e.g. 'weather'.
  • location (Optional[str], default None): Label for the pipe's location. Defaults to None.
  • parameters (Optional[Dict[str, Any]], default None): Optionally set a pipe's parameters from the constructor, e.g. columns and other attributes. You can edit these parameters with edit pipes.
  • columns (Union[Dict[str, str], List[str], None], default None): Set the columns dictionary of parameters. If parameters is also provided, this dictionary is added under the 'columns' key.
  • indices (Optional[Dict[str, Union[str, List[str]]]], default None): Set the indices dictionary of parameters. If parameters is also provided, this dictionary is added under the 'indices' key.
  • tags (Optional[List[str]], default None): A list of strings to be added under the 'tags' key of parameters. You can select pipes with certain tags using --tags.
  • dtypes (Optional[Dict[str, str]], default None): Set the dtypes dictionary of parameters. If parameters is also provided, this dictionary is added under the 'dtypes' key.
  • mrsm_instance (Optional[Union[str, InstanceConnector]], default None): Connector for the Meerschaum instance where the pipe resides. Defaults to the preconfigured default instance ('sql:main').
  • instance (Optional[Union[str, InstanceConnector]], default None): Alias for mrsm_instance. If mrsm_instance is supplied, this value is ignored.
  • upsert (Optional[bool], default None): If True, set upsert to True in the parameters.
  • autoincrement (Optional[bool], default None): If True, set autoincrement in the parameters.
  • autotime (Optional[bool], default None): If True, set autotime in the parameters.
  • precision (Union[str, Dict[str, Union[str, int]], None], default None): If provided, set precision in the parameters. This may be either a string (the precision unit) or a dictionary in the form {'unit': <unit>, 'interval': <interval>}. Default is determined by the datetime column dtype (e.g. datetime64[us] is microsecond precision).
  • static (Optional[bool], default None): If True, set static in the parameters.
  • enforce (Optional[bool], default None): If False, skip data type enforcement. Default behavior is True.
  • null_indices (Optional[bool], default None): Set to False if there will be no null values in the index columns. Defaults to True.
  • mixed_numerics (Optional[bool], default None): If True, integer columns will be converted to numeric when floats are synced. Set to False to disable this behavior. Defaults to True.
  • temporary (bool, default False): If True, prevent instance tables (pipes, users, plugins) from being created.
  • cache (Optional[bool], default None): If True, cache the pipe's metadata to disk (in addition to in-memory caching). If cache is None, caching is disabled when temporary is True and otherwise follows the pipes:cache:enabled configuration value (True by default). Caching is always disabled for the 'sql:memory' instance.
  • cache_connector_keys (Optional[str], default None): If provided, use the keys to a Valkey connector (e.g. valkey:main).
  • references (Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None): If provided, inherit the parameters of the reference Pipe(s). May be equal to a string of the Pipe constructor, a dictionary of constructor keys, a Pipe itself, or a list of any of these values.
  • parents (Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None): Set references for parent pipes. See references for values.
  • children (Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None): Set references for child pipes. See references for values.
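
Several of these arguments are normalized before being stored under `parameters`. The following is a minimal sketch of three of those rules as they appear in the constructor source: list-style `columns`, string `precision`, and the precedence among the instance arguments. The function names are hypothetical helpers written for this example, and `default` stands in for the configured default instance.

```python
from typing import Any, Dict, List, Optional, Union


def normalize_columns(
    columns: Union[Dict[str, str], List[str], None],
) -> Optional[Dict[str, str]]:
    """A list or tuple of names becomes a {name: name} mapping; dicts pass through."""
    if isinstance(columns, (list, tuple)):
        return {str(col): str(col) for col in columns}
    return columns


def normalize_precision(
    precision: Union[str, Dict[str, Any], None],
) -> Optional[Dict[str, Any]]:
    """A bare unit string is wrapped as {'unit': <unit>}; dicts pass through."""
    if isinstance(precision, dict):
        return precision
    if isinstance(precision, str):
        return {'unit': precision}
    return None


def resolve_instance(
    mrsm_instance: Any = None,
    instance: Any = None,
    instance_keys: Optional[str] = None,
    default: str = 'sql:main',  # assumption: placeholder for the configured default
) -> Any:
    """`mrsm_instance` wins, then `instance`, then `instance_keys`, then the default."""
    chosen = mrsm_instance if mrsm_instance is not None else (instance or instance_keys)
    return chosen if chosen is not None else default


columns = normalize_columns(['ts', 'id'])
precision = normalize_precision('second')
chosen_instance = resolve_instance(mrsm_instance='sql:a', instance='sql:b')
```

Passing `columns=['ts', 'id']` is therefore equivalent to passing `columns={'ts': 'ts', 'id': 'id'}`.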
temporary
cache
cache_connector_keys
debug
metric_key: str
    @property
    def metric_key(self) -> str:
        """
        Return the pipe's metric key.
        """
        return self._metric_key

Return the pipe's metric key.

metric: str
    @property
    def metric(self) -> str:
        """
        Return the pipe's metric key.
        """
        return self._metric_key

Return the pipe's metric key.

location_key: Optional[str]
    @property
    def location_key(self) -> Union[str, None]:
        """
        Return the pipe's location key.
        """
        return self._location_key

Return the pipe's location key.

location: Optional[str]
    @property
    def location(self) -> Union[str, None]:
        """
        Return the pipe's location key.
        """
        return self._location_key

Return the pipe's location key.

meta
    @property
    def meta(self):
        """
        Return the four keys needed to reconstruct this pipe.
        """
        return {
            'connector_keys': self.connector_keys,
            'metric_key': self.metric_key,
            'location_key': self.location_key,
            'instance_keys': self.instance_keys,
        }

Return the four keys needed to reconstruct this pipe.

def keys(self) -> List[str]:
    def keys(self) -> List[str]:
        """
        Return the ordered keys for this pipe.
        """
        return {
            key: val
            for key, val in self.meta.items()
            if key != 'instance'
        }

Return the ordered keys for this pipe.
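
Reading the listing above, `keys()` appears to return a dictionary rather than the annotated `List[str]`, and because `meta` names its last entry `'instance_keys'`, the `key != 'instance'` filter does not seem to exclude anything. The snippet below reproduces only that comprehension on an illustrative dictionary (the values are made up for the example), so it is a reading of the listed source rather than a statement about intended behavior.

```python
# Illustrative stand-in for `pipe.meta`; the values are assumptions for this example.
meta = {
    'connector_keys': 'sql:main',
    'metric_key': 'weather',
    'location_key': None,
    'instance_keys': 'sql:main',
}

# Same comprehension as in `keys()`: no key equals 'instance', so nothing is dropped.
keys_result = {
    key: val
    for key, val in meta.items()
    if key != 'instance'
}
```

Callers that want only the key names could take `list(keys_result)`, which preserves the insertion order of `meta`.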

instance_keys: str
    @property
    def instance_keys(self) -> str:
        """
        Return the pipe's instance keys.
        """
        return self._instance_keys

Return the pipe's instance keys.

instance: Union[InstanceConnector, str]
    @property
    def instance(self) -> Union[InstanceConnector, str]:
        """
        Return the pipe's instance connector or keys.
        """
        conn = self.instance_connector
        if conn is None:
            return self.instance_keys
        return conn

Return the pipe's instance connector or keys.

instance_connector: Optional[InstanceConnector]
    @property
    def instance_connector(self) -> Union[InstanceConnector, None]:
        """
        The instance connector on which this pipe resides.
        """
        if '_instance_connector' not in self.__dict__:
            from meerschaum.connectors.parse import parse_instance_keys
            conn = parse_instance_keys(self.instance_keys)
            if conn:
                self._instance_connector = conn
            else:
                return None
        return self._instance_connector

The instance connector on which this pipe resides.

connector_keys: str
    @property
    def connector_keys(self) -> str:
        """
        Return the pipe's connector keys.
        """
        return self._connector_keys

Return the pipe's connector keys.

connector_key: str
    @property
    def connector_key(self) -> str:
        """
        Legacy: use `Pipe.connector_keys` instead.
        """
        return self.connector_keys

Legacy: use Pipe.connector_keys instead.

connector: "Union['Connector', str]"
    @property
    def connector(self) -> Union['Connector', str]:
        """
        The connector to the data source.
        """
        if '_connector' not in self.__dict__:
            from meerschaum.connectors.parse import parse_instance_keys
            import warnings
            with warnings.catch_warnings():
                warnings.simplefilter('ignore')
                try:
                    conn = parse_instance_keys(self.connector_keys)
                except Exception:
                    conn = None
            if conn:
                self._connector = conn
            else:
                return self._connector_keys
        return self._connector

The connector to the data source.
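
Both `connector` and `instance_connector` resolve lazily: the parsed connector is stored on the instance the first time parsing succeeds, and a fallback is returned otherwise. A minimal sketch of that pattern follows, assuming a hypothetical `LazyConnectorHolder` class and a `fake_parse` function standing in for `parse_instance_keys`.

```python
from typing import Callable, List, Optional


class LazyConnectorHolder:
    """Hypothetical stand-in showing the `__dict__`-based lazy lookup."""

    def __init__(self, keys: str, parse: Callable[[str], Optional[object]]):
        self._keys = keys
        self._parse = parse  # stand-in for `parse_instance_keys`

    @property
    def connector(self):
        if '_connector' not in self.__dict__:
            try:
                conn = self._parse(self._keys)
            except Exception:
                conn = None
            if conn:
                # Cache the parsed connector so later accesses skip parsing.
                self._connector = conn
            else:
                # Nothing is cached here, so the next access tries again.
                return self._keys
        return self._connector


calls: List[str] = []


def fake_parse(keys: str) -> Optional[dict]:
    """Pretend only 'sql:' keys can be resolved, and record every call."""
    calls.append(keys)
    return {'keys': keys} if keys.startswith('sql:') else None


resolved = LazyConnectorHolder('sql:main', fake_parse)
first, second = resolved.connector, resolved.connector

unresolved = LazyConnectorHolder('foo', fake_parse)
fallback = unresolved.connector
```

The practical consequence is that `pipe.connector` may be either a connector object or a plain string, so callers that need a connector should check the type before using it.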

def fetch( self, begin: Union[datetime.datetime, int, str, NoneType] = '', end: Union[datetime.datetime, int, NoneType] = None, check_existing: bool = True, sync_chunks: bool = False, debug: bool = False, **kw: Any) -> Union[pandas.DataFrame, Iterator[pandas.DataFrame], NoneType]:
def fetch(
    self,
    begin: Union[datetime, int, str, None] = '',
    end: Union[datetime, int, None] = None,
    check_existing: bool = True,
    sync_chunks: bool = False,
    debug: bool = False,
    **kw: Any
) -> Union['pd.DataFrame', Iterator['pd.DataFrame'], None]:
    """
    Fetch a Pipe's latest data from its connector.

    Parameters
    ----------
    begin: Union[datetime, int, str, None], default ''
        If provided, only fetch data newer than or equal to `begin`.

    end: Union[datetime, int, None], default None
        If provided, only fetch data older than or equal to `end`.

    check_existing: bool, default True
        If `False`, do not apply the backtrack interval.

    sync_chunks: bool, default False
        If `True` and the pipe's connector is of type `'sql'`, begin syncing chunks
        while the fetch loads further chunks into memory.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `pd.DataFrame` of the newest unseen data,
    or `None` if the connector does not define `fetch()`.

    """
    if 'fetch' not in dir(self.connector):
        warn(f"No `fetch()` function defined for connector '{self.connector}'")
        return None

    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.misc import filter_arguments

    _chunk_hook = kw.pop('chunk_hook', None)
    kw['workers'] = self.get_num_workers(kw.get('workers', None))
    if sync_chunks and _chunk_hook is None:

        def _chunk_hook(chunk, **_kw) -> SuccessTuple:
            """
            Wrap `Pipe.sync()` with a custom chunk label prepended to the message.
            """
            from meerschaum.config._patch import apply_patch_to_config
            kwargs = apply_patch_to_config(kw, _kw)
            chunk_success, chunk_message = self.sync(chunk, **kwargs)
            chunk_label = self._get_chunk_label(chunk, self.columns.get('datetime', None))
            if chunk_label:
                chunk_message = '\n' + chunk_label + '\n' + chunk_message
            return chunk_success, chunk_message

    begin, end = self.parse_date_bounds(begin, end)

    with mrsm.Venv(get_connector_plugin(self.connector)):
        _args, _kwargs = filter_arguments(
            self.connector.fetch,
            self,
            begin=_determine_begin(
                self,
                begin,
                end,
                check_existing=check_existing,
                debug=debug,
            ),
            end=end,
            chunk_hook=_chunk_hook,
            debug=debug,
            **kw
        )
        df = self.connector.fetch(*_args, **_kwargs)
    return df

Fetch a Pipe's latest data from its connector.

Parameters
  • begin (Union[datetime, int, str, None], default ''): If provided, only fetch data newer than or equal to begin.
  • end (Union[datetime, int, None], default None): If provided, only fetch data older than or equal to end.
  • check_existing (bool, default True): If False, do not apply the backtrack interval.
  • sync_chunks (bool, default False): If True and the pipe's connector is of type 'sql', begin syncing chunks while the fetch loads further chunks into memory.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A pd.DataFrame of the newest unseen data, or None if the connector does not define fetch().
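
When `sync_chunks` is set and no `chunk_hook` is supplied, the source builds a hook that syncs each chunk and prepends a chunk label to the result message. Below is a simplified sketch of that wrapper. `make_chunk_hook`, `fake_sync`, and the label function are hypothetical names for this example, and the keyword arguments are merged with a plain shallow dictionary merge, whereas the source uses `apply_patch_to_config`.

```python
from typing import Any, Callable, Optional, Tuple

SuccessTuple = Tuple[bool, str]


def make_chunk_hook(
    sync: Callable[..., SuccessTuple],
    get_label: Callable[[Any], Optional[str]],
    **base_kwargs: Any,
) -> Callable[..., SuccessTuple]:
    """Build a hook that syncs one chunk and labels the resulting message."""

    def chunk_hook(chunk: Any, **hook_kwargs: Any) -> SuccessTuple:
        # Assumption: a shallow merge is enough for this illustration.
        kwargs = {**base_kwargs, **hook_kwargs}
        success, message = sync(chunk, **kwargs)
        label = get_label(chunk)
        if label:
            message = '\n' + label + '\n' + message
        return success, message

    return chunk_hook


def fake_sync(chunk: Any, **kwargs: Any) -> SuccessTuple:
    """Stand-in for `Pipe.sync()` that only reports the chunk size."""
    return True, f"Synced {len(chunk)} rows."


hook = make_chunk_hook(fake_sync, lambda chunk: f"rows 0 - {len(chunk) - 1}")
result = hook([{'id': 1}, {'id': 2}])
```

Returning the same `(success, message)` tuple as a sync keeps the hook interchangeable with a direct call to the sync function.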
def get_backtrack_interval( self, check_existing: bool = True, debug: bool = False) -> Union[datetime.timedelta, int]:
def get_backtrack_interval(
    self,
    check_existing: bool = True,
    debug: bool = False,
) -> Union[timedelta, int]:
    """
    Get the backtrack interval to use for this pipe.

    Parameters
    ----------
    check_existing: bool, default True
        If `False`, return a backtrack_interval of 0 minutes.

    Returns
    -------
    The backtrack interval (`timedelta` or `int`) to use with this pipe's `datetime` axis.
    """
    from meerschaum.utils.dtypes import MRSM_PRECISION_UNITS_SCALARS, MRSM_PRECISION_UNITS_ALIASES
    default_backtrack_minutes = get_config('pipes', 'parameters', 'fetch', 'backtrack_minutes')
    configured_backtrack_minutes = self.parameters.get('fetch', {}).get('backtrack_minutes', None)
    backtrack_minutes = (
        configured_backtrack_minutes
        if configured_backtrack_minutes is not None
        else default_backtrack_minutes
    ) if check_existing else 0

    dt_col = self.columns.get('datetime', None)
    if dt_col is None:
        return timedelta(minutes=backtrack_minutes)

    dt_dtype = self.dtypes.get(dt_col, 'datetime')
    if 'int' in dt_dtype.lower():
        if not self.parameters.get('precision', None):
            return backtrack_minutes
        precision_unit = self.precision.get('unit', None)
        true_unit = MRSM_PRECISION_UNITS_ALIASES.get(precision_unit, precision_unit)
        scalar = MRSM_PRECISION_UNITS_SCALARS.get(true_unit, None)
        if scalar is not None:
            return int(backtrack_minutes * 60 * scalar)
        return backtrack_minutes

    return timedelta(minutes=backtrack_minutes)

Get the backtrack interval to use for this pipe.

Parameters
  • check_existing (bool, default True): If False, return a backtrack_interval of 0 minutes.
Returns
  • The backtrack interval (timedelta or int) to use with this pipe's datetime axis.
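
The return type depends on the dtype of the `datetime` axis: a `timedelta` for datetime columns, and an integer for integer axes, scaled by the precision unit when one is configured. The sketch below condenses that branching into a standalone function. `backtrack_interval` and `UNITS_PER_SECOND` are hypothetical names, and the scalar table is an assumed units-per-second mapping used in place of `MRSM_PRECISION_UNITS_SCALARS`, whose actual contents are not shown here.

```python
from datetime import timedelta
from typing import Optional, Union

# Assumption: illustrative units-per-second scalars, not the library's table.
UNITS_PER_SECOND = {
    'second': 1,
    'millisecond': 1_000,
    'microsecond': 1_000_000,
}


def backtrack_interval(
    backtrack_minutes: int,
    dt_dtype: str = 'datetime',
    precision_unit: Optional[str] = None,
) -> Union[timedelta, int]:
    """Return a timedelta for datetime axes and an integer for integer axes."""
    if 'int' not in dt_dtype.lower():
        return timedelta(minutes=backtrack_minutes)

    scalar = UNITS_PER_SECOND.get(precision_unit)
    if scalar is None:
        # No usable precision: the minutes value is returned unchanged.
        return backtrack_minutes

    # Minutes -> seconds -> precision units.
    return int(backtrack_minutes * 60 * scalar)


as_timedelta = backtrack_interval(1440)
as_milliseconds = backtrack_interval(10, 'int64', 'millisecond')
as_plain_int = backtrack_interval(10, 'int64')
```

With an integer axis in milliseconds, ten backtrack minutes become 600000 units under these assumed scalars, which can be subtracted directly from an integer timestamp.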
def get_data( self, select_columns: Optional[List[str]] = None, omit_columns: Optional[List[str]] = None, begin: Union[datetime.datetime, int, str, NoneType] = None, end: Union[datetime.datetime, int, str, NoneType] = None, params: Optional[Dict[str, Any]] = None, as_docs: bool = False, as_iterator: bool = False, as_chunks: bool = False, as_dask: bool = False, add_missing_columns: bool = False, chunk_interval: Union[datetime.timedelta, int, NoneType] = None, order: Optional[str] = 'asc', limit: Optional[int] = None, fresh: bool = False, debug: bool = False, **kw: Any) -> Union[pandas.DataFrame, Iterator[pandas.DataFrame], NoneType]:
 23def get_data(
 24    self,
 25    select_columns: Optional[List[str]] = None,
 26    omit_columns: Optional[List[str]] = None,
 27    begin: Union[datetime, int, str, None] = None,
 28    end: Union[datetime, int, str, None] = None,
 29    params: Optional[Dict[str, Any]] = None,
 30    as_docs: bool = False,
 31    as_iterator: bool = False,
 32    as_chunks: bool = False,
 33    as_dask: bool = False,
 34    add_missing_columns: bool = False,
 35    chunk_interval: Union[timedelta, int, None] = None,
 36    order: Optional[str] = 'asc',
 37    limit: Optional[int] = None,
 38    fresh: bool = False,
 39    debug: bool = False,
 40    **kw: Any
 41) -> Union['pd.DataFrame', Iterator['pd.DataFrame'], None]:
 42    """
 43    Get a pipe's data from the instance connector.
 44
 45    Parameters
 46    ----------
 47    select_columns: Optional[List[str]], default None
 48        If provided, only select these given columns.
 49        Otherwise select all available columns (i.e. `SELECT *`).
 50
 51    omit_columns: Optional[List[str]], default None
 52        If provided, remove these columns from the selection.
 53
 54    begin: Union[datetime, int, str, None], default None
 55        Lower bound datetime to begin searching for data (inclusive).
 56        Translates to a `WHERE` clause like `WHERE datetime >= begin`.
 57        Defaults to `None`.
 58
 59    end: Union[datetime, int, str, None], default None
 60        Upper bound datetime to stop searching for data (exclusive).
 61        Translates to a `WHERE` clause like `WHERE datetime < end`.
 62        Defaults to `None`.
 63
 64    params: Optional[Dict[str, Any]], default None
 65        Filter the retrieved data by a dictionary of parameters.
 66        See `meerschaum.utils.sql.build_where` for more details. 
 67
 68    as_docs: bool, default False
 69        If `True`, return a list of dictionaries rather than a DataFrame.
 70        Relies on `get_pipe_docs` from the instance connector if implemented.
 71        May be combined with `as_chunks` to return an `Iterator[List[Dict]]`
 72        chunked by time bounds (useful for large result sets without pandas overhead).
 73
 74    as_iterator: bool, default False
 75        If `True`, return a generator of chunks of pipe data.
 76        When combined with `as_docs=True`, yields `List[Dict]` per chunk instead of DataFrames.
 77
 78    as_chunks: bool, default False
 79        Alias for `as_iterator`.
 80        When combined with `as_docs=True`, yields `List[Dict]` per chunk instead of DataFrames.
 81
 82    as_dask: bool, default False
 83        If `True`, return a `dask.DataFrame`
 84        (which may be loaded into a Pandas DataFrame with `df.compute()`).
 85
 86    add_missing_columns: bool, default False
 87        If `True`, add any missing columns from `Pipe.dtypes` to the dataframe.
 88
 89    chunk_interval: Union[timedelta, int, None], default None
 90        If `as_iterator`, then return chunks with `begin` and `end` separated by this interval.
 91        This may be set under `pipe.parameters['verify']['chunk_minutes']`.
 92        By default, use a timedelta of 1440 minutes (1 day).
 93        If `chunk_interval` is an integer and the `datetime` axis a timestamp,
 94        then use a timedelta with the number of minutes configured to this value.
 95        If the `datetime` axis is an integer, default to the configured chunksize.
 96        If `chunk_interval` is a `timedelta` and the `datetime` axis an integer,
 97        use the number of minutes in the `timedelta`.
 98
 99    order: Optional[str], default 'asc'
100        If `order` is not `None`, sort the resulting dataframe by indices.
101
102    limit: Optional[int], default None
103        If provided, cap the dataframe to this many rows.
104
105    fresh: bool, default False
106        If `True`, skip local cache and directly query the instance connector.
107
108    debug: bool, default False
109        Verbosity toggle.
110        Defaults to `False`.
111
112    Returns
113    -------
114    A `pd.DataFrame` of the pipe's data (default).
115    A `List[Dict]` if `as_docs=True`.
116    An `Iterator[pd.DataFrame]` if `as_chunks=True` (or `as_iterator=True`).
117    An `Iterator[List[Dict]]` if both `as_docs=True` and `as_chunks=True`.
118
119    """
120    from meerschaum.utils.warnings import warn
121    from meerschaum.utils.venv import Venv
122    from meerschaum.connectors import get_connector_plugin
123    from meerschaum.utils.dtypes import to_pandas_dtype
124    from meerschaum.utils.dataframe import add_missing_cols_to_df, df_is_chunk_generator
125    from meerschaum.utils.packages import attempt_import
126    from meerschaum.utils.warnings import dprint
127    dd = attempt_import('dask.dataframe') if as_dask else None
128    dask = attempt_import('dask') if as_dask else None
129    _ = attempt_import('partd', lazy=False) if as_dask else None
130
131    if select_columns == '*':
132        select_columns = None
133    elif isinstance(select_columns, str):
134        select_columns = [select_columns]
135
136    if isinstance(omit_columns, str):
137        omit_columns = [omit_columns]
138
139    begin, end = self.parse_date_bounds(begin, end, debug=debug)
140    as_iterator = as_iterator or as_chunks
141    dt_col = self.columns.get('datetime', None)
142
143    def _sort_df(_df):
144        if df_is_chunk_generator(_df):
145            return _df
146        indices = [] if dt_col not in _df.columns else [dt_col]
147        non_dt_cols = [
148            col
149            for col_ix, col in self.columns.items()
150            if col_ix != 'datetime' and col in _df.columns
151        ]
152        indices.extend(non_dt_cols)
153        if 'dask' not in _df.__module__:
154            _df.sort_values(
155                by=indices,
156                inplace=True,
157                ascending=(str(order).lower() == 'asc'),
158            )
159            _df.reset_index(drop=True, inplace=True)
160        else:
161            _df = _df.sort_values(
162                by=indices,
163                ascending=(str(order).lower() == 'asc'),
164            )
165            _df = _df.reset_index(drop=True)
166        if limit is not None and len(_df) > limit:
167            return _df.head(limit)
168        return _df
169
170    if as_iterator or as_chunks:
171        df = self._get_data_as_iterator(
172            select_columns=select_columns,
173            omit_columns=omit_columns,
174            begin=begin,
175            end=end,
176            params=params,
177            chunk_interval=chunk_interval,
178            limit=limit,
179            order=order,
180            as_docs=as_docs,
181            fresh=fresh,
182            debug=debug,
183        )
184        if as_docs:
185            return df
186        return _sort_df(df)
187
188    if as_dask:
189        from multiprocessing.pool import ThreadPool
190        dask_pool = ThreadPool(self.get_num_workers())
191        dask.config.set(pool=dask_pool)
192        chunk_interval = self.get_chunk_interval(chunk_interval, debug=debug)
193        bounds = self.get_chunk_bounds(
194            begin=begin,
195            end=end,
196            bounded=False,
197            chunk_interval=chunk_interval,
198            debug=debug,
199        )
200        dask_chunks = [
201            dask.delayed(self.get_data)(
202                select_columns=select_columns,
203                omit_columns=omit_columns,
204                begin=chunk_begin,
205                end=chunk_end,
206                params=params,
207                chunk_interval=chunk_interval,
208                order=order,
209                limit=limit,
210                fresh=fresh,
211                add_missing_columns=True,
212                debug=debug,
213            )
214            for (chunk_begin, chunk_end) in bounds
215        ]
216        dask_meta = {
217            col: to_pandas_dtype(typ)
218            for col, typ in self.get_dtypes(refresh=True, infer=True, debug=debug).items()
219        }
220        if debug:
221            dprint(f"Dask meta:\n{dask_meta}")
222        return _sort_df(dd.from_delayed(dask_chunks, meta=dask_meta))
223
224    if not self.exists(debug=debug):
225        return [] if as_docs else None
226
227    if as_docs:
228        with Venv(get_connector_plugin(self.instance_connector)):
229            docs = self.instance_connector.get_pipe_docs(
230                pipe=self,
231                select_columns=select_columns,
232                omit_columns=omit_columns,
233                begin=begin,
234                end=end,
235                params=params,
236                limit=limit,
237                order=order,
238                debug=debug,
239                **kw
240            )
241        return docs if docs is not None else []
242
243    with Venv(get_connector_plugin(self.instance_connector)):
244        df = self.instance_connector.get_pipe_data(
245            pipe=self,
246            select_columns=select_columns,
247            omit_columns=omit_columns,
248            begin=begin,
249            end=end,
250            params=params,
251            limit=limit,
252            order=order,
253            debug=debug,
254            **kw
255        )
256        if df is None:
257            return df
258
259        if not select_columns:
260            select_columns = [col for col in df.columns]
261
262        pipe_dtypes = self.get_dtypes(refresh=False, debug=debug)
263        cols_to_omit = [
264            col
265            for col in df.columns
266            if (
267                col in (omit_columns or [])
268                or
269                col not in (select_columns or [])
270            )
271        ]
272        cols_to_add = [
273            col
274            for col in select_columns
275            if col not in df.columns
276        ] + ([
277            col
278            for col in pipe_dtypes
279            if col not in df.columns
280        ] if add_missing_columns else [])
281        if cols_to_omit:
282            warn(
283                (
284                    f"Received {len(cols_to_omit)} omitted column"
285                    + ('s' if len(cols_to_omit) != 1 else '')
286                    + f" for {self}. "
287                    + "Consider adding `select_columns` and `omit_columns` support to "
288                    + f"'{self.instance_connector.type}' connectors to improve performance."
289                ),
290                stack=False,
291            )
292            _cols_to_select = [col for col in df.columns if col not in cols_to_omit]
293            df = df[_cols_to_select]
294
295        if cols_to_add:
296            if not add_missing_columns:
297                from meerschaum.utils.misc import items_str
298                warn(
299                    f"Will add columns {items_str(cols_to_add)} as nulls to dataframe.",
300                    stack=False,
301                )
302
303            df = add_missing_cols_to_df(
304                df,
305                {
306                    col: pipe_dtypes.get(col, 'string')
307                    for col in cols_to_add
308                },
309            )
310
311        enforced_df = self.enforce_dtypes(
312            df,
313            dtypes=pipe_dtypes,
314            debug=debug,
315        )
316
317        if order:
318            return _sort_df(enforced_df)
319        return enforced_df

Get a pipe's data from the instance connector.

Parameters
  • select_columns (Optional[List[str]], default None): If provided, only select these given columns. Otherwise select all available columns (i.e. SELECT *).
  • omit_columns (Optional[List[str]], default None): If provided, remove these columns from the selection.
  • begin (Union[datetime, int, str, None], default None): Lower bound datetime to begin searching for data (inclusive). Translates to a WHERE clause like WHERE datetime >= begin. Defaults to None.
  • end (Union[datetime, int, str, None], default None): Upper bound datetime to stop searching for data (exclusive). Translates to a WHERE clause like WHERE datetime < end. Defaults to None.
  • params (Optional[Dict[str, Any]], default None): Filter the retrieved data by a dictionary of parameters. See meerschaum.utils.sql.build_where for more details.
  • as_docs (bool, default False): If True, return a list of dictionaries rather than a DataFrame. Relies on get_pipe_docs from the instance connector if implemented. May be combined with as_chunks to return an Iterator[List[Dict]] chunked by time bounds (useful for large result sets without pandas overhead).
  • as_iterator (bool, default False): If True, return a generator of chunks of pipe data. When combined with as_docs=True, yields List[Dict] per chunk instead of DataFrames.
  • as_chunks (bool, default False): Alias for as_iterator. When combined with as_docs=True, yields List[Dict] per chunk instead of DataFrames.
  • as_dask (bool, default False): If True, return a dask.DataFrame (which may be loaded into a Pandas DataFrame with df.compute()).
  • add_missing_columns (bool, default False): If True, add any missing columns from Pipe.dtypes to the dataframe.
  • chunk_interval (Union[timedelta, int, None], default None): If as_iterator, then return chunks with begin and end separated by this interval. This may be set under pipe.parameters['verify']['chunk_minutes']. By default, use a timedelta of 1440 minutes (1 day). If chunk_interval is an integer and the datetime axis a timestamp, then use a timedelta with the number of minutes configured to this value. If the datetime axis is an integer, default to the configured chunksize. If chunk_interval is a timedelta and the datetime axis an integer, use the number of minutes in the timedelta.
  • order (Optional[str], default 'asc'): If order is not None, sort the resulting dataframe by indices.
  • limit (Optional[int], default None): If provided, cap the dataframe to this many rows.
  • fresh (bool, default False): If True, skip local cache and directly query the instance connector.
  • debug (bool, default False): Verbosity toggle. Defaults to False.
Returns
  • A pd.DataFrame of the pipe's data (default).
  • A List[Dict] if as_docs=True.
  • An Iterator[pd.DataFrame] if as_chunks=True (or as_iterator=True).
  • An Iterator[List[Dict]] if both as_docs=True and as_chunks=True.
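The `begin` and `end` bounds described above form a half-open interval (`begin <= datetime < end`), and `select_columns` / `omit_columns` trim the result afterwards. The sketch below imitates that behavior on a plain list of dictionaries so it runs without a database. `filter_docs` is a hypothetical helper written for illustration and is not part of Meerschaum.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional


def filter_docs(
    docs: List[Dict[str, Any]],
    dt_col: str,
    begin: Optional[datetime] = None,
    end: Optional[datetime] = None,
    select_columns: Optional[List[str]] = None,
    omit_columns: Optional[List[str]] = None,
) -> List[Dict[str, Any]]:
    """Hypothetical stand-in: keep rows where begin <= dt < end, then trim columns."""
    omit = set(omit_columns or [])
    rows = []
    for doc in docs:
        ts = doc[dt_col]
        # `begin` is inclusive: drop anything strictly before it.
        if begin is not None and ts < begin:
            continue
        # `end` is exclusive: drop anything at or after it.
        if end is not None and ts >= end:
            continue
        rows.append({
            col: val
            for col, val in doc.items()
            if (select_columns is None or col in select_columns) and col not in omit
        })
    return rows


docs = [
    {'ts': datetime(2024, 1, 1), 'id': 1, 'vl': 10},
    {'ts': datetime(2024, 1, 2), 'id': 1, 'vl': 20},
    {'ts': datetime(2024, 1, 3), 'id': 1, 'vl': 30},
]
result = filter_docs(
    docs,
    'ts',
    begin=datetime(2024, 1, 1),
    end=datetime(2024, 1, 3),
    omit_columns=['id'],
)
```

Here the row stamped exactly at `begin` is kept, while the row stamped exactly at `end` is dropped.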
def get_backtrack_data( self, backtrack_minutes: Optional[int] = None, begin: Union[datetime.datetime, int, NoneType] = None, params: Optional[Dict[str, Any]] = None, limit: Optional[int] = None, fresh: bool = False, debug: bool = False, **kw: Any) -> Optional[pandas.DataFrame]:
414def get_backtrack_data(
415    self,
416    backtrack_minutes: Optional[int] = None,
417    begin: Union[datetime, int, None] = None,
418    params: Optional[Dict[str, Any]] = None,
419    limit: Optional[int] = None,
420    fresh: bool = False,
421    debug: bool = False,
422    **kw: Any
423) -> Optional['pd.DataFrame']:
424    """
425    Get the most recent data from the instance connector as a Pandas DataFrame.
426
427    Parameters
428    ----------
429    backtrack_minutes: Optional[int], default None
430        How many minutes from `begin` to select from.
431        If `None`, use `pipe.parameters['fetch']['backtrack_minutes']`.
432
433    begin: Union[datetime, int, None], default None
434        The starting point to search for data.
435        If begin is `None` (default), use the most recent observed datetime
436        (AKA sync_time).
437
438        ```
439        E.g. begin = 02:00
440
441        Search this region.           Ignore this, even if there's data.
442        /  /  /  /  /  /  /  /  /  |
443        -----|----------|----------|----------|----------|----------|
444        00:00      01:00      02:00      03:00      04:00      05:00
445
446        ```
447
448    params: Optional[Dict[str, Any]], default None
449        The standard Meerschaum `params` query dictionary.
450
451    limit: Optional[int], default None
452        If provided, cap the number of rows to be returned.
453
454    fresh: bool, default False
455        If `True`, ignore local cache and pull directly from the instance connector.
456        Only comes into effect if a pipe was created with `cache=True`.
457
458    debug: bool, default False
459        Verbosity toggle.
460
461    Returns
462    -------
463    A `pd.DataFrame` for the pipe's data corresponding to the provided parameters. Backtrack data
464    is a convenient way to get a pipe's data "backtracked" from the most recent datetime.
465    """
466    from meerschaum.utils.venv import Venv
467    from meerschaum.connectors import get_connector_plugin
468
469    if not self.exists(debug=debug):
470        return None
471
472    begin = self.parse_date_bounds(begin, debug=debug)
473
474    backtrack_interval = self.get_backtrack_interval(debug=debug)
475    if backtrack_minutes is None:
476        backtrack_minutes = (
477            (backtrack_interval.total_seconds() / 60)
478            if isinstance(backtrack_interval, timedelta)
479            else backtrack_interval
480        )
481
482    if hasattr(self.instance_connector, 'get_backtrack_data'):
483        with Venv(get_connector_plugin(self.instance_connector)):
484            return self.enforce_dtypes(
485                self.instance_connector.get_backtrack_data(
486                    pipe=self,
487                    begin=begin,
488                    backtrack_minutes=backtrack_minutes,
489                    params=params,
490                    limit=limit,
491                    debug=debug,
492                    **kw
493                ),
494                debug=debug,
495            )
496
497    if begin is None:
498        begin = self.get_sync_time(params=params, debug=debug)
499
500    backtrack_interval = (
501        timedelta(minutes=backtrack_minutes)
502        if isinstance(begin, datetime)
503        else backtrack_minutes
504    )
505    if begin is not None:
506        begin = begin - backtrack_interval
507
508    kw['order'] = kw.get('order', 'desc') or 'desc'
509    return self.get_data(
510        begin=begin,
511        params=params,
512        debug=debug,
513        limit=limit,
514        **kw
515    )

Get the most recent data from the instance connector as a Pandas DataFrame.

Parameters
  • backtrack_minutes (Optional[int], default None): How many minutes from begin to select from. If None, use pipe.parameters['fetch']['backtrack_minutes'].
  • begin (Union[datetime, int, None], default None): The starting point to search for data. If begin is None (default), use the most recent observed datetime (AKA sync_time).

    E.g. begin = 02:00
    
    Search this region.           Ignore this, even if there's data.
    /  /  /  /  /  /  /  /  /  |
    -----|----------|----------|----------|----------|----------|
    00:00      01:00      02:00      03:00      04:00      05:00
    
    
  • params (Optional[Dict[str, Any]], default None): The standard Meerschaum params query dictionary.

  • limit (Optional[int], default None): If provided, cap the number of rows to be returned.
  • fresh (bool, default False): If True, ignore local cache and pull directly from the instance connector. Only comes into effect if a pipe was created with cache=True.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A pd.DataFrame for the pipe's data corresponding to the provided parameters. Backtrack data is a convenient way to get a pipe's data "backtracked" from the most recent datetime.
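When the instance connector does not define its own `get_backtrack_data`, the source above falls back to subtracting the backtrack interval from `begin` (or from the sync time) and calling `get_data()`. The arithmetic can be sketched as follows. `backtrack_begin` is a hypothetical helper name, and the sketch leaves out the sync-time lookup and the query itself.

```python
from datetime import datetime, timedelta, timezone
from typing import Union


def backtrack_begin(
    sync_time: Union[datetime, int, None],
    backtrack_minutes: int,
) -> Union[datetime, int, None]:
    """Hypothetical helper: step back from the latest timestamp by the backtrack window."""
    if sync_time is None:
        # No data yet, so there is nothing to backtrack from.
        return None
    interval = (
        timedelta(minutes=backtrack_minutes)
        if isinstance(sync_time, datetime)
        else backtrack_minutes  # integer axes subtract the raw number
    )
    return sync_time - interval


latest = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)
dt_begin = backtrack_begin(latest, 60)
int_begin = backtrack_begin(1000, 60)
```

For a timestamp axis the window is a `timedelta`. For an integer axis the same number is subtracted directly, which mirrors the `isinstance(begin, datetime)` check in the source.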
def get_rowcount( self, begin: Union[datetime.datetime, int, NoneType] = None, end: Union[datetime.datetime, int, NoneType] = None, params: Optional[Dict[str, Any]] = None, remote: bool = False, debug: bool = False) -> int:
518def get_rowcount(
519    self,
520    begin: Union[datetime, int, None] = None,
521    end: Union[datetime, int, None] = None,
522    params: Optional[Dict[str, Any]] = None,
523    remote: bool = False,
524    debug: bool = False
525) -> int:
526    """
527    Get a Pipe's instance or remote rowcount.
528
529    Parameters
530    ----------
531    begin: Optional[datetime], default None
532        Count rows where datetime > begin.
533
534    end: Optional[datetime], default None
535        Count rows where datetime < end.
536
537    remote: bool, default False
538        Count rows from a pipe's remote source.
539        **NOTE**: This is experimental!
540
541    debug: bool, default False
542        Verbosity toggle.
543
544    Returns
545    -------
546    An `int` of the number of rows in the pipe corresponding to the provided parameters.
547    Returns 0 if the pipe does not exist.
548    """
549    from meerschaum.utils.warnings import warn
550    from meerschaum.utils.venv import Venv
551    from meerschaum.connectors import get_connector_plugin
552    from meerschaum.utils.misc import filter_keywords
553
554    begin, end = self.parse_date_bounds(begin, end, debug=debug)
555    connector = self.instance_connector if not remote else self.connector
556    try:
557        with Venv(get_connector_plugin(connector)):
558            if not hasattr(connector, 'get_pipe_rowcount'):
559                warn(
560                    f"Connectors of type '{connector.type}' "
561                    "do not implement `get_pipe_rowcount()`.",
562                    stack=False,
563                )
564                return 0
565            kwargs = filter_keywords(
566                connector.get_pipe_rowcount,
567                begin=begin,
568                end=end,
569                params=params,
570                remote=remote,
571                debug=debug,
572            )
573            if remote and 'remote' not in kwargs:
574                warn(
575                    f"Connectors of type '{connector.type}' do not support remote rowcounts.",
576                    stack=False,
577                )
578                return 0
579            rowcount = connector.get_pipe_rowcount(
580                self,
581                begin=begin,
582                end=end,
583                params=params,
584                remote=remote,
585                debug=debug,
586            )
587            if rowcount is None:
588                return 0
589            return rowcount
590    except AttributeError as e:
591        warn(e)
592        if remote:
593            return 0
594    warn(f"Failed to get a rowcount for {self}.")
595    return 0

Get a Pipe's instance or remote rowcount.

Parameters
  • begin (Optional[datetime], default None): Count rows where datetime > begin.
  • end (Optional[datetime], default None): Count rows where datetime < end.
  • remote (bool, default False): Count rows from a pipe's remote source. NOTE: This is experimental!
  • debug (bool, default False): Verbosity toggle.
Returns
  • An int of the number of rows in the pipe corresponding to the provided parameters. Returns 0 if the pipe does not exist.
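The source above uses `filter_keywords` to check whether a connector's `get_pipe_rowcount` accepts a `remote` argument before requesting a remote rowcount. A minimal sketch of that kind of check, built on `inspect.signature`, might look like this. `accepted_keywords` is a hypothetical stand-in, and its handling of `**kwargs` is an assumption rather than Meerschaum's actual implementation.

```python
import inspect
from typing import Any, Callable, Dict


def accepted_keywords(func: Callable[..., Any], **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical stand-in: keep only the keyword arguments `func` can accept."""
    params = inspect.signature(func).parameters
    # Assumption: a function declaring **kwargs accepts everything.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {key: val for key, val in kwargs.items() if key in params}


def rowcount_without_remote(pipe, begin=None, end=None, params=None, debug=False):
    return 42


def rowcount_with_remote(pipe, begin=None, end=None, params=None, remote=False, debug=False):
    return 42


kept_without = accepted_keywords(rowcount_without_remote, begin=None, remote=True)
kept_with = accepted_keywords(rowcount_with_remote, begin=None, remote=True)
supports_remote = ('remote' in kept_without, 'remote' in kept_with)
```

A connector whose method lacks a `remote` parameter is reported as not supporting remote rowcounts, which corresponds to the warning branch in the source.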
def get_doc(self, **kwargs) -> Optional[Dict[str, Any]]:
def get_doc(self, **kwargs) -> Union[Dict[str, Any], None]:
    """
    Convenience function to return a single row as a dictionary (or `None`) from `Pipe.get_data()`.
    Keyword arguments are passed to `Pipe.get_data()`.
    """
    from meerschaum.utils.warnings import warn
    kwargs['limit'] = 1
    kwargs['as_docs'] = True
    try:
        docs = self.get_data(**kwargs)
        if not docs:
            return None
        return docs[0]
    except Exception as e:
        warn(f"Failed to read value from {self}:\n{e}", stack=False)
        return None

Convenience function to return a single row as a dictionary (or None) from Pipe.get_data(). Keyword arguments are passed to Pipe.get_data().

def get_docs(self, **kwargs) -> list[dict[str, typing.Any]]:
def get_docs(self, **kwargs) -> list[dict[str, Any]]:
    """
    Convenience method to return a pipe's data as a list of dictionaries.
    Relies on `get_pipe_docs` from the instance connector if implemented.
    """
    kwargs['as_docs'] = True
    return self.get_data(**kwargs)

Convenience method to return a pipe's data as a list of dictionaries. Relies on get_pipe_docs from the instance connector if implemented.

def get_value( self, column: str, params: Optional[Dict[str, Any]] = None, **kwargs: Any) -> Any:
def get_value(
    self,
    column: str,
    params: Optional[Dict[str, Any]] = None,
    **kwargs: Any
) -> Any:
    """
    Convenience function to return a single value (or `None`) from `Pipe.get_data()`.
    Keyword arguments are passed to `Pipe.get_data()`.
    """
    from meerschaum.utils.warnings import warn
    kwargs['select_columns'] = [column]
    kwargs['limit'] = 1
    kwargs['as_docs'] = True
    try:
        docs = self.get_data(params=params, **kwargs)
        if not docs:
            return None
        if column not in docs[0]:
            raise ValueError(f"Column '{column}' was not included in the result set.")
        return docs[0][column]
    except Exception as e:
        warn(f"Failed to read value from {self}:\n{e}", stack=False)
        return None

Convenience function to return a single value (or None) from Pipe.get_data(). Keyword arguments are passed to Pipe.get_data().
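Both `get_doc()` and `get_value()` are thin wrappers that force `limit=1` and `as_docs=True` before delegating to `get_data()`. The sketch below reproduces that pattern against an in-memory stub so it runs without an instance connector. `StubPipe` is hypothetical, and it omits the warnings that the real methods emit on failure.

```python
from typing import Any, Dict, List, Optional


class StubPipe:
    """Hypothetical stand-in for a pipe backed by an in-memory list of rows."""

    def __init__(self, rows: List[Dict[str, Any]]):
        self.rows = rows

    def get_data(self, select_columns=None, limit=None, as_docs=False, **kwargs):
        # The stub always returns a list of dictionaries, as if as_docs=True.
        rows = self.rows if limit is None else self.rows[:limit]
        if select_columns:
            rows = [{col: row[col] for col in select_columns if col in row} for row in rows]
        return rows

    def get_doc(self, **kwargs) -> Optional[Dict[str, Any]]:
        kwargs['limit'] = 1
        kwargs['as_docs'] = True
        docs = self.get_data(**kwargs)
        return docs[0] if docs else None

    def get_value(self, column: str, **kwargs) -> Any:
        doc = self.get_doc(select_columns=[column], **kwargs)
        # A missing column yields None here; the real method warns first.
        return doc.get(column) if doc else None


pipe = StubPipe([{'id': 1, 'vl': 42}, {'id': 2, 'vl': 7}])
first_doc = pipe.get_doc()
first_value = pipe.get_value('vl')
missing_value = pipe.get_value('nope')
empty_value = StubPipe([]).get_value('vl')
```

Because failures and empty results both collapse to `None`, callers cannot tell a missing row from a stored null without checking the pipe separately.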

def get_chunk_interval( self, chunk_interval: Union[datetime.timedelta, int, NoneType] = None, debug: bool = False) -> Union[datetime.timedelta, int]:
598def get_chunk_interval(
599    self,
600    chunk_interval: Union[timedelta, int, None] = None,
601    debug: bool = False,
602) -> Union[timedelta, int]:
603    """
604    Get the chunk interval to use for this pipe.
605
606    Parameters
607    ----------
608    chunk_interval: Union[timedelta, int, None], default None
609        If provided, coerce this value into the correct type.
610        For example, if the datetime axis is an integer, then
611        return the number of minutes.
612
613    Returns
614    -------
615    The chunk interval (`timedelta` or `int`) to use with this pipe's `datetime` axis.
616    """
617    from meerschaum.utils.dtypes import MRSM_PRECISION_UNITS_SCALARS, MRSM_PRECISION_UNITS_ALIASES
618    default_chunk_minutes = get_config('pipes', 'parameters', 'verify', 'chunk_minutes')
619    configured_chunk_minutes = self.parameters.get('verify', {}).get('chunk_minutes', None)
620    chunk_minutes = (
621        (configured_chunk_minutes or default_chunk_minutes)
622        if chunk_interval is None
623        else (
624            chunk_interval
625            if isinstance(chunk_interval, int)
626            else int(chunk_interval.total_seconds() / 60)
627        )
628    )
629
630    dt_col = self.columns.get('datetime', None)
631    if dt_col is None:
632        return timedelta(minutes=chunk_minutes)
633
634    dt_dtype = self.dtypes.get(dt_col, 'datetime')
635    if 'int' in dt_dtype.lower():
636        if chunk_interval is not None or not self.parameters.get('precision', None):
637            return chunk_minutes
638        precision_unit = self.precision.get('unit', None)
639        true_unit = MRSM_PRECISION_UNITS_ALIASES.get(precision_unit, precision_unit)
640        scalar = MRSM_PRECISION_UNITS_SCALARS.get(true_unit, None)
641        if scalar is not None:
642            return int(chunk_minutes * 60 * scalar)
643        return chunk_minutes
644
645    return timedelta(minutes=chunk_minutes)

Get the chunk interval to use for this pipe.

Parameters
  • chunk_interval (Union[timedelta, int, None], default None): If provided, coerce this value into the correct type. For example, if the datetime axis is an integer, then return the number of minutes.
Returns
  • The chunk interval (timedelta or int) to use with this pipe's datetime axis.
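The coercion described above can be summarized as: resolve a number of minutes, then return either an integer or a `timedelta` depending on the type of the datetime axis. The following is a simplified sketch under stated assumptions. `coerce_chunk_interval` and its `dt_axis_is_int` flag are hypothetical, the 1440-minute default is taken from the `get_data()` documentation, and the precision-unit scaling applied to integer axes in the source is left out.

```python
from datetime import timedelta
from typing import Union


def coerce_chunk_interval(
    chunk_interval: Union[timedelta, int, None],
    dt_axis_is_int: bool,
    default_minutes: int = 1440,
) -> Union[timedelta, int]:
    """Simplified sketch: ignores the precision-unit scaling for integer axes."""
    if chunk_interval is None:
        minutes = default_minutes
    elif isinstance(chunk_interval, int):
        minutes = chunk_interval
    else:
        # Convert a timedelta into whole minutes.
        minutes = int(chunk_interval.total_seconds() / 60)
    return minutes if dt_axis_is_int else timedelta(minutes=minutes)


default_interval = coerce_chunk_interval(None, dt_axis_is_int=False)
int_axis_interval = coerce_chunk_interval(timedelta(hours=2), dt_axis_is_int=True)
ts_axis_interval = coerce_chunk_interval(90, dt_axis_is_int=False)
```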
def get_chunk_bounds( self, begin: Union[datetime.datetime, int, NoneType] = None, end: Union[datetime.datetime, int, NoneType] = None, bounded: bool = False, chunk_interval: Union[datetime.timedelta, int, NoneType] = None, debug: bool = False) -> List[Tuple[Union[datetime.datetime, int, NoneType], Union[datetime.datetime, int, NoneType]]]:
648def get_chunk_bounds(
649    self,
650    begin: Union[datetime, int, None] = None,
651    end: Union[datetime, int, None] = None,
652    bounded: bool = False,
653    chunk_interval: Union[timedelta, int, None] = None,
654    debug: bool = False,
655) -> List[
656    Tuple[
657        Union[datetime, int, None],
658        Union[datetime, int, None],
659    ]
660]:
661    """
662    Return a list of datetime bounds for iterating over the pipe's `datetime` axis.
663
664    Parameters
665    ----------
666    begin: Union[datetime, int, None], default None
667        If provided, do not select less than this value.
668        Otherwise the first chunk will be unbounded.
669
670    end: Union[datetime, int, None], default None
671        If provided, do not select greater than or equal to this value.
672        Otherwise the last chunk will be unbounded.
673
674    bounded: bool, default False
675        If `True`, do not include the unbounded (`None`) first and last chunks.
676
677    chunk_interval: Union[timedelta, int, None], default None
678        If provided, use this interval for the size of chunk boundaries.
679        The default value for this pipe may be set
680        under `pipe.parameters['verify']['chunk_minutes']`.
681
682    debug: bool, default False
683        Verbosity toggle.
684
685    Returns
686    -------
687    A list of chunk bounds (datetimes or integers).
688    If unbounded, the first and last chunks will include `None`.
689    """
690    from datetime import timedelta
691    from meerschaum.utils.dtypes import are_dtypes_equal
692    from meerschaum.utils.misc import interval_str
693    include_less_than_begin = not bounded and begin is None
694    include_greater_than_end = not bounded and end is None
695    if begin is None:
696        begin = self.get_sync_time(newest=False, debug=debug)
697    consolidate_end_chunk = False
698    if end is None:
699        end = self.get_sync_time(newest=True, debug=debug)
700        if end is not None and hasattr(end, 'tzinfo'):
701            end += timedelta(minutes=1)
702            consolidate_end_chunk = True
703        elif are_dtypes_equal(str(type(end)), 'int'):
704            end += 1
705            consolidate_end_chunk = True
706
707    if begin is None and end is None:
708        return [(None, None)]
709
710    begin, end = self.parse_date_bounds(begin, end, debug=debug)
711
712    if begin and end:
713        if begin >= end:
714            return (
715                [(begin, begin)]
716                if bounded
717                else [(begin, None)]
718            )
719        if end <= begin:
720            return (
721                [(end, end)]
722                if bounded
723                else [(None, begin)]
724            )
725
726    ### Set the chunk interval under `pipe.parameters['verify']['chunk_minutes']`.
727    chunk_interval = self.get_chunk_interval(chunk_interval, debug=debug)
728    
729    ### Build a list of tuples containing the chunk boundaries
730    ### so that we can sync multiple chunks in parallel.
731    ### Run `verify pipes --workers 1` to sync chunks in series.
732    chunk_bounds = []
733    begin_cursor = begin
734    num_chunks = 0
735    max_chunks = 1_000_000
736    while begin_cursor < end:
737        end_cursor = begin_cursor + chunk_interval
738        chunk_bounds.append((begin_cursor, end_cursor))
739        begin_cursor = end_cursor
740        num_chunks += 1
741        if num_chunks >= max_chunks:
742            raise ValueError(
743                f"Too many chunks of size '{interval_str(chunk_interval)}' "
744                f"between '{begin}' and '{end}'."
745            )
746
747    if num_chunks > 1 and consolidate_end_chunk:
748        last_bounds, second_last_bounds = chunk_bounds[-1], chunk_bounds[-2]
749        chunk_bounds = chunk_bounds[:-2]
750        chunk_bounds.append((second_last_bounds[0], last_bounds[1]))
751
752    ### The chunk interval might be too large.
753    if not chunk_bounds and end >= begin:
754        chunk_bounds = [(begin, end)]
755
756    ### Truncate the last chunk to the end timestamp.
757    if chunk_bounds[-1][1] > end:
758        chunk_bounds[-1] = (chunk_bounds[-1][0], end)
759
760    ### Pop the last chunk if its bounds are equal.
761    if chunk_bounds[-1][0] == chunk_bounds[-1][1]:
762        chunk_bounds = chunk_bounds[:-1]
763
764    if include_less_than_begin:
765        chunk_bounds = [(None, begin)] + chunk_bounds
766    if include_greater_than_end:
767        chunk_bounds = chunk_bounds + [(end, None)]
768
769    return chunk_bounds

Return a list of datetime bounds for iterating over the pipe's datetime axis.

Parameters
  • begin (Union[datetime, int, None], default None): If provided, do not select less than this value. Otherwise the first chunk will be unbounded.
  • end (Union[datetime, int, None], default None): If provided, do not select greater than or equal to this value. Otherwise the last chunk will be unbounded.
  • bounded (bool, default False): If True, do not include None in the first chunk.
  • chunk_interval (Union[timedelta, int, None], default None): If provided, use this interval for the size of chunk boundaries. The default value for this pipe may be set under pipe.parameters['verify']['chunk_minutes'].
  • debug (bool, default False): Verbosity toggle.
Returns
  • A list of chunk bounds (datetimes or integers).
  • If unbounded, the first and last chunks will include None.
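The chunking logic can be illustrated without a pipe. The following is a minimal standalone sketch, not the library's implementation: `build_chunk_bounds` is a hypothetical helper that assumes explicit `begin` and `end` values and leaves out the end-chunk consolidation and maximum-chunk guard seen in the source listing.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Bound = Optional[datetime]


def build_chunk_bounds(
    begin: datetime,
    end: datetime,
    chunk_interval: timedelta,
    bounded: bool = True,
) -> List[Tuple[Bound, Bound]]:
    """Split [begin, end) into consecutive chunks of at most `chunk_interval`."""
    if chunk_interval <= timedelta(0):
        raise ValueError("chunk_interval must be positive.")

    bounds: List[Tuple[Bound, Bound]] = []
    cursor = begin
    while cursor < end:
        # Truncate the final chunk so that it never extends past `end`.
        next_cursor = min(cursor + chunk_interval, end)
        bounds.append((cursor, next_cursor))
        cursor = next_cursor

    if not bounded:
        # Unbounded: add open-ended chunks before `begin` and after `end`.
        bounds = [(None, begin)] + bounds + [(end, None)]
    return bounds


chunks = build_chunk_bounds(
    datetime(2024, 1, 1),
    datetime(2024, 1, 1, 5),
    timedelta(hours=2),
)
for chunk_begin, chunk_end in chunks:
    print(chunk_begin, '->', chunk_end)
```

Each chunk is half-open: its end value is the next chunk's begin value, so no row falls into two chunks.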
def get_chunk_bounds_batches( self, chunk_bounds: List[Tuple[Union[datetime.datetime, int, NoneType], Union[datetime.datetime, int, NoneType]]], batchsize: Optional[int] = None, workers: Optional[int] = None, debug: bool = False) -> List[Tuple[Tuple[Union[datetime.datetime, int, NoneType], Union[datetime.datetime, int, NoneType]], ...]]:
def get_chunk_bounds_batches(
    self,
    chunk_bounds: List[Tuple[Union[datetime, int, None], Union[datetime, int, None]]],
    batchsize: Optional[int] = None,
    workers: Optional[int] = None,
    debug: bool = False,
) -> List[
    Tuple[
        Tuple[
            Union[datetime, int, None],
            Union[datetime, int, None],
        ], ...
    ]
]:
    """
    Return a list of tuples of chunk bounds of size `batchsize`.

    Parameters
    ----------
    chunk_bounds: List[Tuple[Union[datetime, int, None], Union[datetime, int, None]]]
        A list of chunk_bounds (see `Pipe.get_chunk_bounds()`).

    batchsize: Optional[int], default None
        How many chunks to include in a batch. Defaults to `Pipe.get_num_workers()`.

    workers: Optional[int], default None
        If `batchsize` is `None`, use this as the desired number of workers.
        Passed to `Pipe.get_num_workers()`.

    Returns
    -------
    A list of tuples of chunk bound tuples.
    """
    from meerschaum.utils.misc import iterate_chunks

    if batchsize is None:
        batchsize = self.get_num_workers(workers=workers)

    return [
        tuple(
            _batch_chunk_bounds
            for _batch_chunk_bounds in batch
            if _batch_chunk_bounds is not None
        )
        for batch in iterate_chunks(chunk_bounds, batchsize)
        if batch
    ]

Return a list of tuples of chunk bounds of size batchsize.

Parameters
  • chunk_bounds (List[Tuple[Union[datetime, int, None], Union[datetime, int, None]]]): A list of chunk_bounds (see Pipe.get_chunk_bounds()).
  • batchsize (Optional[int], default None): How many chunks to include in a batch. Defaults to Pipe.get_num_workers().
  • workers (Optional[int], default None): If batchsize is None, use this as the desired number of workers. Passed to Pipe.get_num_workers().
Returns
  • A list of tuples of chunk bound tuples.
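As an illustration of the return shape, here is a small standalone sketch of batching. `batch_chunk_bounds` is a hypothetical helper that uses plain slicing where the source listing uses `iterate_chunks`, and it assumes an explicit `batchsize`.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar('T')


def batch_chunk_bounds(chunk_bounds: Sequence[T], batchsize: int) -> List[Tuple[T, ...]]:
    """Group `chunk_bounds` into tuples holding at most `batchsize` items each."""
    if batchsize < 1:
        raise ValueError("batchsize must be at least 1.")
    return [
        tuple(chunk_bounds[i:i + batchsize])
        for i in range(0, len(chunk_bounds), batchsize)
    ]


# Integer bounds are used here for brevity; datetimes behave the same way.
bounds = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
batches = batch_chunk_bounds(bounds, 2)
print(batches)
# [((0, 10), (10, 20)), ((20, 30), (30, 40)), ((40, 50),)]
```

The last batch may hold fewer chunks than `batchsize` when the number of chunks is not an even multiple.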
def parse_date_bounds( self, *dt_vals: Union[datetime.datetime, int, NoneType], debug: bool = False) -> Union[datetime.datetime, int, str, NoneType, Tuple[Union[datetime.datetime, int, str, NoneType]]]:
def parse_date_bounds(self, *dt_vals: Union[datetime, int, None], debug: bool = False) -> Union[
    datetime,
    int,
    str,
    None,
    Tuple[Union[datetime, int, str, None]]
]:
    """
    Given a date bound (begin, end), coerce a timezone if necessary.
    """
    from meerschaum.utils.misc import is_int
    from meerschaum.utils.dtypes import coerce_timezone, MRSM_PD_DTYPES
    from meerschaum.utils.warnings import warn
    dateutil_parser = mrsm.attempt_import('dateutil.parser')

    _columns = None
    _dtypes = None

    def _get_coercion_info():
        nonlocal _columns, _dtypes
        if _columns is None:
            _columns = self.get_parameters(debug=debug).get('columns', {}) or {}
        if _dtypes is None:
            _dtypes = self.get_dtypes(debug=debug)

    def _parse_date_bound(dt_val):
        if dt_val is None:
            return None

        if isinstance(dt_val, int):
            return dt_val

        if dt_val == '':
            return ''

        if is_int(dt_val):
            return int(dt_val)

        if isinstance(dt_val, str):
            try:
                dt_val = dateutil_parser.parse(dt_val)
            except Exception as e:
                warn(f"Could not parse '{dt_val}' as datetime:\n{e}")
                return None

        _get_coercion_info()
        dt_col = _columns.get('datetime', None)
        dt_typ = str(_dtypes.get(dt_col, 'datetime'))
        if dt_typ == 'datetime':
            dt_typ = MRSM_PD_DTYPES['datetime']
        return coerce_timezone(dt_val, strip_utc=('utc' not in dt_typ.lower()))

    bounds = tuple(_parse_date_bound(dt_val) for dt_val in dt_vals)
    if len(bounds) == 1:
        return bounds[0]
    return bounds

Given a date bound (begin, end), coerce a timezone if necessary.
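The coercion rules can be sketched with the standard library alone. `parse_bound` below is a hypothetical stand-in: it uses `datetime.fromisoformat` rather than `dateutil`, takes a `tz_aware` flag where the method inspects the pipe's datetime dtype, and assumes that stripping UTC means converting to UTC and then dropping the timezone.

```python
from datetime import datetime, timezone
from typing import Union

DateBound = Union[datetime, int, str, None]


def parse_bound(value: DateBound, tz_aware: bool = True) -> DateBound:
    """Coerce a single bound; `tz_aware` stands in for the pipe's datetime dtype."""
    # None and the empty string pass through unchanged.
    if value is None or value == '':
        return value
    # Integer bounds (for integer datetime axes) are returned as-is.
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        # Integer-like strings become integers.
        try:
            return int(value)
        except ValueError:
            pass
        # Everything else is parsed as an ISO 8601 timestamp.
        try:
            value = datetime.fromisoformat(value)
        except ValueError:
            return None
    if value.tzinfo is None:
        # Naive values are treated as UTC when the axis is timezone-aware.
        return value.replace(tzinfo=timezone.utc) if tz_aware else value
    value = value.astimezone(timezone.utc)
    return value if tz_aware else value.replace(tzinfo=None)


begin, end = (parse_bound(val) for val in ('2024-01-01', '2024-02-01T00:00:00+01:00'))
print(begin, end)
```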

def register(self, debug: bool = False, **kw: Any) -> Tuple[bool, str]:
def register(
    self,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Register a new Pipe along with its attributes.

    Parameters
    ----------
    debug: bool, default False
        Verbosity toggle.

    kw: Any
        Keyword arguments to pass to `instance_connector.register_pipe()`.

    Returns
    -------
    A `SuccessTuple` of success, message.
    """
    if self.temporary:
        return False, "Cannot register pipes created with `temporary=True` (read-only)."

    from meerschaum.utils.formatting import get_console
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin, custom_types
    from meerschaum.config._patch import apply_patch_to_config

    import warnings
    with warnings.catch_warnings():
        warnings.simplefilter('ignore')
        try:
            _conn = self.connector
        except Exception:
            _conn = None

        if isinstance(_conn, str):
            _conn = None

    if (
        _conn is not None
        and
        (_conn.type == 'plugin' or _conn.type in custom_types)
        and
        getattr(_conn, 'register', None) is not None
    ):
        try:
            with Venv(get_connector_plugin(_conn), debug=debug):
                params = self.connector.register(self)
        except Exception:
            get_console().print_exception()
            params = None
        params = {} if params is None else params
        if not isinstance(params, dict):
            from meerschaum.utils.warnings import warn
            warn(
                f"Invalid parameters returned from `register()` in connector {self.connector}:\n"
                + f"{params}"
            )
        else:
            self.parameters = apply_patch_to_config(params, self.parameters)

    if not self.parameters:
        cols = self.columns if self.columns else {'datetime': None, 'id': None}
        self.parameters = {
            'columns': cols,
        }

    with Venv(get_connector_plugin(self.instance_connector)):
        return self.instance_connector.register_pipe(self, debug=debug, **kw)

Register a new Pipe along with its attributes.

Parameters
  • debug (bool, default False): Verbosity toggle.
  • kw (Any): Keyword arguments to pass to instance_connector.register_pipe().
Returns
  • A SuccessTuple of success, message.
attributes: Dict[str, Any]
@property
def attributes(self) -> Dict[str, Any]:
    """
    Return a dictionary of a pipe's keys and parameters.
    These values are reflected directly from the pipes table of the instance.
    """
    from meerschaum.config import get_config
    from meerschaum.config._patch import apply_patch_to_config
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.dtypes import get_current_timestamp

    timeout_seconds = get_config('pipes', 'attributes', 'local_cache_timeout_seconds')

    now = get_current_timestamp('ms', as_int=True) / 1000
    _attributes_sync_time = self._get_cached_value('_attributes_sync_time', debug=self.debug)
    timed_out = (
        _attributes_sync_time is None
        or
        (timeout_seconds is not None and (now - _attributes_sync_time) >= timeout_seconds)
    )
    if not self.temporary and timed_out:
        self._cache_value('_attributes_sync_time', now, memory_only=True, debug=self.debug)
        local_attributes = self._get_cached_value('attributes', debug=self.debug) or {}
        with Venv(get_connector_plugin(self.instance_connector)):
            instance_attributes = self.instance_connector.get_pipe_attributes(self)

        self._cache_value(
            'attributes',
            apply_patch_to_config(instance_attributes, local_attributes),
            memory_only=True,
            debug=self.debug,
        )

    return self._attributes

Return a dictionary of a pipe's keys and parameters. These values are reflected directly from the pipes table of the instance.

parameters: Optional[Dict[str, Any]]
@property
def parameters(self) -> Optional[Dict[str, Any]]:
    """
    Return the parameters dictionary of the pipe.
    """
    return self.get_parameters(debug=self.debug)

Return the parameters dictionary of the pipe.

columns: Optional[Dict[str, str]]
@property
def columns(self) -> Union[Dict[str, str], None]:
    """
    Return the `columns` dictionary defined in `meerschaum.Pipe.parameters`.
    """
    cols = self.parameters.get('columns', {})
    if not isinstance(cols, dict):
        return {}
    return {col_ix: col for col_ix, col in cols.items() if col and col_ix}

Return the columns dictionary defined in meerschaum.Pipe.parameters.

indices: Optional[Dict[str, Union[str, List[str]]]]
@property
def indices(self) -> Union[Dict[str, Union[str, List[str]]], None]:
    """
    Return the `indices` dictionary defined in `meerschaum.Pipe.parameters`.
    """
    _parameters = self.get_parameters(debug=self.debug)
    indices_key = (
        'indexes'
        if 'indexes' in _parameters
        else 'indices'
    )

    _indices = _parameters.get(indices_key, {})
    _columns = self.columns
    dt_col = _columns.get('datetime', None)
    if not isinstance(_indices, dict):
        _indices = {}
    unique_cols = list(set((
        [dt_col]
        if dt_col
        else []
    ) + [
        col
        for col_ix, col in _columns.items()
        if col and col_ix != 'datetime'
    ]))
    return {
        **({'unique': unique_cols} if len(unique_cols) > 1 else {}),
        **{col_ix: col for col_ix, col in _columns.items() if col},
        **_indices
    }

Return the indices dictionary defined in meerschaum.Pipe.parameters.
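To make the merge order concrete, the sketch below reproduces the derivation as a free function. `derive_indices` is a hypothetical name, and unlike the property (which builds `unique` from a `set`), this version keeps the column order so the output is deterministic.

```python
from typing import Dict, List, Optional, Union

IndexValue = Union[str, List[str]]


def derive_indices(
    columns: Dict[str, Optional[str]],
    indices: Optional[Dict[str, IndexValue]] = None,
) -> Dict[str, IndexValue]:
    """Combine index columns with explicit indices, adding a composite `unique` key."""
    # Ignore index keys that have no column name assigned.
    columns = {key: col for key, col in columns.items() if key and col}
    dt_col = columns.get('datetime')

    # The datetime column leads, followed by the remaining index columns.
    ordered = ([dt_col] if dt_col else []) + [
        col for key, col in columns.items() if key != 'datetime'
    ]
    unique_cols = list(dict.fromkeys(ordered))  # de-duplicate, preserving order

    return {
        # A composite unique index only makes sense with two or more columns.
        **({'unique': unique_cols} if len(unique_cols) > 1 else {}),
        **columns,
        # Explicitly configured indices take precedence.
        **(indices or {}),
    }


print(derive_indices({'datetime': 'ts', 'id': 'id'}))
# {'unique': ['ts', 'id'], 'datetime': 'ts', 'id': 'id'}
```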

indexes: Optional[Dict[str, Union[str, List[str]]]]
@property
def indexes(self) -> Union[Dict[str, Union[str, List[str]]], None]:
    """
    Alias for `meerschaum.Pipe.indices`.
    """
    return self.indices

Alias for meerschaum.Pipe.indices.

dtypes: Dict[str, Any]
@property
def dtypes(self) -> Dict[str, Any]:
    """
    If defined, return the `dtypes` dictionary defined in `meerschaum.Pipe.parameters`.
    """
    return self.get_dtypes(refresh=False, debug=self.debug)

If defined, return the dtypes dictionary defined in meerschaum.Pipe.parameters.

autoincrement: bool
@property
def autoincrement(self) -> bool:
    """
    Return the `autoincrement` parameter for the pipe.
    """
    return self.parameters.get('autoincrement', False)

Return the autoincrement parameter for the pipe.

autotime: bool
@property
def autotime(self) -> bool:
    """
    Return the `autotime` parameter for the pipe.
    """
    return self.parameters.get('autotime', False)

Return the autotime parameter for the pipe.

upsert: bool
@property
def upsert(self) -> bool:
    """
    Return whether `upsert` is set for the pipe.
    """
    return self.parameters.get('upsert', False)

Return whether upsert is set for the pipe.

static: bool
@property
def static(self) -> bool:
    """
    Return whether `static` is set for the pipe.
    """
    return self.parameters.get('static', False)

Return whether static is set for the pipe.

tzinfo: Optional[datetime.timezone]
@property
def tzinfo(self) -> Union[None, timezone]:
    """
    Return `timezone.utc` if the pipe is timezone-aware.
    """
    _tzinfo = self._get_cached_value('tzinfo', debug=self.debug)
    if _tzinfo is not None:
        return _tzinfo if _tzinfo != 'None' else None

    _tzinfo = None
    dt_col = self.columns.get('datetime', None)
    dt_typ = str(self.dtypes.get(dt_col, 'datetime')) if dt_col else None
    if self.autotime:
        ts_col = mrsm.get_config('pipes', 'autotime', 'column_name_if_datetime_missing')
        ts_typ = self.dtypes.get(ts_col, 'datetime')
        dt_typ = ts_typ

    if dt_typ and 'utc' in dt_typ.lower() or dt_typ == 'datetime':
        _tzinfo = timezone.utc

    self._cache_value('tzinfo', (_tzinfo if _tzinfo is not None else 'None'), debug=self.debug)
    return _tzinfo

Return timezone.utc if the pipe is timezone-aware.
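The decision rule reduces to a check on the datetime column's dtype string. The sketch below isolates that check in a hypothetical `infer_tzinfo` function, leaving out the caching and the `autotime` fallback; the example dtype strings are assumptions chosen for illustration.

```python
from datetime import timezone
from typing import Optional


def infer_tzinfo(dt_typ: Optional[str]) -> Optional[timezone]:
    """Return `timezone.utc` when a datetime dtype string denotes a UTC-aware column."""
    if dt_typ is None:
        # No datetime column is defined, so there is nothing to be aware of.
        return None
    # The generic 'datetime' dtype is treated as timezone-aware.
    if dt_typ == 'datetime' or 'utc' in dt_typ.lower():
        return timezone.utc
    return None


print(infer_tzinfo('datetime64[ns, UTC]'))  # UTC
print(infer_tzinfo('datetime64[ns]'))       # None
```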

enforce: bool
@property
def enforce(self) -> bool:
    """
    Return the `enforce` parameter for the pipe.
    """
    return self.parameters.get('enforce', True)

Return the enforce parameter for the pipe.

null_indices: bool
@property
def null_indices(self) -> bool:
    """
    Return the `null_indices` parameter for the pipe.
    """
    return self.parameters.get('null_indices', True)

Return the null_indices parameter for the pipe.

mixed_numerics: bool
@property
def mixed_numerics(self) -> bool:
    """
    Return the `mixed_numerics` parameter for the pipe.
    """
    return self.parameters.get('mixed_numerics', True)

Return the mixed_numerics parameter for the pipe.

def get_columns(self, *args: str, error: bool = False) -> Union[str, Tuple[str]]:
def get_columns(self, *args: str, error: bool = False) -> Union[str, Tuple[str]]:
    """
    Check if the requested columns are defined.

    Parameters
    ----------
    *args: str
        The column names to be retrieved.

    error: bool, default False
        If `True`, raise an `Exception` if the specified column is not defined.

    Returns
    -------
    A tuple of the same size as `args` or a `str` if `args` is a single argument.

    Examples
    --------
    >>> pipe = mrsm.Pipe('test', 'test')
    >>> pipe.columns = {'datetime': 'dt', 'id': 'id'}
    >>> pipe.get_columns('datetime', 'id')
    ('dt', 'id')
    >>> pipe.get_columns('value', error=True)
    Exception:  🛑 Missing 'value' column for Pipe('test', 'test').
    """
    from meerschaum.utils.warnings import error as _error
    if not args:
        args = tuple(self.columns.keys())
    col_names = []
    for col in args:
        col_name = None
        try:
            col_name = self.columns[col]
            if col_name is None and error:
                _error(f"Please define the name of the '{col}' column for {self}.")
        except Exception:
            col_name = None
        if col_name is None and error:
            _error(f"Missing '{col}'" + f" column for {self}.")
        col_names.append(col_name)
    if len(col_names) == 1:
        return col_names[0]
    return tuple(col_names)

Check if the requested columns are defined.

Parameters
  • *args (str): The column names to be retrieved.
  • error (bool, default False): If True, raise an Exception if the specified column is not defined.
Returns
  • A tuple of the same size as args or a str if args is a single argument.
Examples
>>> pipe = mrsm.Pipe('test', 'test')
>>> pipe.columns = {'datetime': 'dt', 'id': 'id'}
>>> pipe.get_columns('datetime', 'id')
('dt', 'id')
>>> pipe.get_columns('value', error=True)
Exception:  🛑 Missing 'value' column for Pipe('test', 'test').
def get_columns_types( self, refresh: bool = False, debug: bool = False) -> Optional[Dict[str, str]]:
def get_columns_types(
    self,
    refresh: bool = False,
    debug: bool = False,
) -> Union[Dict[str, str], None]:
    """
    Get a dictionary of a pipe's column names and their types.

    Parameters
    ----------
    refresh: bool, default False
        If `True`, invalidate the cache and fetch directly from the instance connector.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A dictionary of column names (`str`) to column types (`str`).

    Examples
    --------
    >>> pipe.get_columns_types()
    {
      'dt': 'TIMESTAMP WITH TIMEZONE',
      'id': 'BIGINT',
      'val': 'DOUBLE PRECISION',
    }
    >>>
    """
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.dtypes import get_current_timestamp

    now = get_current_timestamp('ms', as_int=True) / 1000
    cache_seconds = (
        mrsm.get_config('pipes', 'static', 'static_schema_cache_seconds')
        if self.static
        else mrsm.get_config('pipes', 'dtypes', 'columns_types_cache_seconds')
    )
    if refresh:
        self._clear_cache_key('_columns_types_timestamp', debug=debug)
        self._clear_cache_key('_columns_types', debug=debug)

    _columns_types = self._get_cached_value('_columns_types', debug=debug)
    if _columns_types:
        columns_types_timestamp = self._get_cached_value('_columns_types_timestamp', debug=debug)
        if columns_types_timestamp is not None:
            delta = now - columns_types_timestamp
            if delta < cache_seconds:
                if debug:
                    dprint(
                        f"Returning cached `columns_types` for {self} "
                        f"({round(delta, 2)} seconds old)."
                    )
                return _columns_types

    with mrsm.Venv(get_connector_plugin(self.instance_connector)):
        _columns_types = (
            self.instance_connector.get_pipe_columns_types(self, debug=debug)
            if hasattr(self.instance_connector, 'get_pipe_columns_types')
            else None
        )

    self._cache_value('_columns_types', _columns_types, debug=debug)
    self._cache_value('_columns_types_timestamp', now, debug=debug)
    return _columns_types or {}

Get a dictionary of a pipe's column names and their types.

Parameters
  • refresh (bool, default False): If True, invalidate the cache and fetch directly from the instance connector.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A dictionary of column names (str) to column types (str).
Examples
>>> pipe.get_columns_types()
{
  'dt': 'TIMESTAMP WITH TIMEZONE',
  'id': 'BIGINT',
  'val': 'DOUBLE PRECISION',
}
>>>
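The timestamp-based caching used by this method can be sketched in isolation. `TimedCache` and `fetch_columns_types` below are hypothetical names; the sketch assumes an injectable clock so the expiry behaviour can be shown without waiting, and it keeps the cache in memory only.

```python
import time
from typing import Callable, Dict, Optional


class TimedCache:
    """Cache the result of `fetch` for `ttl_seconds` (illustrative helper)."""

    def __init__(
        self,
        fetch: Callable[[], Optional[Dict[str, str]]],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch
        self._ttl_seconds = ttl_seconds
        self._clock = clock
        self._value: Optional[Dict[str, str]] = None
        self._timestamp: Optional[float] = None

    def get(self, refresh: bool = False) -> Dict[str, str]:
        now = self._clock()
        if refresh:
            # Invalidate both the value and its timestamp.
            self._value, self._timestamp = None, None
        if self._value and self._timestamp is not None:
            if (now - self._timestamp) < self._ttl_seconds:
                return self._value
        # Cache miss or expired entry: fetch and record when it happened.
        self._value = self._fetch()
        self._timestamp = now
        return self._value or {}


state = {'now': 0.0, 'fetches': 0}


def fetch_columns_types() -> Dict[str, str]:
    state['fetches'] += 1
    return {'dt': 'TIMESTAMP', 'id': 'BIGINT'}


cache = TimedCache(fetch_columns_types, ttl_seconds=60.0, clock=lambda: state['now'])
cache.get()              # first call fetches
cache.get()              # served from the cache
state['now'] = 61.0
cache.get()              # the entry has expired, so fetch again
cache.get(refresh=True)  # an explicit refresh always fetches
print(state['fetches'])  # 3
```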
def get_columns_indices( self, debug: bool = False, refresh: bool = False) -> Dict[str, List[Dict[str, str]]]:
def get_columns_indices(
    self,
    debug: bool = False,
    refresh: bool = False,
) -> Dict[str, List[Dict[str, str]]]:
    """
    Return a dictionary mapping columns to index information.
    """
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.dtypes import get_current_timestamp

    now = get_current_timestamp('ms', as_int=True) / 1000
    cache_seconds = (
        mrsm.get_config('pipes', 'static', 'static_schema_cache_seconds')
        if self.static
        else mrsm.get_config('pipes', 'dtypes', 'columns_types_cache_seconds')
    )
    if refresh:
        self._clear_cache_key('_columns_indices_timestamp', debug=debug)
        self._clear_cache_key('_columns_indices', debug=debug)

    _columns_indices = self._get_cached_value('_columns_indices', debug=debug)

    if _columns_indices:
        columns_indices_timestamp = self._get_cached_value('_columns_indices_timestamp', debug=debug)
        if columns_indices_timestamp is not None:
            delta = now - columns_indices_timestamp
            if delta < cache_seconds:
                if debug:
                    dprint(
                        f"Returning cached `columns_indices` for {self} "
                        f"({round(delta, 2)} seconds old)."
                    )
                return _columns_indices

    with mrsm.Venv(get_connector_plugin(self.instance_connector)):
        _columns_indices = (
            self.instance_connector.get_pipe_columns_indices(self, debug=debug)
            if hasattr(self.instance_connector, 'get_pipe_columns_indices')
            else None
        )

    self._cache_value('_columns_indices', _columns_indices, debug=debug)
    self._cache_value('_columns_indices_timestamp', now, debug=debug)
    ### Guard against connectors which do not implement `get_pipe_columns_indices()`.
    return {k: v for k, v in (_columns_indices or {}).items() if k and v}

Return a dictionary mapping columns to index information.

def get_indices(self) -> Dict[str, str]:
def get_indices(self) -> Dict[str, str]:
    """
    Return a dictionary mapping index keys to their names in the database.

    Returns
    -------
    A dictionary of index keys to index names.
    """
    from meerschaum.connectors import get_connector_plugin
    with mrsm.Venv(get_connector_plugin(self.instance_connector)):
        if hasattr(self.instance_connector, 'get_pipe_index_names'):
            result = self.instance_connector.get_pipe_index_names(self)
        else:
            result = {}

    return result

Return a dictionary mapping index keys to their names in the database.

Returns
  • A dictionary of index keys to index names.
def get_parameters( self, apply_symlinks: bool = True, refresh: bool = False, debug: bool = False, _visited: Optional[set[Pipe]] = None) -> Dict[str, Any]:
def get_parameters(
    self,
    apply_symlinks: bool = True,
    refresh: bool = False,
    debug: bool = False,
    _visited: 'Optional[set[mrsm.Pipe]]' = None,
) -> Dict[str, Any]:
    """
    Return the `parameters` dictionary of the pipe.

    Parameters
    ----------
    apply_symlinks: bool, default True
        If `True`, resolve references to parameters from other pipes.

    refresh: bool, default False
        If `True`, pull the latest attributes for the pipe.

    Returns
    -------
    The pipe's parameters dictionary.
    """
    from meerschaum.config._patch import apply_patch_to_config
    from meerschaum.config._read_config import search_and_substitute_config

    if _visited is None:
        _visited = {self}

    if refresh:
        _ = self._invalidate_cache(hard=True)

    raw_parameters = self.attributes.get('parameters', {})
    if not apply_symlinks:
        return raw_parameters

    parameters = {}
    for ref_pipe in self.references:
        try:
            if ref_pipe in _visited:
                warn(f"Circular reference detected in {self}: chain involves {ref_pipe}.")
                return search_and_substitute_config(raw_parameters)

            _visited.add(ref_pipe)
            if refresh:
                _ = _cached_base_params.pop(ref_pipe, None)
            base_params = _cached_base_params.get(ref_pipe, None)
            if base_params is None:
                base_params = ref_pipe.get_parameters(
                    apply_symlinks=apply_symlinks,
                    _visited=_visited,
                    debug=debug,
                )
                _cached_base_params[ref_pipe] = base_params
                if debug:
                    dprint(f"base_params from {ref_pipe} for {self}:")
                    mrsm.pprint(base_params)
            else:
                if debug:
                    dprint(f"Using cached base_params from {ref_pipe} for {self}")
        except Exception as e:
            warn(f"Failed to resolve reference pipe for {self}: {e}")
            base_params = {}

        parameters = apply_patch_to_config(parameters, base_params)

    parameters = apply_patch_to_config(parameters, raw_parameters)

    from meerschaum.utils.pipes import replace_pipes_syntax
    self._symlinks = {}

    def recursive_replace(obj: Any, path: tuple) -> Any:
        if isinstance(obj, dict):
            return {k: recursive_replace(v, path + (k,)) for k, v in obj.items()}
        if isinstance(obj, list):
            return [recursive_replace(elem, path + (i,)) for i, elem in enumerate(obj)]
        if isinstance(obj, str):
            substituted_val = replace_pipes_syntax(obj, _pipe=self)
            if substituted_val != obj:
                self._symlinks[path] = {
                    'original': obj,
                    'substituted': substituted_val,
                }
            return substituted_val
        return obj

    return search_and_substitute_config(recursive_replace(parameters, tuple()))

Return the parameters dictionary of the pipe.

Parameters
  • apply_symlinks (bool, default True): If True, resolve references to parameters from other pipes.
  • refresh (bool, default False): If True, pull the latest attributes for the pipe.
Returns
  • The pipe's parameters dictionary.
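The path-tracking substitution at the end of this method can be illustrated on its own. In the sketch below, `replace_strings` is a hypothetical helper and the `<table>` placeholder is an invented stand-in for the pipe reference syntax, which this sketch does not attempt to reproduce.

```python
from typing import Any, Callable, Dict, Tuple

Changes = Dict[tuple, Dict[str, str]]


def replace_strings(obj: Any, substitute: Callable[[str], str]) -> Tuple[Any, Changes]:
    """Apply `substitute` to every string in a nested structure, recording what changed."""
    changes: Changes = {}

    def _walk(node: Any, path: tuple) -> Any:
        if isinstance(node, dict):
            return {key: _walk(val, path + (key,)) for key, val in node.items()}
        if isinstance(node, list):
            return [_walk(elem, path + (i,)) for i, elem in enumerate(node)]
        if isinstance(node, str):
            new_val = substitute(node)
            if new_val != node:
                # Key each change by its path of dictionary keys and list indices.
                changes[path] = {'original': node, 'substituted': new_val}
            return new_val
        return node

    return _walk(obj, ()), changes


params = {
    'fetch': {'definition': 'SELECT * FROM <table>'},
    'tags': ['<table>', 'prod'],
}
resolved, changes = replace_strings(params, lambda s: s.replace('<table>', 'weather'))
print(resolved)
print(sorted(changes, key=str))
```

Recording the path of each substitution makes it possible to tell later which values came from a substitution and what they looked like before it.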
def get_dtypes( self, infer: bool = True, refresh: bool = False, debug: bool = False) -> Dict[str, Any]:
def get_dtypes(
    self,
    infer: bool = True,
    refresh: bool = False,
    debug: bool = False,
) -> Dict[str, Any]:
    """
    If defined, return the `dtypes` dictionary defined in `meerschaum.Pipe.parameters`.

    Parameters
    ----------
    infer: bool, default True
        If `True`, include the implicit existing dtypes.
        Else only return the explicitly configured dtypes (e.g. `Pipe.parameters['dtypes']`).

    refresh: bool, default False
        If `True`, invalidate any cache and return the latest known dtypes.

    Returns
    -------
    A dictionary mapping column names to dtypes.
    """
    from meerschaum.config._patch import apply_patch_to_config
    from meerschaum.utils.dtypes import MRSM_ALIAS_DTYPES
    parameters = self.get_parameters(refresh=refresh, debug=debug)
    configured_dtypes = parameters.get('dtypes', {})
    if debug:
        dprint(f"Configured dtypes for {self}:")
        mrsm.pprint(configured_dtypes)

    remote_dtypes = (
        self.infer_dtypes(persist=False, refresh=refresh, debug=debug)
        if infer
        else {}
    )
    if debug and infer:
        dprint(f"Remote dtypes for {self}:")
        mrsm.pprint(remote_dtypes)

    patched_dtypes = apply_patch_to_config((remote_dtypes or {}), (configured_dtypes or {}))

    dt_col = parameters.get('columns', {}).get('datetime', None)
    primary_col = parameters.get('columns', {}).get('primary', None)
    _dtypes = {
        col: MRSM_ALIAS_DTYPES.get(typ, typ)
        for col, typ in patched_dtypes.items()
        if col and typ
    }
    if dt_col and dt_col not in configured_dtypes:
        _dtypes[dt_col] = 'datetime'
    if primary_col and parameters.get('autoincrement', False) and primary_col not in _dtypes:
        _dtypes[primary_col] = 'int'

    return _dtypes

If defined, return the dtypes dictionary defined in meerschaum.Pipe.parameters.

Parameters
  • infer (bool, default True): If True, include the implicit existing dtypes. Else only return the explicitly configured dtypes (e.g. Pipe.parameters['dtypes']).
  • refresh (bool, default False): If True, invalidate any cache and return the latest known dtypes.
Returns
  • A dictionary mapping column names to dtypes.
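The precedence between inferred and configured dtypes may be easier to follow as a pure function. `resolve_dtypes` below is a hypothetical sketch: the `aliases` mapping stands in for the library's alias table, whose contents are not shown here, and the example dtype strings are assumptions.

```python
from typing import Dict, Optional


def resolve_dtypes(
    remote: Dict[str, str],
    configured: Dict[str, str],
    columns: Dict[str, str],
    autoincrement: bool = False,
    aliases: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Overlay configured dtypes on inferred ones, then apply the column defaults."""
    aliases = aliases or {}
    # Configured dtypes override the dtypes inferred from the existing table.
    patched = {**remote, **configured}
    dtypes = {
        col: aliases.get(typ, typ)
        for col, typ in patched.items()
        if col and typ
    }

    dt_col = columns.get('datetime')
    primary_col = columns.get('primary')
    # The datetime axis is 'datetime' unless it was explicitly configured.
    if dt_col and dt_col not in configured:
        dtypes[dt_col] = 'datetime'
    # An auto-incrementing primary key defaults to an integer.
    if primary_col and autoincrement and primary_col not in dtypes:
        dtypes[primary_col] = 'int'
    return dtypes


print(resolve_dtypes(
    remote={'ts': 'datetime64[ns]', 'vl': 'float64'},
    configured={'vl': 'decimal'},
    columns={'datetime': 'ts', 'primary': 'pk'},
    autoincrement=True,
    aliases={'decimal': 'numeric'},  # illustrative alias only
))
# {'ts': 'datetime', 'vl': 'numeric', 'pk': 'int'}
```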
def update_parameters( self, parameters_patch: Dict[str, Any], persist: bool = True, debug: bool = False) -> Tuple[bool, str]:
def update_parameters(
    self,
    parameters_patch: Dict[str, Any],
    persist: bool = True,
    debug: bool = False,
) -> mrsm.SuccessTuple:
    """
    Apply a patch to a pipe's `parameters` dictionary.

    Parameters
    ----------
    parameters_patch: Dict[str, Any]
        The patch to be applied to `Pipe.parameters`.

    persist: bool, default True
        If `True`, call `Pipe.edit()` to persist the new parameters.
    """
    from meerschaum.config import apply_patch_to_config
    if 'parameters' not in self._attributes:
        self._attributes['parameters'] = {}

    self._attributes['parameters'] = apply_patch_to_config(
        self._attributes['parameters'],
        parameters_patch,
    )

    if self.temporary:
        persist = False

    if not persist:
        return True, "Success"

    return self.edit(debug=debug)

Apply a patch to a pipe's parameters dictionary.

Parameters
  • parameters_patch (Dict[str, Any]): The patch to be applied to Pipe.parameters.
  • persist (bool, default True): If True, call Pipe.edit() to persist the new parameters.
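Patching here means merging a nested dictionary into the existing parameters. The sketch below shows one common way to do that; `apply_patch` is a hypothetical helper and only approximates `apply_patch_to_config`, which may treat some cases (lists, for example) differently.

```python
import copy
from typing import Any, Dict


def apply_patch(config: Dict[str, Any], patch: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `config` with `patch` merged in recursively."""
    result = copy.deepcopy(config)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Merge nested dictionaries key by key.
            result[key] = apply_patch(result[key], value)
        else:
            # Scalars and lists replace the existing value outright.
            result[key] = copy.deepcopy(value)
    return result


parameters = {'columns': {'datetime': 'ts'}, 'tags': ['staging']}
patch = {'columns': {'id': 'id'}, 'tags': ['production']}
print(apply_patch(parameters, patch))
# {'columns': {'datetime': 'ts', 'id': 'id'}, 'tags': ['production']}
```

Because nested dictionaries are merged, a patch only needs to contain the keys that change.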
tags: Optional[List[str]]
@property
def tags(self) -> Union[List[str], None]:
    """
    If defined, return the `tags` list defined in `meerschaum.Pipe.parameters`.
    """
    return self.parameters.get('tags', [])

If defined, return the tags list defined in meerschaum.Pipe.parameters.

def get_id(self, **kw: Any) -> Union[int, str, NoneType]:
def get_id(self, **kw: Any) -> Union[int, str, None]:
    """
    Fetch a pipe's ID from its instance connector.
    If the pipe is not registered, return `None`.
    """
    if self.temporary:
        return None

    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    with Venv(get_connector_plugin(self.instance_connector)):
        if hasattr(self.instance_connector, 'get_pipe_id'):
            return self.instance_connector.get_pipe_id(self, **kw)

    return None

Fetch a pipe's ID from its instance connector. If the pipe is not registered, return None.

id: Union[int, str, uuid.UUID, NoneType]
@property
def id(self) -> Union[int, str, uuid.UUID, None]:
    """
    Fetch and cache a pipe's ID.
    """
    _id = self._get_cached_value('_id', debug=self.debug)
    if _id is None:
        _id = self.get_id(debug=self.debug)
        if _id is not None:
            self._cache_value('_id', _id, debug=self.debug)
    return _id

Fetch and cache a pipe's ID.

def get_val_column(self, debug: bool = False) -> Optional[str]:
681def get_val_column(self, debug: bool = False) -> Union[str, None]:
682    """
683    Return the name of the value column if it's defined, otherwise make an educated guess.
684    If not set in the `columns` dictionary, return the first numeric column that is not
685    an ID or datetime column.
686    If none may be found, return `None`.
687
688    Parameters
689    ----------
690    debug: bool, default False:
691        Verbosity toggle.
692
693    Returns
694    -------
695    Either a string or `None`.
696    """
697    if debug:
698        dprint('Attempting to determine the value column...')
699    try:
700        val_name = self.get_columns('value')
701    except Exception:
702        val_name = None
703    if val_name is not None:
704        if debug:
705            dprint(f"Value column: {val_name}")
706        return val_name
707
708    cols = self.columns
709    if cols is None:
710        if debug:
711            dprint('No columns could be determined. Returning...')
712        return None
713    try:
714        dt_name = self.get_columns('datetime', error=False)
715    except Exception:
716        dt_name = None
717    try:
718        id_name = self.get_columns('id', error=False)
719    except Exception:
720        id_name = None
721
722    if debug:
723        dprint(f"dt_name: {dt_name}")
724        dprint(f"id_name: {id_name}")
725
726    cols_types = self.get_columns_types(debug=debug)
727    if cols_types is None:
728        return None
729    if debug:
730        dprint(f"cols_types: {cols_types}")
731    if dt_name is not None:
732        cols_types.pop(dt_name, None)
733    if id_name is not None:
734        cols_types.pop(id_name, None)
735
736    candidates = []
737    candidate_keywords = {'float', 'double', 'precision', 'int', 'numeric',}
738    for search_term in candidate_keywords:
739        for col, typ in cols_types.items():
740            if search_term in typ.lower():
741                candidates.append(col)
742                break
743    if not candidates:
744        if debug:
745            dprint("No value column could be determined.")
746        return None
747
748    return candidates[0]

Return the name of the value column if it's defined, otherwise make an educated guess. If not set in the columns dictionary, return the first numeric column that is not an ID or datetime column. If none may be found, return None.

Parameters
  • debug (bool, default False): Verbosity toggle.
Returns
  • Either a string or None.
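
The fallback guess can be sketched without a live pipe. The function below is an illustrative stand-in rather than the library's implementation: `guess_value_column` and its arguments are hypothetical names, and it walks the columns in order (instead of walking the keywords first, as the source does) so that the result is deterministic.

```python
from typing import Dict, Optional

# Substrings that mark a type name as numeric (the same keywords as the source above).
NUMERIC_KEYWORDS = ('float', 'double', 'precision', 'int', 'numeric')


def guess_value_column(
    cols_types: Dict[str, str],
    dt_name: Optional[str] = None,
    id_name: Optional[str] = None,
) -> Optional[str]:
    """Return the first numeric-looking column that is not the datetime or ID column."""
    skip = {dt_name, id_name}
    for col, typ in cols_types.items():
        if col in skip:
            continue
        if any(keyword in typ.lower() for keyword in NUMERIC_KEYWORDS):
            return col
    return None


cols_types = {'ts': 'datetime64[ns]', 'id': 'int64', 'vl': 'float64', 'note': 'string'}
guessed = guess_value_column(cols_types, dt_name='ts', id_name='id')
print(guessed)
```

Setting the `value` key in the `columns` dictionary avoids the guess entirely, which is the safer choice when a table has several numeric columns.
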
parents: List[Pipe]
751@property
752def parents(self) -> List[mrsm.Pipe]:
753    """
754    Return a list of `meerschaum.Pipe` objects to be designated as parents.
755    """
756    _cached_parents = self.__dict__.get('_parents', None)
757    if _cached_parents is not None:
758        return _cached_parents
759
760    from meerschaum.utils.pipes import get_pipe_from_value
761    base_params = self.get_parameters()
762    key = 'parents' if 'parents' in base_params else 'parent'
763    parents_refs = base_params.get(key, None) or []
764    if isinstance(parents_refs, str) or isinstance(parents_refs, dict):
765        parents_refs = [parents_refs]
766
767    if not parents_refs:
768        return []
769
770    self._parents = [get_pipe_from_value(val, _pipe=self) for val in parents_refs]
771    return self._parents

Return a list of meerschaum.Pipe objects to be designated as parents.
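
The `parents`, `children`, and `references` properties share the same normalization step before each reference is resolved into a pipe. A minimal sketch of that step, assuming only what the source shows: `normalize_refs` is a hypothetical helper, the reference values are placeholders, and the formats accepted by `get_pipe_from_value()` are not covered here.

```python
from typing import Any, Dict, List, Union

Ref = Union[str, Dict[str, Any]]


def normalize_refs(params: Dict[str, Any], plural: str, singular: str) -> List[Ref]:
    """Return the references stored under `plural` (or else `singular`) as a list."""
    key = plural if plural in params else singular
    refs = params.get(key) or []
    # A lone string or dictionary is treated as a one-item list.
    if isinstance(refs, (str, dict)):
        refs = [refs]
    return list(refs)


single = normalize_refs({'parent': 'placeholder-ref'}, 'parents', 'parent')
several = normalize_refs({'parents': ['ref-a', {'key': 'ref-b'}]}, 'parents', 'parent')
missing = normalize_refs({}, 'parents', 'parent')
print(single, several, missing)
```
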

parent: Optional[Pipe]
774@property
775def parent(self) -> Union[mrsm.Pipe, None]:
776    """
777    Return the first pipe in `self.parents` or `None`.
778    """
779    _parents = self.parents
780    if not _parents:
781        return None
782
783    return _parents[0]

Return the first pipe in self.parents or None.

children: List[Pipe]
819@property
820def children(self) -> List[mrsm.Pipe]:
821    """
822    Return a list of `meerschaum.Pipe` objects to be designated as children.
823    """
824    _cached_children = self.__dict__.get('_children', None)
825    if _cached_children is not None:
826        return _cached_children
827
828    from meerschaum.utils.pipes import get_pipe_from_value
829    base_params = self.get_parameters()
830    key = 'children' if 'children' in base_params else 'child'
831    children_refs = base_params.get(key, None) or []
832    if isinstance(children_refs, str) or isinstance(children_refs, dict):
833        children_refs = [children_refs]
834
835    if not children_refs:
836        return []
837
838    self._children = [get_pipe_from_value(val, _pipe=self) for val in children_refs]
839    return self._children

Return a list of meerschaum.Pipe objects to be designated as children.

child: Pipe | None
842@property
843def child(self) -> mrsm.Pipe | None:
844    """
845    Return the first pipe in `self.children` or None.
846    """
847    _children = self.children
848    if not _children:
849        return None
850
851    return _children[0]

Return the first pipe in self.children or None.

reference: Pipe | None
911@property
912def reference(self) -> mrsm.Pipe | None:
913    """
914    Return the first pipe in `self.references` or None.
915    """
916    _references = self.references
917    if not _references:
918        return None
919
920    return _references[0]

Return the first pipe in self.references or None.

references: List[Pipe]
888@property
889def references(self) -> List[mrsm.Pipe]:
890    """
891    Return a list of `meerschaum.Pipe` objects to be designated as references.
892    """
893    _cached_references = self.__dict__.get('_references', None)
894    if _cached_references is not None:
895        return _cached_references
896
897    from meerschaum.utils.pipes import get_pipe_from_value
898    base_params = self.get_parameters(apply_symlinks=False)
899    key = 'references' if 'references' in base_params else 'reference'
900    refs = base_params.get(key, None) or []
901    if isinstance(refs, str) or isinstance(refs, dict):
902        refs = [refs]
903
904    if not refs:
905        return []
906
907    self._references = [get_pipe_from_value(val, _pipe=self) for val in refs]
908    return self._references

Return a list of meerschaum.Pipe objects to be designated as references.

target: str
 958@property
 959def target(self) -> str:
 960    """
 961    The target table name.
 962    You can set the target name under one of the following keys
 963    (checked in this order):
 964      - `target`
 965      - `target_name`
 966      - `target_table`
 967      - `target_table_name`
 968    """
 969    target_val = self.parameters.get('target', None)
 970    if not target_val:
 971        default_target = self._target_legacy()
 972        default_targets = {default_target}
 973        potential_keys = ('target_name', 'target_table', 'target_table_name')
 974        _target = None
 975        for k in potential_keys:
 976            if k in self.parameters:
 977                _target = self.parameters[k]
 978                break
 979
 980        _target = _target or default_target
 981
 982        if self.instance_connector.type == 'sql':
 983            from meerschaum.utils.sql import truncate_item_name
 984            truncated_target = truncate_item_name(_target, self.instance_connector.flavor)
 985            default_targets.add(truncated_target)
 986            warned_target = self.__dict__.get('_warned_target', False)
 987            if truncated_target != _target and not warned_target:
 988                if self.instance_connector.flavor not in ('oracle', 'mysql', 'mariadb'):
 989                    warn(
 990                        f"The target '{_target}' is too long for '{self.instance_connector.flavor}', "
 991                        + f"will use {truncated_target} instead."
 992                    )
 993                self.__dict__['_warned_target'] = True
 994                _target = truncated_target
 995
 996        if _target in default_targets:
 997            return _target
 998
 999        self.target = _target
1000        return _target
1001
1002    return target_val

The target table name. You can set the target name under one of the following keys (checked in this order):

  • target
  • target_name
  • target_table
  • target_table_name
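
The lookup order can be sketched as a plain function. This is a simplified illustration: `resolve_target` is a hypothetical name, the default is passed in rather than derived from the pipe's keys, and the SQL-specific truncation of long names is left out.

```python
from typing import Any, Dict

# Alias keys consulted only when `target` itself is unset or empty.
ALIAS_KEYS = ('target_name', 'target_table', 'target_table_name')


def resolve_target(parameters: Dict[str, Any], default_target: str) -> str:
    """Return the configured target table name, or the default if none is set."""
    target = parameters.get('target')
    if target:
        return target
    for key in ALIAS_KEYS:
        # The first alias that is present wins, even if its value is empty.
        if key in parameters:
            return parameters[key] or default_target
    return default_target


print(resolve_target({'target': 'weather'}, 'default_table'))
print(resolve_target({'target_table': 'wx', 'target_table_name': 'other'}, 'default_table'))
print(resolve_target({}, 'default_table'))
```
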
def guess_datetime(self) -> Optional[str]:
1025def guess_datetime(self) -> Union[str, None]:
1026    """
1027    Try to determine a pipe's datetime column.
1028    """
1029    _dtypes = self.dtypes
1030
1031    ### Abort if the user explicitly disallows a datetime index.
1032    if 'datetime' in _dtypes:
1033        if _dtypes['datetime'] is None:
1034            return None
1035
1036    from meerschaum.utils.dtypes import are_dtypes_equal
1037    dt_cols = [
1038        col
1039        for col, typ in _dtypes.items()
1040        if are_dtypes_equal(typ, 'datetime')
1041    ]
1042    if not dt_cols:
1043        return None
1044    return dt_cols[0]

Try to determine a pipe's datetime column.
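
A rough sketch of the guess, assuming dtypes are given as strings. `guess_datetime_column` is a hypothetical name, and a simple prefix check stands in for `are_dtypes_equal()`, which likely recognizes more spellings than this.

```python
from typing import Dict, Optional


def guess_datetime_column(dtypes: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the first column whose dtype looks like a datetime, if any."""
    # An explicit `None` under the 'datetime' key disables the guess.
    if 'datetime' in dtypes and dtypes['datetime'] is None:
        return None
    for col, typ in dtypes.items():
        if isinstance(typ, str) and typ.lower().startswith('datetime'):
            return col
    return None


print(guess_datetime_column({'id': 'int64', 'ts': 'datetime64[ns, UTC]'}))
print(guess_datetime_column({'datetime': None, 'ts': 'datetime64[ns]'}))
```
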

precision: Dict[str, Union[str, int]]
1189@property
1190def precision(self) -> Dict[str, Union[str, int]]:
1191    """
1192    Return the configured or detected precision.
1193    """
1194    return self.get_precision(debug=self.debug)

Return the configured or detected precision.

def get_precision(self, debug: bool = False) -> Dict[str, Union[str, int]]:
1100def get_precision(self, debug: bool = False) -> Dict[str, Union[str, int]]:
1101    """
1102    Return the timestamp precision unit and interval for the `datetime` axis.
1103    """
1104    from meerschaum.utils.dtypes import (
1105        MRSM_PRECISION_UNITS_SCALARS,
1106        MRSM_PRECISION_UNITS_ALIASES,
1107        MRSM_PD_DTYPES,
1108        are_dtypes_equal,
1109    )
1110    from meerschaum._internal.static import STATIC_CONFIG
1111
1112    _precision = self._get_cached_value('precision', debug=debug)
1113    if _precision:
1114        if debug:
1115            dprint(f"Returning cached precision: {_precision}")
1116        return _precision
1117
1118    parameters = self.parameters
1119    _precision = parameters.get('precision', {})
1120    if isinstance(_precision, str):
1121        _precision = {'unit': _precision}
1122    default_precision_unit = STATIC_CONFIG['dtypes']['datetime']['default_precision_unit']
1123
1124    if not _precision:
1125
1126        dt_col = parameters.get('columns', {}).get('datetime', None)
1127        if not dt_col and self.autotime:
1128            dt_col = mrsm.get_config('pipes', 'autotime', 'column_name_if_datetime_missing')
1129        if not dt_col:
1130            if debug:
1131                dprint(f"No datetime axis, returning default precision '{default_precision_unit}'.")
1132            return {'unit': default_precision_unit}
1133
1134        dt_typ = self.dtypes.get(dt_col, 'datetime')
1135        if are_dtypes_equal(dt_typ, 'datetime'):
1136            if dt_typ == 'datetime':
1137                dt_typ = MRSM_PD_DTYPES['datetime']
1138                if debug:
1139                    dprint(f"Datetime type is `datetime`, assuming {dt_typ} precision.")
1140
1141            _precision = {
1142                'unit': (
1143                    dt_typ
1144                    .split('[', maxsplit=1)[-1]
1145                    .split(',', maxsplit=1)[0]
1146                    .split(' ', maxsplit=1)[0]
1147                ).rstrip(']')
1148            }
1149
1150            if debug:
1151                dprint(f"Extracted precision '{_precision['unit']}' from type '{dt_typ}'.")
1152
1153        elif are_dtypes_equal(dt_typ, 'int'):
1154            _precision = {
1155                'unit': (
1156                    'second'
1157                    if '32' in dt_typ
1158                    else default_precision_unit
1159                )
1160            }
1161        elif are_dtypes_equal(dt_typ, 'date'):
1162            if debug:
1163                dprint("Datetime axis is 'date', falling back to 'day' precision.")
1164            _precision = {'unit': 'day'}
1165
1166    precision_unit = _precision.get('unit', default_precision_unit)
1167    precision_interval = _precision.get('interval', None)
1168    true_precision_unit = MRSM_PRECISION_UNITS_ALIASES.get(precision_unit, precision_unit)
1169    if true_precision_unit is None:
1170        if debug:
1171            dprint(f"No precision could be determined, falling back to '{default_precision_unit}'.")
1172        true_precision_unit = default_precision_unit
1173
1174    if true_precision_unit not in MRSM_PRECISION_UNITS_SCALARS:
1175        from meerschaum.utils.misc import items_str
1176        raise ValueError(
1177            f"Invalid precision unit '{true_precision_unit}'.\n"
1178            "Accepted values are "
1179            f"{items_str(list(MRSM_PRECISION_UNITS_SCALARS) + list(MRSM_PRECISION_UNITS_ALIASES))}."
1180        )
1181
1182    _precision = {'unit': true_precision_unit}
1183    if precision_interval:
1184        _precision['interval'] = precision_interval
1185    self._cache_value('precision', _precision, debug=debug)
1186    return _precision

Return the timestamp precision unit and interval for the datetime axis.
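
When no precision is configured, the unit is read out of the datetime column's dtype string. The chain of `split()` calls from the source above is reproduced below as a standalone function; `extract_precision_unit` and `normalize_precision` are hypothetical names used only for this sketch. A bare `'datetime'` has no bracketed unit, which is why the source first maps it to a concrete dtype.

```python
from typing import Any, Dict, Union


def extract_precision_unit(dt_typ: str) -> str:
    """Pull the unit out of a dtype string such as 'datetime64[ns, UTC]'."""
    return (
        dt_typ
        .split('[', maxsplit=1)[-1]   # drop everything up to the opening bracket
        .split(',', maxsplit=1)[0]    # drop a trailing timezone, if present
        .split(' ', maxsplit=1)[0]
    ).rstrip(']')


def normalize_precision(value: Union[str, Dict[str, Any], None]) -> Dict[str, Any]:
    """Accept the shorthand string form and return the dictionary form."""
    if isinstance(value, str):
        return {'unit': value}
    return dict(value or {})


for typ in ('datetime64[ns, UTC]', 'datetime64[us]', 'datetime64[ms, America/New_York]'):
    print(typ, '->', extract_precision_unit(typ))

print(normalize_precision('second'))
```
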

def show( self, nopretty: bool = False, debug: bool = False, **kw) -> Tuple[bool, str]:
12def show(
13    self,
14    nopretty: bool = False,
15    debug: bool = False,
16    **kw
17) -> SuccessTuple:
18    """
19    Show attributes of a Pipe.
20
21    Parameters
22    ----------
23    nopretty: bool, default False
24        If `True`, simply print the JSON of the pipe's attributes.
25
26    debug: bool, default False
27        Verbosity toggle.
28
29    Returns
30    -------
31    A `SuccessTuple` of success, message.
32
33    """
34    import json
35    from meerschaum.utils.formatting import (
36        pprint, make_header, ANSI, highlight_pipes, fill_ansi, get_console,
37    )
38    from meerschaum.utils.packages import import_rich, attempt_import
39    from meerschaum.utils.warnings import info
40    attributes_json = json.dumps(self.attributes)
41    if not nopretty:
42        _to_print = f"Attributes for {self}:"
43        if ANSI:
44            _to_print = fill_ansi(highlight_pipes(make_header(_to_print)), 'magenta')
45            print(_to_print)
46            rich = import_rich()
47            rich_json = attempt_import('rich.json')
48            get_console().print(rich_json.JSON(attributes_json))
49        else:
50            print(_to_print)
51    else:
52        print(attributes_json)
53
54    return True, "Success"

Show attributes of a Pipe.

Parameters
  • nopretty (bool, default False): If True, simply print the JSON of the pipe's attributes.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A SuccessTuple of success, message.
def edit( self, patch: bool = False, interactive: bool = False, debug: bool = False, **kw: Any) -> Tuple[bool, str]:
 21def edit(
 22    self,
 23    patch: bool = False,
 24    interactive: bool = False,
 25    debug: bool = False,
 26    **kw: Any
 27) -> SuccessTuple:
 28    """
 29    Edit a Pipe's configuration.
 30
 31    Parameters
 32    ----------
 33    patch: bool, default False
 34        If `patch` is True, update parameters by cascading rather than overwriting.
 35    interactive: bool, default False
 36        If `True`, open an editor for the user to make changes to the pipe's YAML file.
 37    debug: bool, default False
 38        Verbosity toggle.
 39
 40    Returns
 41    -------
 42    A `SuccessTuple` of success, message.
 43
 44    """
 45    from meerschaum.utils.venv import Venv
 46    from meerschaum.connectors import get_connector_plugin
 47
 48    if self.temporary:
 49        return False, "Cannot edit pipes created with `temporary=True` (read-only)."
 50
 51    self._invalidate_cache(hard=True, debug=debug)
 52
 53    if hasattr(self, '_symlinks'):
 54        from meerschaum.utils.misc import get_val_from_dict_path, set_val_in_dict_path
 55        for path, vals in self._symlinks.items():
 56            current_val = get_val_from_dict_path(self.parameters, path)
 57            if current_val == vals['substituted']:
 58                set_val_in_dict_path(self.parameters, path, vals['original'])
 59
 60    if not interactive:
 61        with Venv(get_connector_plugin(self.instance_connector)):
 62            return self.instance_connector.edit_pipe(self, patch=patch, debug=debug, **kw)
 63
 64    import meerschaum.config.paths as paths
 65    from meerschaum.utils.misc import edit_file
 66    parameters_filename = str(self) + '.yaml'
 67    parameters_path = paths.PIPES_CACHE_RESOURCES_PATH / parameters_filename
 68
 69    from meerschaum.utils.yaml import yaml
 70
 71    edit_text = f"Edit the parameters for {self}"
 72    edit_top = '#' * (len(edit_text) + 4)
 73    edit_header = edit_top + f'\n# {edit_text} #\n' + edit_top + '\n\n'
 74
 75    from meerschaum.config import get_config
 76    parameters = dict(get_config('pipes', 'parameters', patch=True))
 77    from meerschaum.config._patch import apply_patch_to_config
 78    raw_parameters = self.attributes.get('parameters', {})
 79    parameters = apply_patch_to_config(parameters, raw_parameters)
 80
 81    ### write parameters to yaml file
 82    with open(parameters_path, 'w+') as f:
 83        f.write(edit_header)
 84        yaml.dump(parameters, stream=f, sort_keys=False)
 85
 86    ### only quit editing if yaml is valid
 87    editing = True
 88    while editing:
 89        edit_file(parameters_path)
 90        try:
 91            with open(parameters_path, 'r') as f:
 92                file_parameters = yaml.load(f.read())
 93        except Exception as e:
 94            from meerschaum.utils.warnings import warn
 95            warn(f"Invalid format defined for '{self}':\n\n{e}")
 96            input(f"Press [Enter] to correct the configuration for '{self}': ")
 97        else:
 98            editing = False
 99
100    self.parameters = file_parameters
101
102    if debug:
103        from meerschaum.utils.formatting import pprint
104        pprint(self.parameters)
105
106    with Venv(get_connector_plugin(self.instance_connector)):
107        return self.instance_connector.edit_pipe(self, patch=patch, debug=debug, **kw)

Edit a Pipe's configuration.

Parameters
  • patch (bool, default False): If patch is True, update parameters by cascading rather than overwriting.
  • interactive (bool, default False): If True, open an editor for the user to make changes to the pipe's YAML file.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A SuccessTuple of success, message.
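
The difference between overwriting and cascading (`patch=True`) can be illustrated with plain dictionaries. This is one plausible reading of "cascading", not the merge performed by any particular instance connector: `cascade_patch` is a hypothetical helper and is not `apply_patch_to_config()`.

```python
import copy
from typing import Any, Dict


def cascade_patch(base: Dict[str, Any], patch: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge `patch` into a copy of `base`, keeping keys the patch omits."""
    result = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = cascade_patch(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


existing = {'columns': {'datetime': 'ts', 'id': 'id'}, 'tags': ['production']}
changes = {'columns': {'value': 'vl'}}

patched = cascade_patch(existing, changes)   # cascading keeps 'datetime', 'id', and 'tags'
overwritten = copy.deepcopy(changes)         # overwriting keeps only what was passed in
print(patched)
print(overwritten)
```
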
def edit_definition( self, yes: bool = False, noask: bool = False, force: bool = False, debug: bool = False, **kw: Any) -> Tuple[bool, str]:
110def edit_definition(
111    self,
112    yes: bool = False,
113    noask: bool = False,
114    force: bool = False,
115    debug : bool = False,
116    **kw : Any
117) -> SuccessTuple:
118    """
119    Edit a pipe's definition file and update its configuration.
120    **NOTE:** This function is interactive and should not be used in automated scripts!
121
122    Returns
123    -------
124    A `SuccessTuple` of success, message.
125
126    """
127    if self.temporary:
128        return False, "Cannot edit pipes created with `temporary=True` (read-only)."
129
130    from meerschaum.connectors import instance_types
131    if (self.connector is None or isinstance(self.connector, str)) or self.connector.type not in instance_types:
132        return self.edit(interactive=True, debug=debug, **kw)
133
134    import json
135    from meerschaum.utils.warnings import info, warn
136    from meerschaum.utils.debug import dprint
137    from meerschaum.config._patch import apply_patch_to_config
138    from meerschaum.utils.misc import edit_file
139
140    _parameters = self.parameters
141    if 'fetch' not in _parameters:
142        _parameters['fetch'] = {}
143
144    def _edit_api():
145        from meerschaum.utils.prompt import prompt, yes_no
146        info(
147            f"Please enter the keys of the source pipe from '{self.connector}'.\n" +
148            "Type 'None' for None, or empty when there is no default. Press [CTRL+C] to skip."
149        )
150
151        _keys = { 'connector_keys' : None, 'metric_key' : None, 'location_key' : None }
152        for k in _keys:
153            _keys[k] = _parameters['fetch'].get(k, None)
154
155        for k, v in _keys.items():
156            try:
157                _keys[k] = prompt(k.capitalize().replace('_', ' ') + ':', icon=True, default=v)
158            except KeyboardInterrupt:
159                continue
160            if _keys[k] in ('', 'None', '\'None\'', '[None]'):
161                _keys[k] = None
162
163        _parameters['fetch'] = apply_patch_to_config(_parameters['fetch'], _keys)
164
165        info("You may optionally specify additional filter parameters as JSON.")
166        print("  Parameters are translated into a 'WHERE x AND y' clause, and lists are IN clauses.")
167        print("  For example, the following JSON would correspond to 'WHERE x = 1 AND y IN (2, 3)':")
168        print(json.dumps({'x': 1, 'y': [2, 3]}, indent=2, separators=(',', ': ')))
169        if force or yes_no(
170            "Would you like to add additional filter parameters?",
171            yes=yes, noask=noask
172        ):
173            import meerschaum.config.paths as paths
174            definition_filename = str(self) + '.json'
175            definition_path = paths.PIPES_CACHE_RESOURCES_PATH / definition_filename
176            try:
177                definition_path.touch()
178                with open(definition_path, 'w+') as f:
179                    json.dump(_parameters.get('fetch', {}).get('params', {}), f, indent=2)
180            except Exception as e:
181                return False, f"Failed writing file '{definition_path}':\n" + str(e)
182
183            _params = None
184            while True:
185                edit_file(definition_path)
186                try:
187                    with open(definition_path, 'r') as f:
188                        _params = json.load(f)
189                except Exception as e:
190                    warn(f'Failed to read parameters JSON:\n{e}', stack=False)
191                    if force or yes_no(
192                        "Would you like to try again?\n  "
193                        + "If not, the parameters JSON file will be ignored.",
194                        noask=noask, yes=yes
195                    ):
196                        continue
197                    _params = None
198                break
199            if _params is not None:
200                if 'fetch' not in _parameters:
201                    _parameters['fetch'] = {}
202                _parameters['fetch']['params'] = _params
203
204        self.parameters = _parameters
205        return True, "Success"
206
207    def _edit_sql():
208        import textwrap
209        import meerschaum.config.paths as paths
210        from meerschaum.utils.misc import edit_file
211        definition_filename = str(self) + '.sql'
212        definition_path = paths.PIPES_CACHE_RESOURCES_PATH / definition_filename
213
214        sql_definition = _parameters['fetch'].get('definition', None)
215        if sql_definition is None:
216            sql_definition = ''
217        sql_definition = textwrap.dedent(sql_definition).lstrip()
218
219        try:
220            definition_path.touch()
221            with open(definition_path, 'w+') as f:
222                f.write(sql_definition)
223        except Exception as e:
224            return False, f"Failed writing file '{definition_path}':\n" + str(e)
225
226        edit_file(definition_path)
227        try:
228            with open(definition_path, 'r', encoding='utf-8') as f:
229                file_definition = f.read()
230        except Exception as e:
231            return False, f"Failed reading file '{definition_path}':\n" + str(e)
232
233        if sql_definition == file_definition:
234            return False, f"No changes made to definition for {self}."
235
236        if ' ' not in file_definition:
237            return False, f"Invalid SQL definition for {self}."
238
239        if debug:
240            dprint("Read SQL definition:\n\n" + file_definition)
241        _parameters['fetch']['definition'] = file_definition
242        self.parameters = _parameters
243        return True, "Success"
244
245    locals()['_edit_' + str(self.connector.type)]()
246    return self.edit(interactive=False, debug=debug, **kw)

Edit a pipe's definition file and update its configuration. NOTE: This function is interactive and should not be used in automated scripts!

Returns
  • A SuccessTuple of success, message.
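
The interactive prompt describes filter parameters as JSON that is translated into a `WHERE x AND y` clause, with lists becoming `IN` clauses. A naive sketch of that translation follows. `build_where` and `_literal` are hypothetical helpers for illustration only: they are not the library's query builder, they do not quote column names, and they should not be used with untrusted input.

```python
from typing import Any, Dict


def _literal(value: Any) -> str:
    """Render a Python value as a SQL literal (strings are single-quoted)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def build_where(params: Dict[str, Any]) -> str:
    """Turn {'x': 1, 'y': [2, 3]} into "WHERE x = 1 AND y IN (2, 3)"."""
    clauses = []
    for col, value in params.items():
        if isinstance(value, (list, tuple)):
            items = ', '.join(_literal(v) for v in value)
            clauses.append(f"{col} IN ({items})")
        else:
            clauses.append(f"{col} = {_literal(value)}")
    if not clauses:
        return ''
    return 'WHERE ' + ' AND '.join(clauses)


print(build_where({'x': 1, 'y': [2, 3]}))
# WHERE x = 1 AND y IN (2, 3)
```
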
def update(self, *args, **kw) -> Tuple[bool, str]:
13def update(self, *args, **kw) -> SuccessTuple:
14    """
15    Update a pipe's parameters in its instance.
16    """
17    kw['interactive'] = False
18    return self.edit(*args, **kw)

Update a pipe's parameters in its instance.

def sync( self, df: Union[pandas.DataFrame, Dict[str, List[Any]], List[Dict[str, Any]], str, meerschaum.core.Pipe._sync.InferFetch] = <class 'meerschaum.core.Pipe._sync.InferFetch'>, begin: Union[datetime.datetime, int, str, NoneType] = '', end: Union[datetime.datetime, int, NoneType] = None, force: bool = False, retries: int = 10, min_seconds: int = 1, check_existing: bool = True, enforce_dtypes: bool = True, blocking: bool = True, workers: Optional[int] = None, callback: Optional[Callable[[Tuple[bool, str]], Any]] = None, error_callback: Optional[Callable[[Exception], Any]] = None, chunksize: Optional[int] = -1, sync_chunks: bool = True, debug: bool = False, _inplace: bool = True, **kw: Any) -> Tuple[bool, str]:
 41def sync(
 42    self,
 43    df: Union[
 44        pd.DataFrame,
 45        Dict[str, List[Any]],
 46        List[Dict[str, Any]],
 47        str,
 48        InferFetch
 49    ] = InferFetch,
 50    begin: Union[datetime, int, str, None] = '',
 51    end: Union[datetime, int, None] = None,
 52    force: bool = False,
 53    retries: int = 10,
 54    min_seconds: int = 1,
 55    check_existing: bool = True,
 56    enforce_dtypes: bool = True,
 57    blocking: bool = True,
 58    workers: Optional[int] = None,
 59    callback: Optional[Callable[[Tuple[bool, str]], Any]] = None,
 60    error_callback: Optional[Callable[[Exception], Any]] = None,
 61    chunksize: Optional[int] = -1,
 62    sync_chunks: bool = True,
 63    debug: bool = False,
 64    _inplace: bool = True,
 65    **kw: Any
 66) -> SuccessTuple:
 67    """
 68    Fetch new data from the source and update the pipe's table with new data.
 69    
 70    Get new remote data via fetch, get existing data in the same time period,
 71    and merge the two, only keeping the unseen data.
 72
 73    Parameters
 74    ----------
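 75    df: Union[pd.DataFrame, Dict[str, List[Any]], List[Dict[str, Any]], str, InferFetch], default InferFetch
 76        An optional DataFrame to sync into the pipe. Omit it to infer fetching from the pipe's connector; passing `None` returns a failure.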
 77        If `df` is a string, it will be parsed via `meerschaum.utils.dataframe.parse_simple_lines()`.
 78
 79    begin: Union[datetime, int, str, None], default ''
 80        Optionally specify the earliest datetime to search for data.
 81
 82    end: Union[datetime, int, str, None], default None
 83        Optionally specify the latest datetime to search for data.
 84
 85    force: bool, default False
 86        If `True`, keep trying to sync until `retries` attempts.
 87
 88    retries: int, default 10
 89        If `force`, how many attempts to try syncing before declaring failure.
 90
 91    min_seconds: Union[int, float], default 1
 92        If `force`, how many seconds to sleep between retries. Defaults to `1`.
 93
 94    check_existing: bool, default True
 95        If `True`, pull and diff with existing data from the pipe.
 96
 97    enforce_dtypes: bool, default True
 98        If `True`, enforce dtypes on incoming data.
 99        Set this to `False` if the incoming rows are expected to be of the correct dtypes.
100
101    blocking: bool, default True
102        If `True`, wait for sync to finish and return its result, otherwise
103        sync asynchronously and return success immediately. Defaults to `True`.
104        Only intended for specific scenarios.
105
106    workers: Optional[int], default None
107        If provided and the instance connector is thread-safe
108        (`pipe.instance_connector.IS_THREAD_SAFE is True`),
109        limit concurrent sync to this many threads.
110
111    callback: Optional[Callable[[Tuple[bool, str]], Any]], default None
112        Callback function which expects a SuccessTuple as input.
113        Only applies when `blocking=False`.
114
115    error_callback: Optional[Callable[[Exception], Any]], default None
116        Callback function which expects an Exception as input.
117        Only applies when `blocking=False`.
118
119    chunksize: int, default -1
120        Specify the number of rows to sync per chunk.
121        If `-1`, resort to system configuration (default is `900`).
122        A `chunksize` of `None` will sync all rows in one transaction.
123
124    sync_chunks: bool, default True
125        If possible, sync chunks while fetching them into memory.
126
127    debug: bool, default False
128        Verbosity toggle. Defaults to False.
129
130    Returns
131    -------
132    A `SuccessTuple` of success (`bool`) and message (`str`).
133    """
134    from meerschaum.utils.debug import dprint, _checkpoint
135    from meerschaum.utils.formatting import get_console
136    from meerschaum.utils.venv import Venv
137    from meerschaum.connectors import get_connector_plugin
138    from meerschaum.utils.misc import df_is_chunk_generator, filter_keywords, filter_arguments
139    from meerschaum.utils.pool import get_pool
140    from meerschaum.config import get_config
141    from meerschaum.utils.dtypes import are_dtypes_equal, get_current_timestamp
142
143    if (callback is not None or error_callback is not None) and blocking:
144        warn("Callback functions are only executed when blocking = False. Ignoring...")
145
146    _checkpoint(_total=2, **kw)
147
148    if chunksize == 0:
149        chunksize = None
150        sync_chunks = False
151
152    begin, end = self.parse_date_bounds(begin, end)
153    kw.update({
154        'begin': begin,
155        'end': end,
156        'force': force,
157        'retries': retries,
158        'min_seconds': min_seconds,
159        'check_existing': check_existing,
160        'blocking': blocking,
161        'workers': workers,
162        'callback': callback,
163        'error_callback': error_callback,
164        'sync_chunks': sync_chunks,
165        'chunksize': chunksize,
166        'safe_copy': True,
167    })
168
169    self._invalidate_cache(debug=debug)
170    self._cache_value('sync_ts', get_current_timestamp('ms'), debug=debug)
171
172    def _sync(
173        p: mrsm.Pipe,
174        df: Union[
175            'pd.DataFrame',
176            Dict[str, List[Any]],
177            List[Dict[str, Any]],
178            str,
179            InferFetch
180        ] = InferFetch,
181    ) -> SuccessTuple:
182        if df is None:
183            p._invalidate_cache(debug=debug)
184            return (
185                False,
186                f"You passed `None` instead of data into `sync()` for {p}.\n"
187                + "Omit the DataFrame to infer fetching.",
188            )
189        ### Ensure that Pipe is registered.
190        if not p.temporary and p.id is None:
191            ### NOTE: This may trigger an interactive session for plugins!
192            register_success, register_msg = p.register(debug=debug)
193            if not register_success:
194                if 'already' not in register_msg:
195                    p._invalidate_cache(debug=debug)
196                    return register_success, register_msg
197
198        if isinstance(df, str):
199            from meerschaum.utils.dataframe import parse_simple_lines
200            df = parse_simple_lines(df)
201
202        ### If connector is a plugin with a `sync()` method, return that instead.
203        ### If the plugin does not have a `sync()` method but does have a `fetch()` method,
204        ### use that instead.
205        ### NOTE: The DataFrame must be omitted for the plugin sync method to apply.
206        ### If a DataFrame is provided, continue as expected.
207        if hasattr(df, 'MRSM_INFER_FETCH'):
208            try:
209                if isinstance(p.connector, str):
210                    if ':' not in p.connector_keys:
211                        return True, f"{p} does not support fetching; nothing to do."
212
213                    msg = f"{p} does not have a valid connector."
214                    if p.connector_keys.startswith('plugin:'):
215                        msg += f"\n    Perhaps {p.connector_keys} has a syntax error?"
216                    p._invalidate_cache(debug=debug)
217                    return False, msg
218            except Exception:
219                p._invalidate_cache(debug=debug)
220                return False, f"Unable to create the connector for {p}."
221
222            ### Sync in place if possible.
223            if (
224                str(self.connector) == str(self.instance_connector)
225                and 
226                hasattr(self.instance_connector, 'sync_pipe_inplace')
227                and
228                _inplace
229                and
230                get_config('system', 'experimental', 'inplace_sync')
231            ):
232                with Venv(get_connector_plugin(self.instance_connector)):
233                    p._invalidate_cache(debug=debug)
234                    _args, _kwargs = filter_arguments(
235                        p.instance_connector.sync_pipe_inplace,
236                        p,
237                        debug=debug,
238                        **kw
239                    )
240                    return self.instance_connector.sync_pipe_inplace(
241                        *_args,
242                        **_kwargs
243                    )
244
245            ### Activate and invoke `sync(pipe)` for plugin connectors with `sync` methods.
246            try:
247                if getattr(p.connector, 'sync', None) is not None:
248                    with Venv(get_connector_plugin(p.connector), debug=debug):
249                        _args, _kwargs = filter_arguments(
250                            p.connector.sync,
251                            p,
252                            debug=debug,
253                            **kw
254                        )
255                        return_tuple = p.connector.sync(*_args, **_kwargs)
256                    p._invalidate_cache(debug=debug)
257                    if not isinstance(return_tuple, tuple):
258                        return_tuple = (
259                            False,
260                            f"Plugin '{p.connector.label}' returned non-tuple value: {return_tuple}"
261                        )
262                    return return_tuple
263
264            except Exception as e:
265                get_console().print_exception()
266                msg = f"Failed to sync {p} with exception: '" + str(e) + "'"
267                if debug:
268                    error(msg, silent=False)
269                p._invalidate_cache(debug=debug)
270                return False, msg
271
272            ### Fetch the dataframe from the connector's `fetch()` method.
273            try:
274                with Venv(get_connector_plugin(p.connector), debug=debug):
275                    df = p.fetch(
276                        **filter_keywords(
277                            p.fetch,
278                            debug=debug,
279                            **kw
280                        )
281                    )
282                    kw['safe_copy'] = False
283            except Exception as e:
284                get_console().print_exception(
285                    suppress=[
286                        'meerschaum/core/Pipe/_sync.py',
287                        'meerschaum/core/Pipe/_fetch.py',
288                    ]
289                )
290                msg = f"Failed to fetch data from {p.connector}:\n    {e}"
291                df = None
292
293            if df is None:
294                p._invalidate_cache(debug=debug)
295                return False, f"No data were fetched for {p}."
296
297            if isinstance(df, list):
298                if len(df) == 0:
299                    return True, f"No new rows were returned for {p}."
300
301                ### May be a chunk hook results list.
302                if isinstance(df[0], tuple):
303                    success = all([_success for _success, _ in df])
304                    message = '\n'.join([_message for _, _message in df])
305                    return success, message
306
307            if df is True:
308                p._invalidate_cache(debug=debug)
309                return True, f"{p} is being synced in parallel."
310
311        ### CHECKPOINT: Retrieved the DataFrame.
312        _checkpoint(**kw)
313
314        ### Allow for dataframe generators or iterables.
315        if df_is_chunk_generator(df):
316            kw['workers'] = p.get_num_workers(kw.get('workers', None))
317            dt_col = p.columns.get('datetime', None)
318            pool = get_pool(workers=kw.get('workers', 1))
319            if debug:
320                dprint(f"Received {type(df)}. Attempting to sync first chunk...")
321
322            try:
323                chunk = next(df)
324            except StopIteration:
325                return True, "Received an empty generator; nothing to do."
326
327            chunk_success, chunk_msg = _sync(p, chunk)
328            chunk_msg = '\n' + self._get_chunk_label(chunk, dt_col) + '\n' + chunk_msg
329            if not chunk_success:
330                return chunk_success, f"Unable to sync initial chunk for {p}:\n{chunk_msg}"
331            if debug:
332                dprint("Successfully synced the first chunk, attempting the rest...")
333
334            def _process_chunk(_chunk):
335                _chunk_attempts = 0
336                _max_chunk_attempts = 3
337                while _chunk_attempts < _max_chunk_attempts:
338                    try:
339                        _chunk_success, _chunk_msg = _sync(p, _chunk)
340                    except Exception as e:
341                        _chunk_success, _chunk_msg = False, str(e)
342                    if _chunk_success:
343                        break
344                    _chunk_attempts += 1
345                    _sleep_seconds = _chunk_attempts ** 2
346                    warn(
347                        (
348                            f"Failed to sync chunk to {self} "
349                            + f"(attempt {_chunk_attempts} / {_max_chunk_attempts}).\n"
350                            + f"Sleeping for {_sleep_seconds} second"
351                            + ('s' if _sleep_seconds != 1 else '')
352                            + f":\n{_chunk_msg}"
353                        ),
354                        stack=False,
355                    )
356                    time.sleep(_sleep_seconds)
357
358                num_rows_str = (
359                    f"{num_rows:,} rows"
360                    if (num_rows := len(_chunk)) != 1
361                    else f"{num_rows} row"
362                )
363                _chunk_msg = (
364                    (
365                        "Synced"
366                        if _chunk_success
367                        else "Failed to sync"
368                    ) + f" a chunk ({num_rows_str}) to {p}:\n"
369                    + self._get_chunk_label(_chunk, dt_col)
370                    + '\n'
371                    + _chunk_msg
372                )
373
374                mrsm.pprint((_chunk_success, _chunk_msg), calm=True)
375                return _chunk_success, _chunk_msg
376
377            results = sorted(
378                [(chunk_success, chunk_msg)] + (
379                    list(pool.imap(_process_chunk, df))
380                    if (
381                        not df_is_chunk_generator(chunk)  # Handle nested generators.
382                        and kw.get('workers', 1) != 1
383                    )
384                    else list(
385                        _process_chunk(_child_chunks)
386                        for _child_chunks in df
387                    )
388                )
389            )
390            chunk_messages = [chunk_msg for _, chunk_msg in results]
391            success_bools = [chunk_success for chunk_success, _ in results]
392            num_successes = len([chunk_success for chunk_success, _ in results if chunk_success])
393            num_failures = len([chunk_success for chunk_success, _ in results if not chunk_success])
394            success = all(success_bools)
395            msg = (
396                'Synced '
397                + f'{len(chunk_messages):,} chunk'
398                + ('s' if len(chunk_messages) != 1 else '')
399                + f' to {p}\n({num_successes} succeeded, {num_failures} failed):\n\n'
400                + '\n\n'.join(chunk_messages).lstrip().rstrip()
401            ).lstrip().rstrip()
402            return success, msg
403
404        ### Cast to a dataframe and ensure datatypes are what we expect.
405        dtypes = p.get_dtypes(debug=debug)
406        df = p.enforce_dtypes(
407            df,
408            chunksize=chunksize,
409            enforce=enforce_dtypes,
410            dtypes=dtypes,
411            debug=debug,
412        )
413        if p.autotime:
414            dt_col = p.columns.get('datetime', None)
415            ts_col = dt_col or mrsm.get_config(
416                'pipes', 'autotime', 'column_name_if_datetime_missing'
417            )
418            ts_typ = dtypes.get(ts_col, 'datetime') if ts_col else 'datetime'
419            if ts_col and hasattr(df, 'columns') and ts_col not in df.columns:
420                precision = p.get_precision(debug=debug)
421                now = get_current_timestamp(
422                    precision_unit=precision.get(
423                        'unit',
424                        STATIC_CONFIG['dtypes']['datetime']['default_precision_unit']
425                    ),
426                    precision_interval=precision.get('interval', 1),
427                    round_to=(precision.get('round_to', 'down')),
428                    as_int=(are_dtypes_equal(ts_typ, 'int')),
429                )
430                if debug:
431                    dprint(f"Adding current timestamp to dataframe synced to {p}: {now}")
432
433                df[ts_col] = now
434                kw['check_existing'] = dt_col is not None
435
436        ### Capture special columns.
437        capture_success, capture_msg = self._persist_new_special_columns(
438            df,
439            dtypes=dtypes,
440            debug=debug,
441        )
442        if not capture_success:
443            warn(f"Failed to capture new special columns for {self}:\n{capture_msg}")
444
445        if debug:
446            dprint(
447                "DataFrame to sync:\n"
448                + (
449                    str(df)[:255]
450                    + '...'
451                    if len(str(df)) >= 256
452                    else str(df)
453                ),
454                **kw
455            )
456
457        ### if force, continue to sync until success
458        return_tuple = False, f"Did not sync {p}."
459        run = True
460        _retries = 1
461        while run:
462            with Venv(get_connector_plugin(self.instance_connector)):
463                return_tuple = p.instance_connector.sync_pipe(
464                    pipe=p,
465                    df=df,
466                    debug=debug,
467                    **kw
468                )
469            _retries += 1
470            run = (not return_tuple[0]) and force and _retries <= retries
471            if run and debug:
472                dprint(f"Syncing failed for {p}. Attempt ( {_retries} / {retries} )", **kw)
473                dprint(f"Sleeping for {min_seconds} seconds...", **kw)
474                time.sleep(min_seconds)
475            if _retries > retries:
476                warn(
477                    f"Unable to sync {p} within {retries} attempt" +
478                        ("s" if retries != 1 else "") + "!"
479                )
480
481        ### CHECKPOINT: Finished syncing.
482        _checkpoint(**kw)
483        p._invalidate_cache(debug=debug)
484        return return_tuple
485
486    if blocking:
487        return _sync(self, df=df)
488
489    from meerschaum.utils.threading import Thread
490    def default_callback(result_tuple: SuccessTuple):
491        dprint(f"Asynchronous result from {self}: {result_tuple}", **kw)
492
493    def default_error_callback(x: Exception):
494        dprint(f"Error received for {self}: {x}", **kw)
495
496    if callback is None and debug:
497        callback = default_callback
498    if error_callback is None and debug:
499        error_callback = default_error_callback
500    try:
501        thread = Thread(
502            target=_sync,
503            args=(self,),
504            kwargs={'df': df},
505            daemon=False,
506            callback=callback,
507            error_callback=error_callback,
508        )
509        thread.start()
510    except Exception as e:
511        self._invalidate_cache(debug=debug)
512        return False, str(e)
513
514    self._invalidate_cache(debug=debug)
515    return True, f"Spawned asynchronous sync for {self}."
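
The chunk-retry behavior of `_process_chunk` in the listing above (up to three attempts, sleeping for the square of the attempt count after each failure) can be illustrated in isolation. The following is a minimal sketch, not the library's implementation: `sync_chunk_with_backoff`, `flaky_sync`, and the injectable `sleep` parameter are hypothetical names introduced here for illustration.

```python
import time
from typing import Callable, List, Tuple

SuccessTuple = Tuple[bool, str]


def sync_chunk_with_backoff(
    sync_chunk: Callable[[list], SuccessTuple],
    chunk: list,
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> SuccessTuple:
    """Retry one chunk, sleeping `attempts ** 2` seconds after each failure."""
    success, msg = False, "No attempts were made."
    attempts = 0
    while attempts < max_attempts:
        try:
            success, msg = sync_chunk(chunk)
        except Exception as e:
            # An exception counts as a failed attempt.
            success, msg = False, str(e)
        if success:
            break
        attempts += 1
        sleep(attempts ** 2)  # 1, then 4, then 9 seconds
    return success, msg


# Hypothetical sync function that fails twice before succeeding.
calls = {'count': 0}


def flaky_sync(chunk: list) -> SuccessTuple:
    calls['count'] += 1
    if calls['count'] < 3:
        return False, f"attempt {calls['count']} failed"
    return True, f"synced {len(chunk)} rows"


slept: List[float] = []  # Record the sleeps instead of actually waiting.
result = sync_chunk_with_backoff(flaky_sync, [1, 2, 3], sleep=slept.append)
print(result, slept)
```

Passing `sleep` as a parameter is only a convenience for experimenting without waiting; the listing above calls `time.sleep` directly.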

Fetch new data from the source and update the pipe's table with new data.

Get new remote data via fetch, get existing data in the same time period, and merge the two, only keeping the unseen data.
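
The "same time period" is derived from the incoming rows. In the `filter_existing()` listing on this page, the minimum timestamp is rounded down and moved back one minute, and the maximum is rounded down and moved forward one minute. Below is a minimal sketch of that bounds calculation, assuming minute-level rounding; `floor_to_minute` is a stand-in for `round_time`, and `backtrack_bounds` is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def floor_to_minute(dt: datetime) -> datetime:
    """Stand-in for rounding a datetime down to the minute."""
    return dt.replace(second=0, microsecond=0)


def backtrack_bounds(
    timestamps: List[datetime],
) -> Tuple[Optional[datetime], Optional[datetime]]:
    """Return (begin, end) bounds padded by one minute on each side."""
    if not timestamps:
        return None, None
    rounded_min = floor_to_minute(min(timestamps))
    try:
        begin = rounded_min - timedelta(minutes=1)
    except OverflowError:
        # Cannot step back from the earliest representable datetime.
        begin = rounded_min
    end = floor_to_minute(max(timestamps)) + timedelta(minutes=1)
    return begin, end


begin, end = backtrack_bounds([
    datetime(2024, 1, 1, 0, 0, 30),
    datetime(2024, 1, 1, 0, 5, 45),
])
print(begin, end)
```

The padding means the existing rows pulled for comparison cover slightly more than the incoming data, so rows at the edges of the window are not mistaken for new ones.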

Parameters
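  • df (Union[pd.DataFrame, Dict[str, List[Any]], List[Dict[str, Any]], str, InferFetch], default InferFetch): An optional DataFrame to sync into the pipe. If omitted, data are fetched from the pipe's connector; explicitly passing None returns a failure. If df is a string, it will be parsed via meerschaum.utils.dataframe.parse_simple_lines().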
  • begin (Union[datetime, int, str, None], default ''): Optionally specify the earliest datetime to search for data.
  • end (Union[datetime, int, str, None], default None): Optionally specify the latest datetime to search for data.
  • force (bool, default False): If True, keep trying to sync until retries attempts have been made.
  • retries (int, default 10): If force, how many attempts to try syncing before declaring failure.
  • min_seconds (Union[int, float], default 1): If force, how many seconds to sleep between retries. Defaults to 1.
  • check_existing (bool, default True): If True, pull and diff with existing data from the pipe.
  • enforce_dtypes (bool, default True): If True, enforce dtypes on incoming data. Set this to False if the incoming rows are expected to be of the correct dtypes.
  • blocking (bool, default True): If True, wait for the sync to finish and return its result; otherwise sync asynchronously in a background thread and return success immediately. Only intended for specific scenarios.
  • workers (Optional[int], default None): If provided and the instance connector is thread-safe (pipe.instance_connector.IS_THREAD_SAFE is True), limit concurrent sync to this many threads.
  • callback (Optional[Callable[[Tuple[bool, str]], Any]], default None): Callback function which expects a SuccessTuple as input. Only applies when blocking=False.
  • error_callback (Optional[Callable[[Exception], Any]], default None): Callback function which expects an Exception as input. Only applies when blocking=False.
  • chunksize (int, default -1): Specify the number of rows to sync per chunk. If -1, resort to system configuration (default is 900). A chunksize of None will sync all rows in one transaction.
  • sync_chunks (bool, default True): If possible, sync chunks while fetching them into memory.
  • debug (bool, default False): Verbosity toggle. Defaults to False.
Returns
  • A SuccessTuple of success (bool) and message (str).
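
To make the interaction of `force`, `retries`, and `min_seconds` concrete, here is a minimal sketch of a retry loop that follows the parameter descriptions above. It is an illustration under stated assumptions, not the library's code: `sync_with_retries`, `attempt_sync`, and the injectable `sleep` are hypothetical names.

```python
import time
from typing import Callable, List, Tuple

SuccessTuple = Tuple[bool, str]


def sync_with_retries(
    attempt_sync: Callable[[], SuccessTuple],
    force: bool = False,
    retries: int = 10,
    min_seconds: float = 1,
    sleep: Callable[[float], None] = time.sleep,
) -> SuccessTuple:
    """Call `attempt_sync` once, or up to `retries` times if `force` is set."""
    attempts = 0
    while True:
        result = attempt_sync()
        attempts += 1
        # Stop on success, when not forcing, or when attempts are exhausted.
        if result[0] or not force or attempts >= retries:
            return result
        sleep(min_seconds)


# Hypothetical outcomes: two failures followed by a success.
outcomes = iter([(False, "busy"), (False, "busy"), (True, "ok")])
slept: List[float] = []  # Record the sleeps instead of actually waiting.
result = sync_with_retries(
    lambda: next(outcomes),
    force=True,
    retries=5,
    sleep=slept.append,
)
print(result, slept)
```

Without `force=True`, the first failure is returned as-is and `retries` has no effect.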
def get_sync_time(self, params: Optional[Dict[str, Any]] = None, newest: bool = True, apply_backtrack_interval: bool = False, remote: bool = False, round_down: bool = False, debug: bool = False) -> Union[datetime.datetime, int, NoneType]:
518def get_sync_time(
519    self,
520    params: Optional[Dict[str, Any]] = None,
521    newest: bool = True,
522    apply_backtrack_interval: bool = False,
523    remote: bool = False,
524    round_down: bool = False,
525    debug: bool = False
526) -> Union['datetime', int, None]:
527    """
528    Get the most recent datetime value for a Pipe.
529
530    Parameters
531    ----------
532    params: Optional[Dict[str, Any]], default None
533        Dictionary to build a WHERE clause for a specific column.
534        See `meerschaum.utils.sql.build_where`.
535
536    newest: bool, default True
537        If `True`, get the most recent datetime (honoring `params`).
538        If `False`, get the oldest datetime (`ASC` instead of `DESC`).
539
540    apply_backtrack_interval: bool, default False
541        If `True`, subtract the backtrack interval from the sync time.
542
543    remote: bool, default False
544        If `True` and the instance connector supports it, return the sync time
545        for the remote table definition.
546
547    round_down: bool, default False
548        If `True`, round down the datetime value to the nearest minute.
549
550    debug: bool, default False
551        Verbosity toggle.
552
553    Returns
554    -------
555    A `datetime` or int, if the pipe exists, otherwise `None`.
556
557    """
558    from meerschaum.utils.venv import Venv
559    from meerschaum.connectors import get_connector_plugin
560    from meerschaum.utils.misc import filter_keywords
561    from meerschaum.utils.dtypes import round_time
562    from meerschaum.utils.warnings import warn
563
564    if not self.columns.get('datetime', None):
565        return None
566
567    connector = self.instance_connector if not remote else self.connector
568    if isinstance(connector, str) or connector is None:
569        return None
570
571    with Venv(get_connector_plugin(connector)):
572        if not hasattr(connector, 'get_sync_time'):
573            warn(
574                f"Connectors of type '{connector.type}' "
575                "do not implement `get_sync_time()`.",
576                stack=False,
577            )
578            return None
579        sync_time = connector.get_sync_time(
580            self,
581            **filter_keywords(
582                connector.get_sync_time,
583                params=params,
584                newest=newest,
585                remote=remote,
586                debug=debug,
587            )
588        )
589
590    if round_down and isinstance(sync_time, datetime):
591        sync_time = round_time(sync_time, timedelta(minutes=1))
592
593    if apply_backtrack_interval and sync_time is not None:
594        backtrack_interval = self.get_backtrack_interval(debug=debug)
595        try:
596            sync_time -= backtrack_interval
597        except Exception as e:
598            warn(f"Failed to apply backtrack interval:\n{e}")
599
600    return self.parse_date_bounds(sync_time)

Get the most recent datetime value for a Pipe.

Parameters
  • params (Optional[Dict[str, Any]], default None): Dictionary to build a WHERE clause for a specific column. See meerschaum.utils.sql.build_where.
  • newest (bool, default True): If True, get the most recent datetime (honoring params). If False, get the oldest datetime (ASC instead of DESC).
  • apply_backtrack_interval (bool, default False): If True, subtract the backtrack interval from the sync time.
  • remote (bool, default False): If True and the instance connector supports it, return the sync time for the remote table definition.
  • round_down (bool, default False): If True, round down the datetime value to the nearest minute.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A datetime or int, if the pipe exists, otherwise None.
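
The effect of `round_down` and `apply_backtrack_interval` on a raw sync time can be sketched without a database. The example below is an approximation under stated assumptions: `adjust_sync_time` is a hypothetical helper, rounding is done by truncating seconds rather than by `round_time`, and the backtrack interval is assumed to be a `timedelta` for datetime axes and an `int` for integer axes.

```python
from datetime import datetime, timedelta, timezone
from typing import Union

SyncTime = Union[datetime, int, None]


def adjust_sync_time(
    sync_time: SyncTime,
    round_down: bool = False,
    backtrack_interval: Union[timedelta, int, None] = None,
) -> SyncTime:
    """Optionally round a sync time down to the minute and subtract an interval."""
    if round_down and isinstance(sync_time, datetime):
        # Only datetimes are rounded; integers pass through unchanged.
        sync_time = sync_time.replace(second=0, microsecond=0)
    if backtrack_interval is not None and sync_time is not None:
        sync_time = sync_time - backtrack_interval
    return sync_time


latest = datetime(2024, 1, 1, 12, 34, 56, tzinfo=timezone.utc)
adjusted = adjust_sync_time(
    latest,
    round_down=True,
    backtrack_interval=timedelta(minutes=1440),
)
print(adjusted)
print(adjust_sync_time(1000, round_down=True, backtrack_interval=10))
```

Subtracting the interval moves the lower bound of the next fetch earlier, so late-arriving rows near the previous sync time are seen again.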
def exists(self, debug: bool = False) -> bool:
603def exists(
604    self,
605    debug: bool = False
606) -> bool:
607    """
608    See if a Pipe's table exists.
609
610    Parameters
611    ----------
612    debug: bool, default False
613        Verbosity toggle.
614
615    Returns
616    -------
617    A `bool` corresponding to whether a pipe's underlying table exists.
618
619    """
620    from meerschaum.utils.venv import Venv
621    from meerschaum.connectors import get_connector_plugin
622    from meerschaum.utils.debug import dprint
623    from meerschaum.utils.dtypes import get_current_timestamp
624    now = get_current_timestamp('ms', as_int=True) / 1000
625    cache_seconds = mrsm.get_config('pipes', 'sync', 'exists_cache_seconds')
626
627    _exists = self._get_cached_value('_exists', debug=debug)
628    if _exists:
629        exists_timestamp = self._get_cached_value('_exists_timestamp', debug=debug)
630        if exists_timestamp is not None:
631            delta = now - exists_timestamp
632            if delta < cache_seconds:
633                if debug:
634                    dprint(f"Returning cached `exists` for {self} ({round(delta, 2)} seconds old).")
635                return _exists
636
637    with Venv(get_connector_plugin(self.instance_connector)):
638        _exists = (
639            self.instance_connector.pipe_exists(pipe=self, debug=debug)
640            if hasattr(self.instance_connector, 'pipe_exists')
641            else False
642        )
643
644    self._cache_value('_exists', _exists, debug=debug)
645    self._cache_value('_exists_timestamp', now, debug=debug)
646    return _exists

See if a Pipe's table exists.

Parameters
  • debug (bool, default False): Verbosity toggle.
Returns
  • A bool corresponding to whether a pipe's underlying table exists.
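
The listing above reuses a cached positive result for a limited number of seconds and only queries the instance connector again once that window has passed. A minimal sketch of this time-limited caching pattern follows; `ExistsCache`, the `check` callable, the five-second window, and the injectable `clock` are all hypothetical and chosen only for illustration.

```python
import time
from typing import Callable, List, Optional


class ExistsCache:
    """Cache a positive existence check for `cache_seconds` seconds."""

    def __init__(
        self,
        check: Callable[[], bool],
        cache_seconds: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._check = check
        self._cache_seconds = cache_seconds
        self._clock = clock
        self._cached: bool = False
        self._cached_at: Optional[float] = None

    def exists(self) -> bool:
        now = self._clock()
        # Only a truthy cached value is reused; a miss is always re-checked.
        if self._cached and self._cached_at is not None:
            if now - self._cached_at < self._cache_seconds:
                return self._cached
        self._cached = bool(self._check())
        self._cached_at = now
        return self._cached


fake_now = [0.0]        # A controllable clock for the demonstration.
calls: List[float] = []  # Times at which the real check ran.


def check_table() -> bool:
    calls.append(fake_now[0])
    return True


cache = ExistsCache(check_table, cache_seconds=5.0, clock=lambda: fake_now[0])
cache.exists()     # Runs the real check.
fake_now[0] = 2.0
cache.exists()     # Served from the cache.
fake_now[0] = 6.0
cache.exists()     # Cache is stale, so the check runs again.
print(calls)
```

Because only positive results are reused, a table created shortly after a negative check is detected on the next call.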
def filter_existing(self, df: pandas.DataFrame, safe_copy: bool = True, date_bound_only: bool = False, include_unchanged_columns: bool = False, enforce_dtypes: bool = False, chunksize: Optional[int] = -1, debug: bool = False, **kw) -> Tuple[pandas.DataFrame, pandas.DataFrame, pandas.DataFrame]:
649def filter_existing(
650    self,
651    df: 'pd.DataFrame',
652    safe_copy: bool = True,
653    date_bound_only: bool = False,
654    include_unchanged_columns: bool = False,
655    enforce_dtypes: bool = False,
656    chunksize: Optional[int] = -1,
657    debug: bool = False,
658    **kw
659) -> Tuple['pd.DataFrame', 'pd.DataFrame', 'pd.DataFrame']:
660    """
661    Inspect a dataframe and filter out rows which already exist in the pipe.
662
663    Parameters
664    ----------
665    df: 'pd.DataFrame'
666        The dataframe to inspect and filter.
667
668    safe_copy: bool, default True
669        If `True`, create a copy before comparing and modifying the dataframes.
670        Setting to `False` may mutate the DataFrames.
671        See `meerschaum.utils.dataframe.filter_unseen_df`.
672
673    date_bound_only: bool, default False
674        If `True`, only use the datetime index to fetch the sample dataframe.
675
676    include_unchanged_columns: bool, default False
677        If `True`, include the backtrack columns which haven't changed in the update dataframe.
678        This is useful if you can't update individual keys.
679
680    enforce_dtypes: bool, default False
681        If `True`, ensure the given and intermediate dataframes are enforced to the correct dtypes.
682        Setting `enforce_dtypes=True` may impact performance.
683
684    chunksize: Optional[int], default -1
685        The `chunksize` used when fetching existing data.
686
687    debug: bool, default False
688        Verbosity toggle.
689
690    Returns
691    -------
692    A tuple of three pandas DataFrames: unseen, update, and delta.
693    """
694    from meerschaum.utils.warnings import warn
695    from meerschaum.utils.debug import dprint
696    from meerschaum.utils.packages import attempt_import, import_pandas
697    from meerschaum.utils.dataframe import (
698        filter_unseen_df,
699        add_missing_cols_to_df,
700        get_unhashable_cols,
701    )
702    from meerschaum.utils.dtypes import (
703        to_pandas_dtype,
704        none_if_null,
705        to_datetime,
706        are_dtypes_equal,
707        value_is_null,
708        round_time,
709    )
710    from meerschaum.config import get_config
711    pd = import_pandas()
712    pandas = attempt_import('pandas')
713    if enforce_dtypes or 'dataframe' not in str(type(df)).lower():
714        df = self.enforce_dtypes(df, chunksize=chunksize, debug=debug)
715    is_dask = hasattr(df, '__module__') and 'dask' in df.__module__
716    if is_dask:
717        dd = attempt_import('dask.dataframe')
718        merge = dd.merge
719        NA = pandas.NA
720    else:
721        merge = pd.merge
722        NA = pd.NA
723
724    parameters = self.parameters
725    pipe_columns = self.columns
726    primary_key = pipe_columns.get('primary', None)
727    dt_col = pipe_columns.get('datetime', None)
728    dt_type = parameters.get('dtypes', {}).get(dt_col, 'datetime') if dt_col else None
729    autoincrement = parameters.get('autoincrement', False)
730    autotime = parameters.get('autotime', False)
731
732    if primary_key and autoincrement and df is not None and primary_key in df.columns:
733        if safe_copy:
734            df = df.copy()
735            safe_copy = False
736        if df[primary_key].isnull().all():
737            del df[primary_key]
738            _ = self.columns.pop(primary_key, None)
739
740    if dt_col and autotime and df is not None and dt_col in df.columns:
741        if safe_copy:
742            df = df.copy()
743            safe_copy = False
744        if df[dt_col].isnull().all():
745            del df[dt_col]
746            _ = self.columns.pop(dt_col, None)
747
748    def get_empty_df():
749        empty_df = pd.DataFrame([])
750        dtypes = dict(df.dtypes) if df is not None else {}
751        dtypes.update(self.dtypes) if self.enforce else {}
752        pd_dtypes = {
753            col: to_pandas_dtype(str(typ))
754            for col, typ in dtypes.items()
755        }
756        return add_missing_cols_to_df(empty_df, pd_dtypes)
757
758    if df is None:
759        empty_df = get_empty_df()
760        return empty_df, empty_df, empty_df
761
762    if (df.empty if not is_dask else len(df) == 0):
763        return df, df, df
764
765    ### begin is the oldest data in the new dataframe
766    begin, end = None, None
767
768    if autoincrement and primary_key == dt_col and dt_col not in df.columns:
769        if enforce_dtypes:
770            df = self.enforce_dtypes(df, chunksize=chunksize, debug=debug)
771        return df, get_empty_df(), df
772
773    if autotime and dt_col and dt_col not in df.columns:
774        if enforce_dtypes:
775            df = self.enforce_dtypes(df, chunksize=chunksize, debug=debug)
776        return df, get_empty_df(), df
777
778    try:
779        min_dt_val = df[dt_col].min(skipna=True) if dt_col and dt_col in df.columns else None
780        if is_dask and min_dt_val is not None:
781            min_dt_val = min_dt_val.compute()
782        min_dt = (
783            to_datetime(min_dt_val, as_pydatetime=True)
784            if min_dt_val is not None and are_dtypes_equal(dt_type, 'datetime')
785            else min_dt_val
786        )
787    except Exception:
788        min_dt = None
789
790    if not are_dtypes_equal('datetime', str(type(min_dt))) or value_is_null(min_dt):
791        if not are_dtypes_equal('int', str(type(min_dt))):
792            min_dt = None
793
794    if isinstance(min_dt, datetime):
795        rounded_min_dt = round_time(min_dt, to='down')
796        try:
797            begin = rounded_min_dt - timedelta(minutes=1)
798        except OverflowError:
799            begin = rounded_min_dt
800    elif dt_type and 'int' in dt_type.lower():
801        begin = min_dt
802    elif dt_col is None:
803        begin = None
804
805    ### end is the newest data in the new dataframe
806    try:
807        max_dt_val = df[dt_col].max(skipna=True) if dt_col and dt_col in df.columns else None
808        if is_dask and max_dt_val is not None:
809            max_dt_val = max_dt_val.compute()
810        max_dt = (
811            to_datetime(max_dt_val, as_pydatetime=True)
812            if max_dt_val is not None and 'datetime' in str(dt_type)
813            else max_dt_val
814        )
815    except Exception:
816        import traceback
817        traceback.print_exc()
818        max_dt = None
819
820    if not are_dtypes_equal('datetime', str(type(max_dt))) or value_is_null(max_dt):
821        if not are_dtypes_equal('int', str(type(max_dt))):
822            max_dt = None
823
824    if isinstance(max_dt, datetime):
825        end = (
826            round_time(
827                max_dt,
828                to='down'
829            ) + timedelta(minutes=1)
830        )
831    elif dt_type and 'int' in dt_type.lower() and max_dt is not None:
832        end = max_dt + 1
833
834    if max_dt is not None and min_dt is not None and min_dt > max_dt:
835        warn("Detected minimum datetime greater than maximum datetime.")
836
837    if begin is not None and end is not None and begin > end:
838        if isinstance(begin, datetime):
839            begin = end - timedelta(minutes=1)
840        ### We might be using integers for the datetime axis.
841        else:
842            begin = end - 1
843
844    unique_index_vals = {
845        col: df[col].unique()
846        for col in (pipe_columns.values() if not primary_key else [primary_key])
847        if col in df.columns and col != dt_col
848    } if not date_bound_only else {}
849    unique_index_lens = {
850        col: len(unique_vals)
851        for col, unique_vals in unique_index_vals.items()
852    } if not date_bound_only else {}
853    filter_params_index_limit = get_config('pipes', 'sync', 'filter_params_index_limit')
854    _ = kw.pop('params', None)
855    params = {
856        col: [
857            none_if_null(val)
858            for val in unique_vals
859        ]
860        for col, unique_vals in unique_index_vals.items()
861        if unique_index_lens[col] <= filter_params_index_limit
862    } if not date_bound_only else {}
863
864    if debug:
865        dprint(
866            (
867                f"Looking at data between '{begin}' and '{end}' with index value lengths:\n"
868                f"{json.dumps(unique_index_lens, indent=4)}\n"
869            ),
870            **kw
871        )
872
873    backtrack_df = self.get_data(
874        begin=begin,
875        end=end,
876        chunksize=chunksize,
877        params=params,
878        debug=debug,
879        **kw
880    )
881    if backtrack_df is None:
        if debug:
            dprint(f"No backtrack data was found for {self}.")
        return df, get_empty_df(), df

    if enforce_dtypes:
        backtrack_df = self.enforce_dtypes(backtrack_df, chunksize=chunksize, debug=debug)

    if debug:
        dprint(f"Existing data for {self}:\n" + str(backtrack_df), **kw)
        dprint(f"Existing dtypes for {self}:\n" + str(backtrack_df.dtypes))

    ### Separate new rows from changed ones.
    on_cols = [
        col
        for col_key, col in pipe_columns.items()
        if (
            col
            and
            col_key != 'value'
            and col in backtrack_df.columns
        )
    ] if not primary_key else [primary_key]

    self_dtypes = self.get_dtypes(debug=debug) if self.enforce else {}
    on_cols_dtypes = {
        col: to_pandas_dtype(typ)
        for col, typ in self_dtypes.items()
        if col in on_cols
    }

    ### Detect changes between the old target and new source dataframes.
    delta_df = add_missing_cols_to_df(
        filter_unseen_df(
            backtrack_df,
            df,
            dtypes={
                col: to_pandas_dtype(typ)
                for col, typ in self_dtypes.items()
            },
            safe_copy=safe_copy,
            coerce_mixed_numerics=(not self.static),
            debug=debug
        ),
        on_cols_dtypes,
    )
    if enforce_dtypes:
        delta_df = self.enforce_dtypes(delta_df, chunksize=chunksize, debug=debug)

    ### Cast dicts or lists to strings so we can merge.
    serializer = functools.partial(json.dumps, sort_keys=True, separators=(',', ':'), default=str)

    def deserializer(x):
        return json.loads(x) if isinstance(x, str) else x

    unhashable_delta_cols = get_unhashable_cols(delta_df)
    unhashable_backtrack_cols = get_unhashable_cols(backtrack_df)
    for col in unhashable_delta_cols:
        delta_df[col] = delta_df[col].apply(serializer)
    for col in unhashable_backtrack_cols:
        backtrack_df[col] = backtrack_df[col].apply(serializer)
    casted_cols = set(unhashable_delta_cols + unhashable_backtrack_cols)

    joined_df = merge(
        delta_df.infer_objects().fillna(NA),
        backtrack_df.infer_objects().fillna(NA),
        how='left',
        on=on_cols,
        indicator=True,
        suffixes=('', '_old'),
    ) if on_cols else delta_df
    for col in casted_cols:
        if col in joined_df.columns:
            joined_df[col] = joined_df[col].apply(deserializer)
        if col in delta_df.columns:
            delta_df[col] = delta_df[col].apply(deserializer)

    ### Determine which rows are completely new.
    new_rows_mask = (joined_df['_merge'] == 'left_only') if on_cols else None
    cols = list(delta_df.columns)

    unseen_df = (
        joined_df
        .where(new_rows_mask)
        .dropna(how='all')[cols]
        .reset_index(drop=True)
    ) if on_cols else delta_df

    ### Rows that have already been inserted but values have changed.
    update_df = (
        joined_df
        .where(~new_rows_mask)
        .dropna(how='all')[cols]
        .reset_index(drop=True)
    ) if on_cols else get_empty_df()

    if include_unchanged_columns and on_cols:
        unchanged_backtrack_cols = [
            col
            for col in backtrack_df.columns
            if col in on_cols or col not in update_df.columns
        ]
        if enforce_dtypes:
            update_df = self.enforce_dtypes(update_df, chunksize=chunksize, debug=debug)
        update_df = merge(
            backtrack_df[unchanged_backtrack_cols],
            update_df,
            how='inner',
            on=on_cols,
        )

    return unseen_df, update_df, delta_df

Inspect a dataframe and filter out rows which already exist in the pipe.

Parameters
  • df ('pd.DataFrame'): The dataframe to inspect and filter.
  • safe_copy (bool, default True): If True, create a copy before comparing and modifying the dataframes. Setting to False may mutate the DataFrames. See meerschaum.utils.dataframe.filter_unseen_df.
  • date_bound_only (bool, default False): If True, only use the datetime index to fetch the sample dataframe.
  • include_unchanged_columns (bool, default False): If True, include the backtrack columns which haven't changed in the update dataframe. This is useful if you can't update individual keys.
  • enforce_dtypes (bool, default False): If True, ensure the given and intermediate dataframes are enforced to the correct dtypes. Setting enforce_dtypes=True may impact performance.
  • chunksize (Optional[int], default -1): The chunksize used when fetching existing data.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A tuple of three pandas DataFrames: unseen, update, and delta.
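
The split between unseen and update rows can be illustrated without pandas. The sketch below is a simplified stand-in and not the library's implementation: `split_rows` is a hypothetical helper that treats rows as plain dicts, keeps only rows that are not already stored verbatim (the delta), and then separates the delta by whether its index values already exist.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def split_rows(
    new_rows: List[Row],
    existing_rows: List[Row],
    on_cols: List[str],
) -> Tuple[List[Row], List[Row], List[Row]]:
    """Return (unseen, update, delta) for rows keyed on `on_cols` (hypothetical helper)."""
    def key(row: Row) -> Tuple[Any, ...]:
        return tuple(row.get(col) for col in on_cols)

    existing_by_key = {key(row): row for row in existing_rows}

    # The delta is every incoming row that is not already stored verbatim.
    delta = [row for row in new_rows if existing_by_key.get(key(row)) != row]
    # Unseen rows have index values that do not exist yet.
    unseen = [row for row in delta if key(row) not in existing_by_key]
    # Update rows share index values with an existing row but differ elsewhere.
    update = [row for row in delta if key(row) in existing_by_key]
    return unseen, update, delta


existing = [{'id': 1, 'vl': 10}, {'id': 2, 'vl': 20}]
incoming = [{'id': 1, 'vl': 10}, {'id': 2, 'vl': 25}, {'id': 3, 'vl': 30}]
unseen, update, delta = split_rows(incoming, existing, on_cols=['id'])
print(unseen)
print(update)
```

Here the row with `id` 1 is dropped as unchanged, `id` 2 lands in the update rows, and `id` 3 is unseen.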
def get_num_workers(self, workers: Optional[int] = None) -> int:
def get_num_workers(self, workers: Optional[int] = None) -> int:
    """
    Get the number of workers to use for concurrent syncs.

    Parameters
    ----------
    workers: Optional[int], default None
        The number of workers passed via `--workers`.

    Returns
    -------
    The number of workers, capped for safety.
    """
    is_thread_safe = getattr(self.instance_connector, 'IS_THREAD_SAFE', False)
    if not is_thread_safe:
        return 1

    engine_pool_size = (
        self.instance_connector.engine.pool.size()
        if self.instance_connector.type == 'sql'
        else None
    )
    current_num_threads = threading.active_count()
    current_num_connections = (
        self.instance_connector.engine.pool.checkedout()
        if engine_pool_size is not None
        else current_num_threads
    )
    desired_workers = (
        min(workers or engine_pool_size, engine_pool_size)
        if engine_pool_size is not None
        else workers
    )
    if desired_workers is None:
        desired_workers = (multiprocessing.cpu_count() if is_thread_safe else 1)

    return max(
        (desired_workers - current_num_connections),
        1,
    )

Get the number of workers to use for concurrent syncs.

Parameters
  • workers (Optional[int], default None): The number of workers passed via --workers.
Returns
  • The number of workers, capped for safety.
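
The capping rule is easier to follow when separated from the connector. The following is a minimal sketch under stated assumptions: `cap_workers` is a hypothetical standalone function, and the pool size, busy connection count, and CPU count are passed in as plain integers rather than read from a SQLAlchemy engine or the running process.

```python
from typing import Optional


def cap_workers(
    requested: Optional[int],
    pool_size: Optional[int],
    busy_connections: int,
    cpu_count: int,
    is_thread_safe: bool = True,
) -> int:
    """Mirror the capping arithmetic with explicit inputs (hypothetical helper)."""
    # Connectors which are not thread-safe always sync in series.
    if not is_thread_safe:
        return 1

    if pool_size is not None:
        # Never ask for more workers than the connection pool can serve.
        desired = min(requested or pool_size, pool_size)
    else:
        # Without a pool, fall back to the CPU count when nothing was requested.
        desired = requested if requested is not None else cpu_count

    # Leave room for connections (or threads) already in use, but keep at least one worker.
    return max(desired - busy_connections, 1)


print(cap_workers(requested=16, pool_size=5, busy_connections=2, cpu_count=8))
```

The final `max(..., 1)` guarantees that a sync can always proceed, even when every pooled connection is already checked out.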
def verify( self, begin: Union[datetime.datetime, int, NoneType] = None, end: Union[datetime.datetime, int, NoneType] = None, params: Optional[Dict[str, Any]] = None, chunk_interval: Union[datetime.timedelta, int, NoneType] = None, bounded: Optional[bool] = None, deduplicate: bool = False, workers: Optional[int] = None, batchsize: Optional[int] = None, skip_chunks_with_greater_rowcounts: bool = False, check_rowcounts_only: bool = False, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
def verify(
    self,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    chunk_interval: Union[timedelta, int, None] = None,
    bounded: Optional[bool] = None,
    deduplicate: bool = False,
    workers: Optional[int] = None,
    batchsize: Optional[int] = None,
    skip_chunks_with_greater_rowcounts: bool = False,
    check_rowcounts_only: bool = False,
    debug: bool = False,
    **kwargs: Any
) -> SuccessTuple:
    """
    Verify the contents of the pipe by resyncing its interval.

    Parameters
    ----------
    begin: Union[datetime, int, None], default None
        If specified, only verify rows greater than or equal to this value.

    end: Union[datetime, int, None], default None
        If specified, only verify rows less than this value.

    chunk_interval: Union[timedelta, int, None], default None
        If provided, use this as the size of the chunk boundaries.
        Defaults to the value set in `pipe.parameters['chunk_minutes']` (1440).

    bounded: Optional[bool], default None
        If `True`, do not verify older than the oldest sync time or newer than the newest.
        If `False`, verify unbounded syncs outside of the new and old sync times.
        The default behavior (`None`) is to bound only if a bound interval is set
        (e.g. `pipe.parameters['verify']['bound_days']`).

    deduplicate: bool, default False
        If `True`, deduplicate the pipe's table after the verification syncs.

    workers: Optional[int], default None
        If provided, limit the verification to this many threads.
        Use a value of `1` to sync chunks in series.

    batchsize: Optional[int], default None
        If provided, sync this many chunks in parallel.
        Defaults to `Pipe.get_num_workers()`.

    skip_chunks_with_greater_rowcounts: bool, default False
        If `True`, compare the rowcounts for a chunk and skip syncing if the pipe's
        chunk rowcount equals or exceeds the remote's rowcount.

    check_rowcounts_only: bool, default False
        If `True`, only compare rowcounts and print chunks which are out-of-sync.

    debug: bool, default False
        Verbosity toggle.

    kwargs: Any
        All keyword arguments are passed to `pipe.sync()`.

    Returns
    -------
    A SuccessTuple indicating whether the pipe was successfully resynced.
    """
    from meerschaum.utils.pool import get_pool
    from meerschaum.utils.formatting import make_header
    from meerschaum.utils.misc import interval_str
    workers = self.get_num_workers(workers)
    check_rowcounts = skip_chunks_with_greater_rowcounts or check_rowcounts_only

    ### Skip configured bounding in parameters
    ### if `bounded` is explicitly `False`.
    bound_time = (
        self.get_bound_time(debug=debug)
        if bounded is not False
        else None
    )
    if bounded is None:
        bounded = bound_time is not None

    if bounded and begin is None:
        begin = (
            bound_time
            if bound_time is not None
            else self.get_sync_time(newest=False, debug=debug)
        )
        if begin is None:
            remote_oldest_sync_time = self.get_sync_time(newest=False, remote=True, debug=debug)
            begin = remote_oldest_sync_time
    if bounded and end is None:
        end = self.get_sync_time(newest=True, debug=debug)
        if end is None:
            remote_newest_sync_time = self.get_sync_time(newest=True, remote=True, debug=debug)
            end = remote_newest_sync_time
        if end is not None:
            end += (
                timedelta(minutes=1)
                if hasattr(end, 'tzinfo')
                else 1
            )

    begin, end = self.parse_date_bounds(begin, end)
    cannot_determine_bounds = bounded and begin is None and end is None

    if cannot_determine_bounds and not check_rowcounts_only:
        warn(f"Cannot determine sync bounds for {self}. Syncing instead...", stack=False)
        sync_success, sync_msg = self.sync(
            begin=begin,
            end=end,
            params=params,
            workers=workers,
            debug=debug,
            **kwargs
        )
        if not sync_success:
            return sync_success, sync_msg

        if deduplicate:
            return self.deduplicate(
                begin=begin,
                end=end,
                params=params,
                workers=workers,
                debug=debug,
                **kwargs
            )
        return sync_success, sync_msg

    chunk_interval = self.get_chunk_interval(chunk_interval, debug=debug)
    chunk_bounds = self.get_chunk_bounds(
        begin=begin,
        end=end,
        chunk_interval=chunk_interval,
        bounded=bounded,
        debug=debug,
    )

    ### Consider it a success if no chunks need to be verified.
    if not chunk_bounds:
        if deduplicate:
            return self.deduplicate(
                begin=begin,
                end=end,
                params=params,
                workers=workers,
                debug=debug,
                **kwargs
            )
        return True, f"Could not determine chunks between '{begin}' and '{end}'; nothing to do."

    begin_to_print = (
        begin
        if begin is not None
        else (
            chunk_bounds[0][0]
            if bounded
            else chunk_bounds[0][1]
        )
    )
    end_to_print = (
        end
        if end is not None
        else (
            chunk_bounds[-1][1]
            if bounded
            else chunk_bounds[-1][0]
        )
    )
    message_header = f"{begin_to_print} - {end_to_print}"
    max_chunks_syncs = mrsm.get_config('pipes', 'verify', 'max_chunks_syncs')

    info(
        f"Verifying {self}:\n    "
        + ("Syncing" if not check_rowcounts_only else "Checking")
        + f" {len(chunk_bounds)} chunk"
        + ('s' if len(chunk_bounds) != 1 else '')
        + f" ({'un' if not bounded else ''}bounded)"
        + f" of size '{interval_str(chunk_interval)}'"
        + f" between '{begin_to_print}' and '{end_to_print}'.\n"
    )

    ### Dictionary of the form bounds -> success_tuple, e.g.:
    ### {
    ###    (2023-01-01, 2023-01-02): (True, "Success")
    ### }
    bounds_success_tuples = {}
    def process_chunk_bounds(
        chunk_begin_and_end: Tuple[
            Union[int, datetime],
            Union[int, datetime]
        ],
        _workers: Optional[int] = 1,
    ):
        if chunk_begin_and_end in bounds_success_tuples:
            return chunk_begin_and_end, bounds_success_tuples[chunk_begin_and_end]

        chunk_begin, chunk_end = chunk_begin_and_end
        do_sync = True
        chunk_success, chunk_msg = False, "Did not sync chunk."
        if check_rowcounts:
            existing_rowcount = self.get_rowcount(begin=chunk_begin, end=chunk_end, debug=debug)
            remote_rowcount = self.get_rowcount(
                begin=chunk_begin,
                end=chunk_end,
                remote=True,
                debug=debug,
            )
            checked_rows_str = (
                f"checked {existing_rowcount:,} row"
                + ("s" if existing_rowcount != 1 else '')
                + f" vs {remote_rowcount:,} remote"
            )
            if (
                existing_rowcount is not None
                and remote_rowcount is not None
                and existing_rowcount >= remote_rowcount
            ):
                do_sync = False
                chunk_success, chunk_msg = True, (
                    "Row-count is up-to-date "
                    f"({checked_rows_str})."
                )
            elif check_rowcounts_only:
                do_sync = False
                chunk_success, chunk_msg = True, (
                    f"Row-counts are out-of-sync ({checked_rows_str})."
                )

        num_syncs = 0
        while num_syncs < max_chunks_syncs:
            chunk_success, chunk_msg = self.sync(
                begin=chunk_begin,
                end=chunk_end,
                params=params,
                workers=_workers,
                debug=debug,
                **kwargs
            ) if do_sync else (chunk_success, chunk_msg)
            if chunk_success:
                break
            num_syncs += 1
            time.sleep(num_syncs**2)
        chunk_msg = chunk_msg.strip()
        if ' - ' not in chunk_msg:
            chunk_label = f"{chunk_begin} - {chunk_end}"
            chunk_msg = f'Verified chunk for {self}:\n{chunk_label}\n{chunk_msg}'
        mrsm.pprint((chunk_success, chunk_msg))

        return chunk_begin_and_end, (chunk_success, chunk_msg)

    ### If we have more than one chunk, attempt to sync the first one and return if it fails.
    if len(chunk_bounds) > 1:
        first_chunk_bounds = chunk_bounds[0]
        first_label = f"{first_chunk_bounds[0]} - {first_chunk_bounds[1]}"
        info(f"Verifying first chunk for {self}:\n    {first_label}")
        (
            (first_begin, first_end),
            (first_success, first_msg)
        ) = process_chunk_bounds(first_chunk_bounds, _workers=workers)
        if not first_success:
            return (
                first_success,
                f"\n{first_label}\n"
                + f"Failed to sync first chunk:\n{first_msg}"
            )
        bounds_success_tuples[first_chunk_bounds] = (first_success, first_msg)
        info(f"Completed first chunk for {self}:\n    {first_label}\n")
        chunk_bounds = chunk_bounds[1:]

    pool = get_pool(workers=workers)
    batches = self.get_chunk_bounds_batches(chunk_bounds, batchsize=batchsize, workers=workers)

    def process_batch(
        batch_chunk_bounds: Tuple[
            Tuple[Union[datetime, int, None], Union[datetime, int, None]],
            ...
        ]
    ):
        _batch_begin = batch_chunk_bounds[0][0]
        _batch_end = batch_chunk_bounds[-1][-1]
        batch_message_header = f"{_batch_begin} - {_batch_end}"

        if check_rowcounts_only:
            info(f"Checking row-counts for batch bounds:\n    {batch_message_header}")
            _, (batch_init_success, batch_init_msg) = process_chunk_bounds(
                (_batch_begin, _batch_end)
            )
            mrsm.pprint((batch_init_success, batch_init_msg))
            if batch_init_success and 'up-to-date' in batch_init_msg:
                info("Entire batch is up-to-date.")
                return batch_init_success, batch_init_msg

        batch_bounds_success_tuples = dict(pool.map(process_chunk_bounds, batch_chunk_bounds))
        bounds_success_tuples.update(batch_bounds_success_tuples)
        batch_bounds_success_bools = {
            bounds: tup[0]
            for bounds, tup in batch_bounds_success_tuples.items()
        }

        if all(batch_bounds_success_bools.values()):
            msg = get_chunks_success_message(
                batch_bounds_success_tuples,
                header=batch_message_header,
                check_rowcounts_only=check_rowcounts_only,
            )
            if deduplicate:
                deduplicate_success, deduplicate_msg = self.deduplicate(
                    begin=_batch_begin,
                    end=_batch_end,
                    params=params,
                    workers=workers,
                    debug=debug,
                    **kwargs
                )
                return deduplicate_success, msg + '\n\n' + deduplicate_msg
            return True, msg

        ### Iterate over the dictionary's items so that `success` is the boolean flag
        ### (zipping against the dictionary would yield its keys instead).
        batch_chunk_bounds_to_resync = [
            bounds
            for bounds, success in batch_bounds_success_bools.items()
            if not success
        ]
        batch_bounds_to_print = [
            f"{bounds[0]} - {bounds[1]}"
            for bounds in batch_chunk_bounds_to_resync
        ]
        if batch_bounds_to_print:
            warn(
                "Will resync the following failed chunks:\n    "
                + '\n    '.join(batch_bounds_to_print),
                stack=False,
            )

        retry_bounds_success_tuples = dict(pool.map(
            process_chunk_bounds,
            batch_chunk_bounds_to_resync
        ))
        batch_bounds_success_tuples.update(retry_bounds_success_tuples)
        bounds_success_tuples.update(retry_bounds_success_tuples)
        retry_bounds_success_bools = {
            bounds: tup[0]
            for bounds, tup in retry_bounds_success_tuples.items()
        }

        if all(retry_bounds_success_bools.values()):
            chunks_message = (
                get_chunks_success_message(
                    batch_bounds_success_tuples,
                    header=batch_message_header,
                    check_rowcounts_only=check_rowcounts_only,
                ) + f"\nRetried {len(batch_chunk_bounds_to_resync)} chunk" + (
                    's'
                    if len(batch_chunk_bounds_to_resync) != 1
                    else ''
                ) + "."
            )
            if deduplicate:
                deduplicate_success, deduplicate_msg = self.deduplicate(
                    begin=_batch_begin,
                    end=_batch_end,
                    params=params,
                    workers=workers,
                    debug=debug,
                    **kwargs
                )
                return deduplicate_success, chunks_message + '\n\n' + deduplicate_msg
            return True, chunks_message

        batch_chunks_message = get_chunks_success_message(
            batch_bounds_success_tuples,
            header=batch_message_header,
            check_rowcounts_only=check_rowcounts_only,
        )
        if deduplicate:
            deduplicate_success, deduplicate_msg = self.deduplicate(
                begin=begin,
                end=end,
                params=params,
                workers=workers,
                debug=debug,
                **kwargs
            )
            return deduplicate_success, batch_chunks_message + '\n\n' + deduplicate_msg
        return False, batch_chunks_message

    num_batches = len(batches)
    for batch_i, batch in enumerate(batches):
        batch_begin = batch[0][0]
        batch_end = batch[-1][-1]
        batch_counter_str = f"({(batch_i + 1):,}/{num_batches:,})"
        batch_label = f"batch {batch_counter_str}:\n{batch_begin} - {batch_end}"
        retry_failed_batch = True
        try:
            for_self = 'for ' + str(self)
            batch_label_str = batch_label.replace(':\n', ' ' + for_self + '...\n    ')
            info(f"Verifying {batch_label_str}\n")
            batch_success, batch_msg = process_batch(batch)
        except (KeyboardInterrupt, Exception) as e:
            batch_success = False
            batch_msg = str(e)
            retry_failed_batch = False

        batch_msg_to_print = (
            f"{make_header('Completed batch ' + batch_counter_str + ':', left_pad=0)}\n{batch_msg}"
        )
        mrsm.pprint((batch_success, batch_msg_to_print))

        if not batch_success and retry_failed_batch:
            info(f"Retrying batch {batch_counter_str}...")
            retry_batch_success, retry_batch_msg = process_batch(batch)
            retry_batch_msg_to_print = (
                f"Retried {make_header('batch ' + batch_label, left_pad=0)}\n{retry_batch_msg}"
            )
            mrsm.pprint((retry_batch_success, retry_batch_msg_to_print))

            batch_success = retry_batch_success
            batch_msg = retry_batch_msg

        if not batch_success:
            return False, f"Failed to verify {batch_label}:\n\n{batch_msg}"

    chunks_message = get_chunks_success_message(
        bounds_success_tuples,
        header=message_header,
        check_rowcounts_only=check_rowcounts_only,
    )
    return True, chunks_message

Verify the contents of the pipe by resyncing its interval.

Parameters
  • begin (Union[datetime, int, None], default None): If specified, only verify rows greater than or equal to this value.
  • end (Union[datetime, int, None], default None): If specified, only verify rows less than this value.
  • chunk_interval (Union[timedelta, int, None], default None): If provided, use this as the size of the chunk boundaries. Defaults to the value set in pipe.parameters['chunk_minutes'] (1440).
  • bounded (Optional[bool], default None): If True, do not verify older than the oldest sync time or newer than the newest. If False, verify unbounded syncs outside of the new and old sync times. The default behavior (None) is to bound only if a bound interval is set (e.g. pipe.parameters['verify']['bound_days']).
  • deduplicate (bool, default False): If True, deduplicate the pipe's table after the verification syncs.
  • workers (Optional[int], default None): If provided, limit the verification to this many threads. Use a value of 1 to sync chunks in series.
  • batchsize (Optional[int], default None): If provided, sync this many chunks in parallel. Defaults to Pipe.get_num_workers().
  • skip_chunks_with_greater_rowcounts (bool, default False): If True, compare the rowcounts for a chunk and skip syncing if the pipe's chunk rowcount equals or exceeds the remote's rowcount.
  • check_rowcounts_only (bool, default False): If True, only compare rowcounts and print chunks which are out-of-sync.
  • debug (bool, default False): Verbosity toggle.
  • kwargs (Any): All keyword arguments are passed to pipe.sync().
Returns
  • A SuccessTuple indicating whether the pipe was successfully resynced.
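
Two ideas carry most of the verification flow: splitting an interval into chunk bounds, and retrying a failed chunk with a growing delay. The sketch below shows both in isolation. It is an illustration under assumptions, not the library's code: `make_chunk_bounds` and `sync_with_retries` are hypothetical helpers, the real `Pipe.get_chunk_bounds()` may align or clamp chunks differently, and `max_attempts=3` is an arbitrary stand-in for the configured `max_chunks_syncs` value.

```python
from datetime import datetime, timedelta
from typing import Callable, List, Tuple

Bounds = Tuple[datetime, datetime]


def make_chunk_bounds(begin: datetime, end: datetime, interval: timedelta) -> List[Bounds]:
    """Split [begin, end) into consecutive half-open chunks (hypothetical helper)."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    bounds = []
    chunk_begin = begin
    while chunk_begin < end:
        # Clamp the final chunk so it never extends past `end`.
        chunk_end = min(chunk_begin + interval, end)
        bounds.append((chunk_begin, chunk_end))
        chunk_begin = chunk_end
    return bounds


def sync_with_retries(
    sync_chunk: Callable[[Bounds], Tuple[bool, str]],
    bounds: Bounds,
    max_attempts: int = 3,
    sleep: Callable[[float], None] = lambda seconds: None,
) -> Tuple[bool, str]:
    """Retry a chunk sync, waiting attempts**2 seconds after each failure."""
    success, msg = False, "Did not sync chunk."
    attempts = 0
    while attempts < max_attempts:
        success, msg = sync_chunk(bounds)
        if success:
            break
        attempts += 1
        sleep(attempts ** 2)
    return success, msg


chunks = make_chunk_bounds(
    datetime(2024, 1, 1),
    datetime(2024, 1, 3, 12),
    timedelta(days=1),
)

# A stand-in sync function which fails once and then succeeds.
calls = {'count': 0}

def flaky_sync(chunk: Bounds) -> Tuple[bool, str]:
    calls['count'] += 1
    succeeded = calls['count'] > 1
    return succeeded, ("Success" if succeeded else "Failed")

result = sync_with_retries(flaky_sync, chunks[0])
print(len(chunks), result)
```

The `sleep` argument defaults to a no-op so the example runs instantly; pass `time.sleep` to get real quadratic backoff.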
def get_bound_interval(self, debug: bool = False) -> Union[datetime.timedelta, int, NoneType]:
def get_bound_interval(self, debug: bool = False) -> Union[timedelta, int, None]:
    """
    Return the interval used to determine the bound time (limit for verification syncs).
    If the datetime axis is an integer, just return its value.

    Below are the supported keys for the bound interval:

        - `pipe.parameters['verify']['bound_minutes']`
        - `pipe.parameters['verify']['bound_hours']`
        - `pipe.parameters['verify']['bound_days']`
        - `pipe.parameters['verify']['bound_weeks']`
        - `pipe.parameters['verify']['bound_years']`
        - `pipe.parameters['verify']['bound_seconds']`

    If multiple keys are present, the first on this priority list will be used.

    Returns
    -------
    A `timedelta` or `int` value to be used to determine the bound time.
    """
    verify_params = self.parameters.get('verify', {})
    prefix = 'bound_'
    suffixes_to_check = ('minutes', 'hours', 'days', 'weeks', 'years', 'seconds')
    keys_to_search = {
        key: val
        for key, val in verify_params.items()
        if key.startswith(prefix)
    }
    bound_time_key, bound_time_value = None, None
    for key, value in keys_to_search.items():
        for suffix in suffixes_to_check:
            if key == prefix + suffix:
                bound_time_key = key
                bound_time_value = value
                break
        if bound_time_key is not None:
            break

    if bound_time_value is None:
        return bound_time_value

    dt_col = self.columns.get('datetime', None)
    if not dt_col:
        return bound_time_value

    dt_typ = self.dtypes.get(dt_col, 'datetime')
    if 'int' in dt_typ.lower():
        return int(bound_time_value)

    interval_type = bound_time_key.replace(prefix, '')
    return timedelta(**{interval_type: bound_time_value})

Return the interval used to determine the bound time (limit for verification syncs). If the datetime axis is an integer, just return its value.

Below are the supported keys for the bound interval:

  • pipe.parameters['verify']['bound_minutes']
  • pipe.parameters['verify']['bound_hours']
  • pipe.parameters['verify']['bound_days']
  • pipe.parameters['verify']['bound_weeks']
  • pipe.parameters['verify']['bound_years']
  • pipe.parameters['verify']['bound_seconds']

If multiple keys are present, the first on this priority list will be used.

Returns
  • A timedelta or int value to be used to determine the bound time.
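
As an illustration of the documented priority rule, the sketch below resolves a bound interval from a plain dictionary. It is a simplified stand-in rather than the method itself: `resolve_bound_interval` and `BOUND_UNITS` are hypothetical names, the `datetime_is_int` flag replaces the pipe's dtype lookup, and `years` is left out because `datetime.timedelta` has no `years` argument.

```python
from datetime import timedelta
from typing import Any, Dict, Union

# Priority order taken from the list above, minus `years`
# (which `datetime.timedelta` does not accept).
BOUND_UNITS = ('minutes', 'hours', 'days', 'weeks', 'seconds')


def resolve_bound_interval(
    verify_params: Dict[str, Any],
    datetime_is_int: bool = False,
) -> Union[timedelta, int, None]:
    """Pick the first configured `bound_*` key in priority order (hypothetical helper)."""
    for unit in BOUND_UNITS:
        value = verify_params.get('bound_' + unit)
        if value is None:
            continue
        # Integer datetime axes use the raw value instead of a timedelta.
        if datetime_is_int:
            return int(value)
        return timedelta(**{unit: value})
    return None


print(resolve_bound_interval({'bound_days': 366}))
print(resolve_bound_interval({'bound_days': 1, 'bound_minutes': 30}))
```

When both `bound_days` and `bound_minutes` are set, this sketch returns the minutes value because `minutes` comes first in the priority tuple.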
def get_bound_time(self, debug: bool = False) -> Union[datetime.datetime, int, NoneType]:
def get_bound_time(self, debug: bool = False) -> Union[datetime, int, None]:
    """
    The bound time is the limit at which long-running verification syncs should stop.
    A value of `None` means verification syncs should be unbounded.

    Like deriving a backtrack time from `pipe.get_sync_time()`,
    the bound time is the sync time minus a large window (e.g. 366 days).

    Verification syncs are unbounded (i.e. `bound_time is None`)
    if the span between the oldest and newest sync times is shorter than the bound interval.

    Returns
    -------
    A `datetime` or `int` corresponding to the
    `begin` bound for verification and deduplication syncs.
    """
    bound_interval = self.get_bound_interval(debug=debug)
    if bound_interval is None:
        return None

    sync_time = self.get_sync_time(debug=debug)
    if sync_time is None:
        return None

    bound_time = sync_time - bound_interval
    oldest_sync_time = self.get_sync_time(newest=False, debug=debug)
    max_bound_time_days = STATIC_CONFIG['pipes']['max_bound_time_days']

    extreme_sync_times_delta = (
        hasattr(oldest_sync_time, 'tzinfo')
        and (sync_time - oldest_sync_time) >= timedelta(days=max_bound_time_days)
    )

    return (
        bound_time
        if bound_time > oldest_sync_time or extreme_sync_times_delta
        else None
    )

The bound time is the limit at which long-running verification syncs should stop. A value of None means verification syncs should be unbounded.

Like deriving a backtrack time from pipe.get_sync_time(), the bound time is the sync time minus a large window (e.g. 366 days).

Verification syncs are unbounded (i.e. bound_time is None) if the span between the oldest and newest sync times is shorter than the bound interval.

Returns
  • A datetime or int corresponding to the begin bound for verification and deduplication syncs.
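
The bound-time decision reduces to a subtraction and a comparison. Here is a minimal sketch for datetime axes, assuming the sync times and interval are already known: `compute_bound_time` is a hypothetical helper, the explicit `None` check on the oldest sync time is an added safeguard, and `max_bound_time_days` is passed in by the caller because the value stored in the library's static configuration is not shown here.

```python
from datetime import datetime, timedelta
from typing import Optional


def compute_bound_time(
    newest_sync_time: Optional[datetime],
    oldest_sync_time: Optional[datetime],
    bound_interval: Optional[timedelta],
    max_bound_time_days: int,
) -> Optional[datetime]:
    """Return the `begin` bound for verification, or None for unbounded (hypothetical helper)."""
    if bound_interval is None or newest_sync_time is None or oldest_sync_time is None:
        return None

    bound_time = newest_sync_time - bound_interval
    # Very long histories stay bounded even if the comparison below would fail.
    spans_extreme_range = (
        (newest_sync_time - oldest_sync_time) >= timedelta(days=max_bound_time_days)
    )
    if bound_time > oldest_sync_time or spans_extreme_range:
        return bound_time
    return None


long_history = compute_bound_time(
    newest_sync_time=datetime(2024, 12, 31),
    oldest_sync_time=datetime(2020, 1, 1),
    bound_interval=timedelta(days=366),
    max_bound_time_days=3650,  # arbitrary value for illustration
)
short_history = compute_bound_time(
    newest_sync_time=datetime(2024, 6, 1),
    oldest_sync_time=datetime(2024, 1, 1),
    bound_interval=timedelta(days=366),
    max_bound_time_days=3650,
)
print(long_history, short_history)
```

A pipe with several years of history is bounded to the most recent 366 days, while a pipe with only a few months of data is verified without bounds.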
def delete(self, drop: bool = True, debug: bool = False, **kw) -> Tuple[bool, str]:
def delete(
    self,
    drop: bool = True,
    debug: bool = False,
    **kw
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `delete_pipe()` method.

    Parameters
    ----------
    drop: bool, default True
        If `True`, drop the pipe's target table.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success (`bool`), message (`str`).

    """
    from meerschaum.utils.warnings import warn
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    if self.temporary:
        if self.cache:
            invalidate_success, invalidate_msg = self._invalidate_cache(hard=True, debug=debug)
            if not invalidate_success:
                return invalidate_success, invalidate_msg

        return (
            False,
            "Cannot delete pipes created with `temporary=True` (read-only). "
            + "You may want to call `pipe.drop()` instead."
        )

    if drop:
        drop_success, drop_msg = self.drop(debug=debug)
        if not drop_success:
            warn(f"Failed to drop {self}:\n{drop_msg}")

    with Venv(get_connector_plugin(self.instance_connector)):
        result = self.instance_connector.delete_pipe(self, debug=debug, **kw)

    if not isinstance(result, tuple):
        return False, f"Received an unexpected result from '{self.instance_connector}': {result}"

    if result[0]:
        self._invalidate_cache(hard=True, debug=debug)
        self._clear_cache_key('_id', debug=debug)

    return result

Call the Pipe's instance connector's delete_pipe() method.

Parameters
  • drop (bool, default True): If True, drop the pipe's target table.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A SuccessTuple of success (bool), message (str).
def drop(self, debug: bool = False, **kw: Any) -> Tuple[bool, str]:
def drop(
    self,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `drop_pipe()` method.

    Parameters
    ----------
    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.

    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    self._clear_cache_key('_exists', debug=debug)

    with Venv(get_connector_plugin(self.instance_connector)):
        if hasattr(self.instance_connector, 'drop_pipe'):
            result = self.instance_connector.drop_pipe(self, debug=debug, **kw)
        else:
            result = (
                False,
                (
                    "Cannot drop pipes for instance connectors of type "
                    f"'{self.instance_connector.type}'."
                )
            )

    self._clear_cache_key('_exists', debug=debug)
    self._clear_cache_key('_exists_timestamp', debug=debug)

    return result

Call the Pipe's instance connector's drop_pipe() method.

Parameters
  • debug (bool, default False): Verbosity toggle.
Returns
  • A SuccessTuple of success, message.
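
The drop methods share one pattern: call an optional connector method if it exists, otherwise return a failed SuccessTuple instead of raising. The sketch below isolates that pattern with stand-in classes. `TableConnector`, `ReadOnlyConnector`, and `drop_via_connector` are hypothetical names for illustration only and do not exist in the meerschaum package.

```python
from typing import Any, Tuple

SuccessTuple = Tuple[bool, str]


class TableConnector:
    """Stand-in connector which supports dropping (hypothetical)."""
    type = 'table'

    def drop_pipe(self, pipe: Any, **kwargs: Any) -> SuccessTuple:
        return True, f"Dropped {pipe}."


class ReadOnlyConnector:
    """Stand-in connector without a `drop_pipe()` method (hypothetical)."""
    type = 'readonly'


def drop_via_connector(connector: Any, pipe: Any, **kwargs: Any) -> SuccessTuple:
    """Dispatch to `drop_pipe()` when available, else report failure."""
    if hasattr(connector, 'drop_pipe'):
        return connector.drop_pipe(pipe, **kwargs)
    return False, f"Cannot drop pipes for instance connectors of type '{connector.type}'."


print(drop_via_connector(TableConnector(), 'demo'))
print(drop_via_connector(ReadOnlyConnector(), 'demo'))
```

Returning a tuple in both branches lets callers handle unsupported connectors the same way as any other failure.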
def drop_indices( self, columns: Optional[List[str]] = None, debug: bool = False, **kw: Any) -> Tuple[bool, str]:
def drop_indices(
    self,
    columns: Optional[List[str]] = None,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `drop_pipe_indices()` method.

    Parameters
    ----------
    columns: Optional[List[str]], default None
        If provided, only drop indices in the given list.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.

    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    self._clear_cache_key('_columns_indices', debug=debug)
    self._clear_cache_key('_columns_indices_timestamp', debug=debug)
    self._clear_cache_key('_columns_types', debug=debug)
    self._clear_cache_key('_columns_types_timestamp', debug=debug)

    with Venv(get_connector_plugin(self.instance_connector)):
        if hasattr(self.instance_connector, 'drop_pipe_indices'):
            result = self.instance_connector.drop_pipe_indices(
                self,
                columns=columns,
                debug=debug,
                **kw
            )
        else:
            result = (
                False,
                (
                    "Cannot drop indices for instance connectors of type "
                    f"'{self.instance_connector.type}'."
                )
            )

    self._clear_cache_key('_columns_indices', debug=debug)
    self._clear_cache_key('_columns_indices_timestamp', debug=debug)
    self._clear_cache_key('_columns_types', debug=debug)
    self._clear_cache_key('_columns_types_timestamp', debug=debug)

    return result

Call the Pipe's instance connector's drop_pipe_indices() method.

Parameters
  • columns (Optional[List[str]], default None): If provided, only drop indices in the given list.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A SuccessTuple of success, message.
def create_indices( self, columns: Optional[List[str]] = None, debug: bool = False, **kw: Any) -> Tuple[bool, str]:
def create_indices(
    self,
    columns: Optional[List[str]] = None,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `create_pipe_indices()` method.

    Parameters
    ----------
    columns: Optional[List[str]], default None
        If provided, only create indices for the given columns.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.

    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    # Invalidate cached index and column metadata before the change.
    self._clear_cache_key('_columns_indices', debug=debug)
    self._clear_cache_key('_columns_indices_timestamp', debug=debug)
    self._clear_cache_key('_columns_types', debug=debug)
    self._clear_cache_key('_columns_types_timestamp', debug=debug)

    with Venv(get_connector_plugin(self.instance_connector)):
        if hasattr(self.instance_connector, 'create_pipe_indices'):
            result = self.instance_connector.create_pipe_indices(
                self,
                columns=columns,
                debug=debug,
                **kw
            )
        else:
            result = (
                False,
                (
                    "Cannot create indices for instance connectors of type "
                    f"'{self.instance_connector.type}'."
                )
            )

    # Invalidate again in case the cache was repopulated during the call.
    self._clear_cache_key('_columns_indices', debug=debug)
    self._clear_cache_key('_columns_indices_timestamp', debug=debug)
    self._clear_cache_key('_columns_types', debug=debug)
    self._clear_cache_key('_columns_types_timestamp', debug=debug)

    return result

Call the Pipe's instance connector's create_pipe_indices() method.

Parameters
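  • columns (Optional[List[str]], default None): If provided, only create indices for the given columns.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A SuccessTuple of success, message.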
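Both drop_indices() and create_indices() clear the same four cache keys before and after delegating to the connector. The sketch below shows that pattern with a plain dictionary as the cache. The helpers clear_index_cache() and run_index_operation() are hypothetical, and the try/finally is an assumption of this sketch (the source clears the keys with plain sequential calls).

```python
from typing import Callable, Dict, Tuple

SuccessTuple = Tuple[bool, str]

# Cache keys named in the source listings above.
INDEX_CACHE_KEYS = (
    '_columns_indices',
    '_columns_indices_timestamp',
    '_columns_types',
    '_columns_types_timestamp',
)


def clear_index_cache(cache: Dict[str, object]) -> None:
    """Remove index-related keys, leaving unrelated keys untouched."""
    for key in INDEX_CACHE_KEYS:
        cache.pop(key, None)


def run_index_operation(
    cache: Dict[str, object],
    operation: Callable[[], SuccessTuple],
) -> SuccessTuple:
    """Clear the cache, run the operation, then clear the cache again."""
    clear_index_cache(cache)
    try:
        return operation()
    finally:
        # Anything cached while the operation ran may already be stale.
        clear_index_cache(cache)


cache: Dict[str, object] = {'_columns_indices': {'ts': ['IX_demo_ts']}, 'unrelated': 1}


def fake_create_indices() -> SuccessTuple:
    # Simulate a lookup that repopulates the cache mid-operation.
    cache['_columns_indices'] = {'ts': ['IX_demo_ts_stale']}
    return True, "Success"


result = run_index_operation(cache, fake_create_indices)
print(result, cache)
```

The second clear is what discards the value cached while the operation was still running.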
def clear( self, begin: Optional[datetime.datetime] = None, end: Optional[datetime.datetime] = None, params: Optional[Dict[str, Any]] = None, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
def clear(
    self,
    begin: Optional[datetime] = None,
    end: Optional[datetime] = None,
    params: Optional[Dict[str, Any]] = None,
    debug: bool = False,
    **kwargs: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `clear_pipe` method.

    Parameters
    ----------
    begin: Optional[datetime], default None
        If provided, only remove rows at or after this datetime value.

    end: Optional[datetime], default None
        If provided, only remove rows older than this datetime value (not including end).

    params: Optional[Dict[str, Any]], default None
        See `meerschaum.utils.sql.build_where`.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` corresponding to whether this procedure completed successfully.

    Examples
    --------
    >>> pipe = mrsm.Pipe('test', 'test', columns={'datetime': 'dt'}, instance='sql:local')
    >>> pipe.sync({'dt': [datetime(2020, 1, 1, 0, 0)]})
    >>> pipe.sync({'dt': [datetime(2021, 1, 1, 0, 0)]})
    >>> pipe.sync({'dt': [datetime(2022, 1, 1, 0, 0)]})
    >>>
    >>> pipe.clear(begin=datetime(2021, 1, 1, 0, 0))
    >>> pipe.get_data()
              dt
    0 2020-01-01

    """
    from meerschaum.utils.warnings import warn
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    # Normalize the bounds to the pipe's datetime conventions.
    begin, end = self.parse_date_bounds(begin, end)

    with Venv(get_connector_plugin(self.instance_connector)):
        return self.instance_connector.clear_pipe(
            self,
            begin=begin,
            end=end,
            params=params,
            debug=debug,
            **kwargs
        )

Call the Pipe's instance connector's clear_pipe method.

Parameters
  • begin (Optional[datetime], default None): If provided, only remove rows at or after this datetime value.
  • end (Optional[datetime], default None): If provided, only remove rows older than this datetime value (not including end).
  • params (Optional[Dict[str, Any]], default None): See meerschaum.utils.sql.build_where.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A SuccessTuple corresponding to whether this procedure completed successfully.
Examples
>>> pipe = mrsm.Pipe('test', 'test', columns={'datetime': 'dt'}, instance='sql:local')
>>> pipe.sync({'dt': [datetime(2020, 1, 1, 0, 0)]})
>>> pipe.sync({'dt': [datetime(2021, 1, 1, 0, 0)]})
>>> pipe.sync({'dt': [datetime(2022, 1, 1, 0, 0)]})
>>> 
>>> pipe.clear(begin=datetime(2021, 1, 1, 0, 0))
>>> pipe.get_data()
          dt
0 2020-01-01
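The bounds semantics in the example above (begin included, end excluded) can be sketched without a database. The clear_rows() helper below is a hypothetical stand-in that filters a list of dictionaries. It illustrates the half-open window only and makes no claim about how any connector implements clear_pipe.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional


def clear_rows(
    rows: List[Dict[str, Any]],
    dt_col: str,
    begin: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> List[Dict[str, Any]]:
    """Return the rows that survive clearing the window [begin, end)."""
    def in_window(value: datetime) -> bool:
        at_or_after_begin = begin is None or value >= begin
        before_end = end is None or value < end
        return at_or_after_begin and before_end

    return [row for row in rows if not in_window(row[dt_col])]


rows = [
    {'dt': datetime(2020, 1, 1)},
    {'dt': datetime(2021, 1, 1)},
    {'dt': datetime(2022, 1, 1)},
]

# Mirrors `pipe.clear(begin=datetime(2021, 1, 1))`: only the 2020 row remains.
remaining = clear_rows(rows, 'dt', begin=datetime(2021, 1, 1))
print(remaining)

# With an exclusive `end`, the 2022 row is kept as well.
bounded = clear_rows(rows, 'dt', begin=datetime(2021, 1, 1), end=datetime(2022, 1, 1))
print(bounded)
```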
def deduplicate( self, begin: Union[datetime.datetime, int, NoneType] = None, end: Union[datetime.datetime, int, NoneType] = None, params: Optional[Dict[str, Any]] = None, chunk_interval: Union[datetime.datetime, int, NoneType] = None, bounded: Optional[bool] = None, workers: Optional[int] = None, debug: bool = False, _use_instance_method: bool = True, **kwargs: Any) -> Tuple[bool, str]:
def deduplicate(
    self,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    chunk_interval: Union[datetime, int, None] = None,
    bounded: Optional[bool] = None,
    workers: Optional[int] = None,
    debug: bool = False,
    _use_instance_method: bool = True,
    **kwargs: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `deduplicate_pipe` method (if available)
    to delete duplicate rows. Otherwise, deduplicate the pipe chunk by chunk.

    Parameters
    ----------
    begin: Union[datetime, int, None], default None
        If provided, only deduplicate rows newer than this datetime value.

    end: Union[datetime, int, None], default None
        If provided, only deduplicate rows older than this datetime value (not including end).

    params: Optional[Dict[str, Any]], default None
        Restrict deduplication to this filter (for multiplexed data streams).
        See `meerschaum.utils.sql.build_where`.

    chunk_interval: Union[timedelta, int, None], default None
        If provided, use this for the chunk bounds.
        Defaults to the value set in `pipe.parameters['chunk_minutes']` (1440).

    bounded: Optional[bool], default None
        Only check outside the oldest and newest sync times if bounded is explicitly `False`.

    workers: Optional[int], default None
        If the instance connector is thread-safe, limit concurrent syncs to this many threads.

    debug: bool, default False
        Verbosity toggle.

    kwargs: Any
        All other keyword arguments are passed to
        `pipe.sync()`, `pipe.clear()`, and `pipe.get_data()`.

    Returns
    -------
    A `SuccessTuple` corresponding to whether all of the chunks were successfully deduplicated.
    """
    from meerschaum.utils.warnings import warn, info
    from meerschaum.utils.misc import interval_str, items_str
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.pool import get_pool

    begin, end = self.parse_date_bounds(begin, end)

    workers = self.get_num_workers(workers=workers)
    pool = get_pool(workers=workers)

    ### Prefer the instance connector's own implementation when it exists.
    if _use_instance_method:
        with Venv(get_connector_plugin(self.instance_connector)):
            if hasattr(self.instance_connector, 'deduplicate_pipe'):
                return self.instance_connector.deduplicate_pipe(
                    self,
                    begin=begin,
                    end=end,
                    params=params,
                    bounded=bounded,
                    debug=debug,
                    **kwargs
                )

    ### Only unbound if explicitly False.
    if bounded is None:
        bounded = True
    chunk_interval = self.get_chunk_interval(chunk_interval, debug=debug)

    bound_time = self.get_bound_time(debug=debug)
    if bounded and begin is None:
        begin = (
            bound_time
            if bound_time is not None
            else self.get_sync_time(newest=False, debug=debug)
        )
    if bounded and end is None:
        end = self.get_sync_time(newest=True, debug=debug)
        if end is not None:
            end += (
                timedelta(minutes=1)
                if hasattr(end, 'tzinfo')
                else 1
            )

    chunk_bounds = self.get_chunk_bounds(
        bounded=bounded,
        begin=begin,
        end=end,
        chunk_interval=chunk_interval,
        debug=debug,
    )

    indices = [col for col in self.columns.values() if col]
    if not indices:
        return False, "Cannot deduplicate without index columns."

    def process_chunk_bounds(bounds) -> Tuple[
        Tuple[
            Union[datetime, int, None],
            Union[datetime, int, None]
        ],
        SuccessTuple
    ]:
        ### Only selecting the index values here to keep bandwidth down.
        chunk_begin, chunk_end = bounds
        chunk_df = self.get_data(
            select_columns=indices,
            begin=chunk_begin,
            end=chunk_end,
            params=params,
            debug=debug,
        )
        if chunk_df is None:
            return bounds, (True, "")
        existing_chunk_len = len(chunk_df)
        deduped_chunk_df = chunk_df.drop_duplicates(keep='last')
        deduped_chunk_len = len(deduped_chunk_df)

        if existing_chunk_len == deduped_chunk_len:
            return bounds, (True, "")

        chunk_msg_header = f"\n{chunk_begin} - {chunk_end}"
        chunk_msg_body = ""

        full_chunk = self.get_data(
            begin=chunk_begin,
            end=chunk_end,
            params=params,
            debug=debug,
        )
        if full_chunk is None or len(full_chunk) == 0:
            return bounds, (True, f"{chunk_msg_header}\nChunk is empty, skipping...")

        chunk_indices = [ix for ix in indices if ix in full_chunk.columns]
        if not chunk_indices:
            return bounds, (False, f"None of {items_str(indices)} were present in chunk.")
        try:
            full_chunk = full_chunk.drop_duplicates(
                subset=chunk_indices,
                keep='last'
            ).reset_index(
                drop=True,
            )
        except Exception as e:
            return (
                bounds,
                (False, f"Failed to deduplicate chunk on {items_str(chunk_indices)}:\n({e})")
            )

        ### Replace the chunk: clear the window, then sync the deduplicated rows.
        clear_success, clear_msg = self.clear(
            begin=chunk_begin,
            end=chunk_end,
            params=params,
            debug=debug,
        )
        if not clear_success:
            chunk_msg_body += f"Failed to clear chunk while deduplicating:\n{clear_msg}\n"
            warn(chunk_msg_body)

        sync_success, sync_msg = self.sync(full_chunk, debug=debug)
        if not sync_success:
            chunk_msg_body += f"Failed to sync chunk while deduplicating:\n{sync_msg}\n"

        ### Finally check if the deduplication worked.
        chunk_rowcount = self.get_rowcount(
            begin=chunk_begin,
            end=chunk_end,
            params=params,
            debug=debug,
        )
        if chunk_rowcount != deduped_chunk_len:
            return bounds, (
                False, (
                    chunk_msg_header + "\n"
                    + chunk_msg_body + ("\n" if chunk_msg_body else '')
                    + "Chunk rowcounts still differ ("
                    + f"{chunk_rowcount} rowcount vs {deduped_chunk_len} chunk length)."
                )
            )

        return bounds, (
            True, (
                chunk_msg_header + "\n"
                + chunk_msg_body + ("\n" if chunk_msg_body else '')
                + f"Deduplicated chunk from {existing_chunk_len} to {chunk_rowcount} rows."
            )
        )

    info(
        f"Deduplicating {len(chunk_bounds)} chunk"
        + ('s' if len(chunk_bounds) != 1 else '')
        + f" ({'un' if not bounded else ''}bounded)"
        + f" of size '{interval_str(chunk_interval)}'"
        + f" on {self}."
    )
    bounds_success_tuples = dict(pool.map(process_chunk_bounds, chunk_bounds))
    bounds_successes = {
        bounds: success_tuple
        for bounds, success_tuple in bounds_success_tuples.items()
        if success_tuple[0]
    }
    bounds_failures = {
        bounds: success_tuple
        for bounds, success_tuple in bounds_success_tuples.items()
        if not success_tuple[0]
    }

    ### No need to retry if everything failed.
    if len(bounds_failures) > 0 and len(bounds_successes) == 0:
        return (
            False,
            (
                f"Failed to deduplicate {len(bounds_failures)} chunk"
                + ('s' if len(bounds_failures) != 1 else '')
                + ".\n"
                + "\n".join([msg for _, (_, msg) in bounds_failures.items() if msg])
            )
        )

    retry_bounds = [bounds for bounds in bounds_failures]
    if not retry_bounds:
        return (
            True,
            (
                f"Successfully deduplicated {len(bounds_successes)} chunk"
                + ('s' if len(bounds_successes) != 1 else '')
                + ".\n"
                + "\n".join([msg for _, (_, msg) in bounds_successes.items() if msg])
            ).rstrip('\n')
        )

    info(f"Retrying {len(retry_bounds)} chunks for {self}...")
    retry_bounds_success_tuples = dict(pool.map(process_chunk_bounds, retry_bounds))
    ### Partition the retried results (not the first-pass results).
    retry_bounds_successes = {
        bounds: success_tuple
        for bounds, success_tuple in retry_bounds_success_tuples.items()
        if success_tuple[0]
    }
    retry_bounds_failures = {
        bounds: success_tuple
        for bounds, success_tuple in retry_bounds_success_tuples.items()
        if not success_tuple[0]
    }

    bounds_successes.update(retry_bounds_successes)
    if not retry_bounds_failures:
        return (
            True,
            (
                f"Successfully deduplicated {len(bounds_successes)} chunk"
                + ('s' if len(bounds_successes) != 1 else '')
                + f" ({len(retry_bounds_successes)} retried):\n"
                + "\n".join([msg for _, (_, msg) in bounds_successes.items() if msg])
            ).rstrip('\n')
        )

    return (
        False,
        (
            f"Failed to deduplicate {len(retry_bounds_failures)} chunk"
            + ('s' if len(retry_bounds_failures) != 1 else '')
            + ".\n"
            + "\n".join([msg for _, (_, msg) in retry_bounds_failures.items() if msg])
        ).rstrip('\n')
    )

Call the Pipe's instance connector's deduplicate_pipe method (if available) to delete duplicate rows. Otherwise, deduplicate the pipe chunk by chunk.

Parameters
  • begin (Union[datetime, int, None], default None): If provided, only deduplicate rows newer than this datetime value.
  • end (Union[datetime, int, None], default None): If provided, only deduplicate rows older than this datetime value (not including end).
  • params (Optional[Dict[str, Any]], default None): Restrict deduplication to this filter (for multiplexed data streams). See meerschaum.utils.sql.build_where.
  • chunk_interval (Union[timedelta, int, None], default None): If provided, use this for the chunk bounds. Defaults to the value set in pipe.parameters['chunk_minutes'] (1440).
  • bounded (Optional[bool], default None): Only check outside the oldest and newest sync times if bounded is explicitly False.
  • workers (Optional[int], default None): If the instance connector is thread-safe, limit concurrent syncs to this many threads.
  • debug (bool, default False): Verbosity toggle.
  • kwargs (Any): All other keyword arguments are passed to pipe.sync(), pipe.clear(), and pipe.get_data().
Returns
  • A SuccessTuple corresponding to whether all of the chunks were successfully deduplicated.
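Two ideas from the fallback path can be sketched with the standard library alone: keeping only the last row per index key within a chunk, and retrying failed chunks once unless every chunk failed. The helpers drop_duplicates_keep_last() and process_with_one_retry() are hypothetical simplifications; the real method works on DataFrames and a worker pool.

```python
from typing import Any, Callable, Dict, Hashable, List, Tuple

SuccessTuple = Tuple[bool, str]


def drop_duplicates_keep_last(
    rows: List[Dict[str, Any]],
    index_columns: List[str],
) -> List[Dict[str, Any]]:
    """Keep the last row seen for each combination of index values."""
    last_seen: Dict[Tuple[Any, ...], Dict[str, Any]] = {}
    for row in rows:
        key = tuple(row.get(col) for col in index_columns)
        # Re-insert so the key is ordered by its last occurrence.
        last_seen.pop(key, None)
        last_seen[key] = row
    return list(last_seen.values())


def process_with_one_retry(
    bounds_list: List[Hashable],
    process: Callable[[Hashable], SuccessTuple],
) -> Dict[Hashable, SuccessTuple]:
    """Process every chunk, then retry failures once unless all chunks failed."""
    results = {bounds: process(bounds) for bounds in bounds_list}
    failed = [bounds for bounds, (ok, _) in results.items() if not ok]
    if failed and len(failed) < len(results):
        results.update({bounds: process(bounds) for bounds in failed})
    return results


rows = [
    {'ts': 1, 'id': 1, 'vl': 10},
    {'ts': 1, 'id': 2, 'vl': 20},
    {'ts': 1, 'id': 1, 'vl': 30},  # duplicate of the first row's index
]
deduped = drop_duplicates_keep_last(rows, ['ts', 'id'])
print(deduped)

attempts: Dict[Hashable, int] = {}


def flaky(bounds: Hashable) -> SuccessTuple:
    """Fail the first attempt on one chunk to exercise the retry."""
    attempts[bounds] = attempts.get(bounds, 0) + 1
    if bounds == (10, 20) and attempts[bounds] == 1:
        return False, "transient failure"
    return True, f"deduplicated {bounds}"


results = process_with_one_retry([(0, 10), (10, 20)], flaky)
print(results)
```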
def bootstrap( self, debug: bool = False, yes: bool = False, force: bool = False, noask: bool = False, shell: bool = False, **kw) -> Tuple[bool, str]:
def bootstrap(
    self,
    debug: bool = False,
    yes: bool = False,
    force: bool = False,
    noask: bool = False,
    shell: bool = False,
    **kw
) -> SuccessTuple:
    """
    Prompt the user to create a pipe's requirements all from one method.
    This method shouldn't be used in any automated scripts because it interactively
    prompts the user and therefore may hang.

    Parameters
    ----------
    debug: bool, default False
        Verbosity toggle.

    yes: bool, default False
        Print the questions and automatically agree.

    force: bool, default False
        Skip the questions and agree anyway.

    noask: bool, default False
        Print the questions but go with the default answer.

    shell: bool, default False
        Used to determine if we are in the interactive shell.

    Returns
    -------
    A `SuccessTuple` corresponding to the success of this procedure.

    """

    from meerschaum.utils.warnings import info
    from meerschaum.utils.prompt import prompt, yes_no
    from meerschaum.utils.formatting import pprint
    from meerschaum.config import get_config
    from meerschaum.utils.formatting._shell import clear_screen
    from meerschaum.utils.formatting import print_tuple
    from meerschaum.actions import actions
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    _clear = get_config('shell', 'clear_screen', patch=True)

    # An already-registered pipe is deleted before bootstrapping again.
    if self.id is not None:
        delete_tuple = self.delete(debug=debug)
        if not delete_tuple[0]:
            return delete_tuple

    if _clear:
        clear_screen(debug=debug)

    _parameters = _get_parameters(self, debug=debug)
    self.parameters = _parameters
    pprint(self.parameters)
    try:
        prompt(
            f"\n    Press [Enter] to register {self} with the above configuration:",
            icon=False
        )
    except KeyboardInterrupt:
        return False, f"Aborted bootstrapping {self}."

    with Venv(get_connector_plugin(self.instance_connector)):
        register_tuple = self.instance_connector.register_pipe(self, debug=debug)

    if not register_tuple[0]:
        return register_tuple

    if _clear:
        clear_screen(debug=debug)

    try:
        if yes_no(
            f"Would you like to edit the definition for {self}?",
            yes=yes,
            noask=noask,
            default='n',
        ):
            edit_tuple = self.edit_definition(debug=debug)
            if not edit_tuple[0]:
                return edit_tuple

        if yes_no(
            f"Would you like to try syncing {self} now?",
            yes=yes,
            noask=noask,
            default='n',
        ):
            sync_tuple = actions['sync'](
                ['pipes'],
                connector_keys=[self.connector_keys],
                metric_keys=[self.metric_key],
                location_keys=[self.location_key],
                mrsm_instance=str(self.instance_connector),
                debug=debug,
                shell=shell,
            )
            if not sync_tuple[0]:
                return sync_tuple
    except Exception as e:
        return False, f"Failed to bootstrap {self}:\n" + str(e)

    print_tuple((True, f"Finished bootstrapping {self}!"))
    info(
        "You can edit this pipe later with `edit pipes` "
        + "or set the definition with `edit pipes definition`.\n"
        + "    To sync data into your pipe, run `sync pipes`."
    )

    return True, "Success"

Prompt the user to create a pipe's requirements all from one method. This method shouldn't be used in any automated scripts because it interactively prompts the user and therefore may hang.

Parameters
  • debug (bool, default False): Verbosity toggle.
  • yes (bool, default False): Print the questions and automatically agree.
  • force (bool, default False): Skip the questions and agree anyway.
  • noask (bool, default False): Print the questions but go with the default answer.
  • shell (bool, default False): Used to determine if we are in the interactive shell.
Returns
  • A SuccessTuple corresponding to the success of this procedure.
def enforce_dtypes( self, df: pandas.DataFrame, chunksize: Optional[int] = -1, enforce: bool = True, safe_copy: bool = True, dtypes: Optional[Dict[str, str]] = None, debug: bool = False) -> pandas.DataFrame:
def enforce_dtypes(
    self,
    df: 'pd.DataFrame',
    chunksize: Optional[int] = -1,
    enforce: bool = True,
    safe_copy: bool = True,
    dtypes: Optional[Dict[str, str]] = None,
    debug: bool = False,
) -> 'pd.DataFrame':
    """
    Cast the input dataframe to the pipe's registered data types.
    If the pipe does not exist and dtypes are not set, return the dataframe.
    """
    import traceback
    from io import StringIO
    from meerschaum.utils.warnings import warn
    from meerschaum.utils.debug import dprint
    from meerschaum.utils.dataframe import (
        parse_df_datetimes,
        enforce_dtypes as _enforce_dtypes,
        parse_simple_lines,
    )
    from meerschaum.utils.dtypes import are_dtypes_equal
    from meerschaum.utils.packages import import_pandas
    pd = import_pandas(debug=debug)
    if df is None:
        if debug:
            dprint(
                "Received None instead of a DataFrame.\n"
                + "    Skipping dtype enforcement..."
            )
        return df

    # The pipe-level `enforce` setting overrides the argument.
    if not self.enforce:
        enforce = False

    explicit_dtypes = self.get_dtypes(infer=False, debug=debug) if enforce else {}
    pipe_dtypes = self.get_dtypes(infer=True, debug=debug) if not dtypes else dtypes

    try:
        if isinstance(df, str):
            # Strings not starting with '{' or '[' are parsed as simple lines, others as JSON.
            if df.strip() and df.strip()[0] not in ('{', '['):
                df = parse_df_datetimes(
                    parse_simple_lines(df),
                    ignore_cols=[
                        col
                        for col, dtype in pipe_dtypes.items()
                        if (not enforce or not are_dtypes_equal(dtype, 'datetime'))
                    ],
                )
            else:
                df = parse_df_datetimes(
                    pd.read_json(StringIO(df)),
                    ignore_cols=[
                        col
                        for col, dtype in pipe_dtypes.items()
                        if (not enforce or not are_dtypes_equal(dtype, 'datetime'))
                    ],
                    ignore_all=(not enforce),
                    strip_timezone=(self.tzinfo is None),
                    chunksize=chunksize,
                    debug=debug,
                )
        elif isinstance(df, (dict, list, tuple)):
            df = parse_df_datetimes(
                df,
                ignore_cols=[
                    col
                    for col, dtype in pipe_dtypes.items()
                    if (not enforce or not are_dtypes_equal(str(dtype), 'datetime'))
                ],
                strip_timezone=(self.tzinfo is None),
                chunksize=chunksize,
                debug=debug,
            )
    except Exception as e:
        warn(f"Unable to cast incoming data as a DataFrame...:\n{e}\n\n{traceback.format_exc()}")
        return None

    if not pipe_dtypes:
        if debug:
            dprint(
                f"Could not find dtypes for {self}.\n"
                + "Skipping dtype enforcement..."
            )
        return df

    return _enforce_dtypes(
        df,
        pipe_dtypes,
        explicit_dtypes=explicit_dtypes,
        safe_copy=safe_copy,
        strip_timezone=(self.tzinfo is None),
        coerce_numeric=self.mixed_numerics,
        coerce_timezone=enforce,
        debug=debug,
    )

Cast the input dataframe to the pipe's registered data types. If the pipe does not exist and dtypes are not set, return the dataframe.
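Casting incoming data to a registered mapping of column names to types can be sketched on plain dictionaries. The CASTERS table and enforce_row_dtypes() helper below are hypothetical and far simpler than the pandas-based implementation. They only show the shape of the operation, including the early return when no dtypes are known.

```python
from datetime import datetime
from typing import Any, Callable, Dict, List

# Hypothetical mapping of dtype names to casting functions.
CASTERS: Dict[str, Callable[[Any], Any]] = {
    'int': int,
    'float': float,
    'str': str,
    'datetime': datetime.fromisoformat,
}


def enforce_row_dtypes(
    rows: List[Dict[str, Any]],
    dtypes: Dict[str, str],
) -> List[Dict[str, Any]]:
    """Cast each row's values to the configured dtypes, skipping unknown ones."""
    if not dtypes:
        # Nothing registered: return the data unchanged.
        return rows

    enforced = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's rows are not mutated
        for col, dtype in dtypes.items():
            value = new_row.get(col)
            caster = CASTERS.get(dtype)
            if value is None or caster is None:
                continue
            if dtype == 'datetime' and isinstance(value, datetime):
                continue  # already the right type
            new_row[col] = caster(value)
        enforced.append(new_row)
    return enforced


raw_rows = [{'ts': '2024-01-01T00:00:00', 'id': '1', 'vl': '3.5'}]
typed_rows = enforce_row_dtypes(raw_rows, {'ts': 'datetime', 'id': 'int', 'vl': 'float'})
print(typed_rows)

untouched = enforce_row_dtypes(raw_rows, {})
```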

def infer_dtypes( self, persist: bool = False, refresh: bool = False, debug: bool = False) -> Dict[str, Any]:
def infer_dtypes(
    self,
    persist: bool = False,
    refresh: bool = False,
    debug: bool = False,
) -> Dict[str, Any]:
    """
    If `dtypes` is not set in `meerschaum.Pipe.parameters`,
    infer the data types from the underlying table if it exists.

    Parameters
    ----------
    persist: bool, default False
        If `True`, persist the inferred data types to `meerschaum.Pipe.parameters`.
        NOTE: Use with caution! Generally `dtypes` is meant to be user-configurable only.

    refresh: bool, default False
        If `True`, retrieve the latest columns-types for the pipe.
        See `Pipe.get_columns_types()`.

    Returns
    -------
    A dictionary of strings containing the pandas data types for this Pipe.
    """
    if not self.exists(debug=debug):
        return {}

    from meerschaum.utils.dtypes.sql import get_pd_type_from_db_type
    from meerschaum.utils.dtypes import to_pandas_dtype

    ### NOTE: get_columns_types() may return either the types as
    ###       PostgreSQL- or Pandas-style.
    columns_types = self.get_columns_types(refresh=refresh, debug=debug)

    ### Uppercase type names are treated as database types.
    remote_pd_dtypes = {
        c: (
            get_pd_type_from_db_type(t, allow_custom_dtypes=True)
            if str(t).isupper()
            else to_pandas_dtype(t)
        )
        for c, t in columns_types.items()
    } if columns_types else {}
    if not persist:
        return remote_pd_dtypes

    ### Only fill in columns which do not already have a configured dtype.
    parameters = self.get_parameters(refresh=refresh, debug=debug)
    dtypes = parameters.get('dtypes', {})
    dtypes.update({
        col: typ
        for col, typ in remote_pd_dtypes.items()
        if col not in dtypes
    })
    self.dtypes = dtypes
    self.edit(interactive=False, debug=debug)
    return remote_pd_dtypes

If dtypes is not set in meerschaum.Pipe.parameters, infer the data types from the underlying table if it exists.

Parameters
  • persist (bool, default False): If True, persist the inferred data types to meerschaum.Pipe.parameters. NOTE: Use with caution! Generally dtypes is meant to be user-configurable only.
  • refresh (bool, default False): If True, retrieve the latest columns-types for the pipe. See Pipe.get_columns_types().
Returns
  • A dictionary of strings containing the pandas data types for this Pipe.
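When persist is True, the source merges inferred types into the configured dtypes without overwriting any column the user has already set. A minimal sketch of that merge rule follows; merge_dtypes() is a hypothetical helper and the type names are illustrative.

```python
from typing import Dict


def merge_dtypes(
    explicit: Dict[str, str],
    inferred: Dict[str, str],
) -> Dict[str, str]:
    """Add inferred dtypes only for columns without an explicit dtype."""
    merged = dict(explicit)  # copy so the inputs are left untouched
    merged.update({
        col: typ
        for col, typ in inferred.items()
        if col not in merged
    })
    return merged


explicit = {'vl': 'numeric'}
inferred = {'ts': 'datetime64[ns]', 'id': 'int64', 'vl': 'float64'}

merged = merge_dtypes(explicit, inferred)
print(merged)
```

The user's 'numeric' setting for vl survives, while ts and id pick up their inferred types.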
def copy_to( self, instance_keys: str, sync: bool = True, begin: Union[datetime.datetime, int, NoneType] = None, end: Union[datetime.datetime, int, NoneType] = None, params: Optional[Dict[str, Any]] = None, chunk_interval: Union[datetime.timedelta, int, NoneType] = None, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
def copy_to(
    self,
    instance_keys: str,
    sync: bool = True,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    chunk_interval: Union[timedelta, int, None] = None,
    debug: bool = False,
    **kwargs: Any
) -> SuccessTuple:
    """
    Copy a pipe to another instance.

    Parameters
    ----------
    instance_keys: str
        The instance to which to copy this pipe.

    sync: bool, default True
        If `True`, sync the source pipe's documents to the new pipe.

    begin: Union[datetime, int, None], default None
        Beginning datetime value to pass to `Pipe.get_data()`.

    end: Union[datetime, int, None], default None
        End datetime value to pass to `Pipe.get_data()`.

    params: Optional[Dict[str, Any]], default None
        Parameters filter to pass to `Pipe.get_data()`.

    chunk_interval: Union[timedelta, int, None], default None
        The size of chunks to retrieve from `Pipe.get_data()` for syncing.

    kwargs: Any
        Additional flags to pass to `Pipe.get_data()` and `Pipe.sync()`, e.g. `workers`.

    Returns
    -------
    A SuccessTuple indicating success.
    """
    if str(instance_keys) == self.instance_keys:
        return False, f"Cannot copy {self} to instance '{instance_keys}'."

    begin, end = self.parse_date_bounds(begin, end)

    new_pipe = mrsm.Pipe(
        self.connector_keys,
        self.metric_key,
        self.location_key,
        parameters=self.parameters.copy(),
        instance=instance_keys,
    )

    new_pipe_is_registered = new_pipe.id is not None

    # Update the metadata if the target pipe is registered; otherwise register it.
    metadata_method = new_pipe.edit if new_pipe_is_registered else new_pipe.register
    metadata_success, metadata_msg = metadata_method(debug=debug)
    if not metadata_success:
        return metadata_success, metadata_msg

    if not self.exists(debug=debug):
        return True, f"{self} does not exist; nothing to sync."

    # Force `as_iterator` for reading, then restore the caller's value for syncing.
    original_as_iterator = kwargs.get('as_iterator', None)
    kwargs['as_iterator'] = True

    chunk_generator = self.get_data(
        begin=begin,
        end=end,
        params=params,
        chunk_interval=chunk_interval,
        debug=debug,
        **kwargs
    )

    if original_as_iterator is None:
        _ = kwargs.pop('as_iterator', None)
    else:
        kwargs['as_iterator'] = original_as_iterator

    sync_success, sync_msg = new_pipe.sync(
        chunk_generator,
        begin=begin,
        end=end,
        params=params,
        debug=debug,
        **kwargs
    )
    msg = (
        f"Successfully synced {new_pipe}:\n{sync_msg}"
        if sync_success
        else f"Failed to sync {new_pipe}:\n{sync_msg}"
    )
    return sync_success, msg

Copy a pipe to another instance.

Parameters
  • instance_keys (str): The instance to which to copy this pipe.
  • sync (bool, default True): If True, sync the source pipe's documents to the new pipe.
  • begin (Union[datetime, int, None], default None): Beginning datetime value to pass to Pipe.get_data().
  • end (Union[datetime, int, None], default None): End datetime value to pass to Pipe.get_data().
  • params (Optional[Dict[str, Any]], default None): Parameters filter to pass to Pipe.get_data().
  • chunk_interval (Union[timedelta, int, None], default None): The size of chunks to retrieve from Pipe.get_data() for syncing.
  • kwargs (Any): Additional flags to pass to Pipe.get_data() and Pipe.sync(), e.g. workers.
Returns
  • A SuccessTuple indicating success.
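
The control flow of copy_to() can be illustrated without a database. The following is a minimal sketch under stated assumptions, not Meerschaum's implementation: `iter_chunks()` and `copy_docs()` are hypothetical helpers, and plain dictionaries and lists stand in for instances and pipes. It mirrors three behaviors visible in the source above: copying onto the same instance fails, a missing source counts as success with nothing to sync, and documents are streamed chunk by chunk.

```python
from typing import Dict, Iterator, List, Tuple

# Same (success, message) shape that `copy_to()` returns.
SuccessTuple = Tuple[bool, str]
Doc = Dict[str, int]


def iter_chunks(docs: List[Doc], chunk_size: int) -> Iterator[List[Doc]]:
    """Yield `docs` in lists of at most `chunk_size` documents.

    Hypothetical stand-in for reading a pipe's data as an iterator of chunks.
    """
    for start in range(0, len(docs), chunk_size):
        yield docs[start:start + chunk_size]


def copy_docs(
    instances: Dict[str, List[Doc]],
    source_key: str,
    target_key: str,
    chunk_size: int = 2,
) -> SuccessTuple:
    """Hypothetical helper: copy documents between in-memory 'instances'."""
    # Copying onto the same instance is refused.
    if source_key == target_key:
        return False, f"Cannot copy '{source_key}' to instance '{target_key}'."

    # A missing source is not an error: there is simply nothing to sync.
    if source_key not in instances:
        return True, f"'{source_key}' does not exist; nothing to sync."

    # Create the target if needed, then append one chunk at a time.
    target = instances.setdefault(target_key, [])
    num_copied = 0
    for chunk in iter_chunks(instances[source_key], chunk_size):
        target.extend(chunk)
        num_copied += len(chunk)
    return True, f"Copied {num_copied} documents to '{target_key}'."


instances = {'sql:main': [{'id': 1}, {'id': 2}, {'id': 3}]}
print(copy_docs(instances, 'sql:main', 'sql:main'))
# (False, "Cannot copy 'sql:main' to instance 'sql:main'.")
print(copy_docs(instances, 'sql:main', 'sql:backup'))
# (True, "Copied 3 documents to 'sql:backup'.")
```

Returning a tuple rather than raising lets callers unpack the result as `success, msg = ...`, which is how copy_to() handles the results of its own register, edit, and sync calls.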
class Plugin:
 30class Plugin:
 31    """Handle packaging of Meerschaum plugins."""
 32
 33    def __init__(
 34        self,
 35        name: str,
 36        version: Optional[str] = None,
 37        user_id: Optional[int] = None,
 38        required: Optional[List[str]] = None,
 39        attributes: Optional[Dict[str, Any]] = None,
 40        archive_path: Optional[pathlib.Path] = None,
 41        venv_path: Optional[pathlib.Path] = None,
 42        repo_connector: Optional['mrsm.connectors.api.APIConnector'] = None,
 43        repo: Union['mrsm.connectors.api.APIConnector', str, None] = None,
 44    ):
 45        import meerschaum.config.paths as paths
 46        from meerschaum._internal.static import STATIC_CONFIG
 47        sep = STATIC_CONFIG['plugins']['repo_separator']
 48        _repo = None
 49        if sep in name:
 50            try:
 51                name, _repo = name.split(sep)
 52            except Exception as e:
 53                error(f"Invalid plugin name: '{name}'")
 54        self._repo_in_name = _repo
 55
 56        if attributes is None:
 57            attributes = {}
 58        self.name = name
 59        self.attributes = attributes
 60        self.user_id = user_id
 61        self._version = version
 62        if required:
 63            self._required = required
 64        self.archive_path = (
 65            archive_path if archive_path is not None
 66            else paths.PLUGINS_ARCHIVES_RESOURCES_PATH / f"{self.name}.tar.gz"
 67        )
 68        self.venv_path = (
 69            venv_path if venv_path is not None
 70            else paths.VIRTENV_RESOURCES_PATH / self.name
 71        )
 72        self._repo_connector = repo_connector
 73        self._repo_keys = repo
 74
 75
 76    @property
 77    def repo_connector(self):
 78        """
 79        Return the repository connector for this plugin.
 80        NOTE: This imports the `connectors` module, which imports certain plugin modules.
 81        """
 82        if self._repo_connector is None:
 83            from meerschaum.connectors.parse import parse_repo_keys
 84
 85            repo_keys = self._repo_keys or self._repo_in_name
 86            if self._repo_in_name and self._repo_keys and self._repo_keys != self._repo_in_name:
 87                error(
 88                    f"Received inconsistent repos: '{self._repo_in_name}' and '{self._repo_keys}'."
 89                )
 90            repo_connector = parse_repo_keys(repo_keys)
 91            self._repo_connector = repo_connector
 92        return self._repo_connector
 93
 94
 95    @property
 96    def version(self):
 97        """
 98        Return the plugin's module version (`__version__`) if it's defined.
 99        """
100        if self._version is None:
101            try:
102                self._version = self.module.__version__
103            except Exception as e:
104                self._version = None
105        return self._version
106
107
108    @property
109    def module(self):
110        """
111        Return the Python module of the underlying plugin.
112        """
113        if '_module' not in self.__dict__ or self.__dict__.get('_module', None) is None:
114            if self.__file__ is None:
115                return None
116
117            from meerschaum.plugins import import_plugins
118            self._module = import_plugins(str(self), warn=False)
119
120        return self._module
121
122
123    @property
124    def __file__(self) -> Union[str, None]:
125        """
126        Return the file path (str) of the plugin if it exists, otherwise `None`.
127        """
128        if self.__dict__.get('_module', None) is not None:
129            return self.module.__file__
130
131        import meerschaum.config.paths as paths
132
133        potential_dir = paths.PLUGINS_RESOURCES_PATH / self.name
134        if (
135            potential_dir.exists()
136            and potential_dir.is_dir()
137            and (potential_dir / '__init__.py').exists()
138        ):
139            return str((potential_dir / '__init__.py').as_posix())
140
141        potential_file = paths.PLUGINS_RESOURCES_PATH / (self.name + '.py')
142        if potential_file.exists() and not potential_file.is_dir():
143            return str(potential_file.as_posix())
144
145        return None
146
147
148    @property
149    def requirements_file_path(self) -> Union[pathlib.Path, None]:
150        """
151        If a file named `requirements.txt` exists, return its path.
152        """
153        if self.__file__ is None:
154            return None
155        path = pathlib.Path(self.__file__).parent / 'requirements.txt'
156        if not path.exists():
157            return None
158        return path
159
160
161    def is_installed(self, **kw) -> bool:
162        """
163        Check whether a plugin is correctly installed.
164
165        Returns
166        -------
167        A `bool` indicating whether the plugin's source file or directory exists.
168        """
169        return self.__file__ is not None
170
171
172    def make_tar(self, debug: bool = False) -> pathlib.Path:
173        """
174        Compress the plugin's source files into a `.tar.gz` archive and return the archive's path.
175
176        Parameters
177        ----------
178        debug: bool, default False
179            Verbosity toggle.
180
181        Returns
182        -------
183        The `pathlib.Path` of the archive file.
184
185        """
186        import tarfile, pathlib, subprocess, fnmatch
187        from meerschaum.utils.debug import dprint
188        from meerschaum.utils.packages import attempt_import
189        pathspec = attempt_import('pathspec', debug=debug)
190
191        if not self.__file__:
192            from meerschaum.utils.warnings import error
193            error(f"Could not find file for plugin '{self}'.")
194        if '__init__.py' in self.__file__ or os.path.isdir(self.__file__):
195            path = self.__file__.replace('__init__.py', '')
196            is_dir = True
197        else:
198            path = self.__file__
199            is_dir = False
200
201        old_cwd = os.getcwd()
202        real_parent_path = pathlib.Path(os.path.realpath(path)).parent
203        os.chdir(real_parent_path)
204
205        default_patterns_to_ignore = [
206            '.pyc',
207            '__pycache__/',
208            'eggs/',
209            '__pypackages__/',
210            '.git',
211        ]
212
213        def parse_gitignore() -> 'Set[str]':
214            gitignore_path = pathlib.Path(path) / '.gitignore'
215            if not gitignore_path.exists():
216                return set(default_patterns_to_ignore)
217            with open(gitignore_path, 'r', encoding='utf-8') as f:
218                gitignore_text = f.read()
219            return set(pathspec.PathSpec.from_lines(
220                pathspec.patterns.GitWildMatchPattern,
221                default_patterns_to_ignore + gitignore_text.splitlines()
222            ).match_tree(path))
223
224        patterns_to_ignore = parse_gitignore() if is_dir else set()
225
226        if debug:
227            dprint(f"Patterns to ignore:\n{patterns_to_ignore}")
228
229        with tarfile.open(self.archive_path, 'w:gz') as tarf:
230            if not is_dir:
231                tarf.add(f"{self.name}.py")
232            else:
233                for root, dirs, files in os.walk(self.name):
234                    for f in files:
235                        good_file = True
236                        fp = os.path.join(root, f)
237                        for pattern in patterns_to_ignore:
238                            if pattern in str(fp) or f.startswith('.'):
239                                good_file = False
240                                break
241                        if good_file:
242                            if debug:
243                                dprint(f"Adding '{fp}'...")
244                            tarf.add(fp)
245
246        ### clean up and change back to old directory
247        os.chdir(old_cwd)
248
249        ### change to 775 to avoid permissions issues with the API in a Docker container
250        self.archive_path.chmod(0o775)
251
252        if debug:
253            dprint(f"Created archive '{self.archive_path}'.")
254        return self.archive_path
255
256
257    def install(
258        self,
259        skip_deps: bool = False,
260        force: bool = False,
261        debug: bool = False,
262    ) -> SuccessTuple:
263        """
264        Extract a plugin's tar archive to the plugins directory.
265        
266        This function checks whether the plugin is already installed and whether the
267        archive's version is equal to or greater than the installed version.
268
269        Parameters
270        ----------
271        skip_deps: bool, default False
272            If `True`, do not install dependencies.
273
274        force: bool, default False
275            If `True`, continue with installation, even if required packages fail to install.
276
277        debug: bool, default False
278            Verbosity toggle.
279
280        Returns
281        -------
282        A `SuccessTuple` of success (bool) and a message (str).
283
284        """
285        if self.full_name in _ongoing_installations:
286            return True, f"Already installing plugin '{self}'."
287        _ongoing_installations.add(self.full_name)
288
289        import meerschaum.config.paths as paths
290        from meerschaum.utils.warnings import warn, error
291        if debug:
292            from meerschaum.utils.debug import dprint
293        import tarfile
294        import re
295        import ast
296        from meerschaum.plugins import sync_plugins_symlinks
297        from meerschaum.utils.packages import attempt_import, reload_meerschaum
298        from meerschaum.utils.venv import init_venv
299        from meerschaum.utils.misc import safely_extract_tar
300        old_cwd = os.getcwd()
301        old_version = ''
302        new_version = ''
303        temp_dir = paths.PLUGINS_TEMP_RESOURCES_PATH / self.name
304        temp_dir.mkdir(exist_ok=True)
305
306        if not self.archive_path.exists():
307            return False, f"Missing archive file for plugin '{self}'."
308        if self.version is not None:
309            old_version = self.version
310            if debug:
311                dprint(f"Found existing version '{old_version}' for plugin '{self}'.")
312
313        if debug:
314            dprint(f"Extracting '{self.archive_path}' to '{temp_dir}'...")
315
316        try:
317            with tarfile.open(self.archive_path, 'r:gz') as tarf:
318                safely_extract_tar(tarf, temp_dir)
319        except Exception as e:
320            warn(e)
321            return False, f"Failed to extract plugin '{self.name}'."
322
323        ### search for version information
324        files = os.listdir(temp_dir)
325        
326        if str(files[0]) == self.name:
327            is_dir = True
328        elif str(files[0]) == self.name + '.py':
329            is_dir = False
330        else:
331            error(f"Unknown format encountered for plugin '{self}'.")
332
333        fpath = temp_dir / files[0]
334        if is_dir:
335            fpath = fpath / '__init__.py'
336
337        init_venv(self.name, debug=debug)
338        with open(fpath, 'r', encoding='utf-8') as f:
339            init_lines = f.readlines()
340        new_version = None
341        for line in init_lines:
342            if '__version__' not in line:
343                continue
344            version_match = re.search(r'__version__(\s?)=', line.lstrip().rstrip())
345            if not version_match:
346                continue
347            new_version = ast.literal_eval(line.split('=')[1].lstrip().rstrip())
348            break
349        if not new_version:
350            warn(
351                f"No `__version__` defined for plugin '{self}'. "
352                + "Assuming new version...",
353                stack = False,
354            )
355
356        packaging_version = attempt_import('packaging.version')
357        try:
358            is_new_version = (not new_version and not old_version) or (
359                packaging_version.parse(old_version) < packaging_version.parse(new_version)
360            )
361            is_same_version = new_version and old_version and (
362                packaging_version.parse(old_version) == packaging_version.parse(new_version)
363            )
364        except Exception:
365            is_new_version, is_same_version = True, False
366
367        ### Determine where to permanently store the new plugin.
368        plugin_installation_dir_path = paths.PLUGINS_DIR_PATHS[0]
369        for path in paths.PLUGINS_DIR_PATHS:
370            if not path.exists():
371                warn(f"Plugins path does not exist: {path}", stack=False)
372                continue
373
374            files_in_plugins_dir = os.listdir(path)
375            if (
376                self.name in files_in_plugins_dir
377                or
378                (self.name + '.py') in files_in_plugins_dir
379            ):
380                plugin_installation_dir_path = path
381                break
382
383        success_msg = (
384            f"Successfully installed plugin '{self}'"
385            + ("\n    (skipped dependencies)" if skip_deps else "")
386            + "."
387        )
388        success, abort = None, None
389
390        if is_same_version and not force:
391            success, msg = True, (
392                f"Plugin '{self}' is up-to-date (version {old_version}).\n" +
393                "    Install again with `-f` or `--force` to reinstall."
394            )
395            abort = True
396        elif is_new_version or force:
397            for src_dir, dirs, files in os.walk(temp_dir):
398                if success is not None:
399                    break
400                dst_dir = str(src_dir).replace(str(temp_dir), str(plugin_installation_dir_path))
401                if not os.path.exists(dst_dir):
402                    os.mkdir(dst_dir)
403                for f in files:
404                    src_file = os.path.join(src_dir, f)
405                    dst_file = os.path.join(dst_dir, f)
406                    if os.path.exists(dst_file):
407                        os.remove(dst_file)
408
409                    if debug:
410                        dprint(f"Moving '{src_file}' to '{dst_dir}'...")
411                    try:
412                        shutil.move(src_file, dst_dir)
413                    except Exception:
414                        success, msg = False, (
415                            f"Failed to install plugin '{self}': " +
416                            f"Could not move file '{src_file}' to '{dst_dir}'"
417                        )
418                        print(msg)
419                        break
420            if success is None:
421                success, msg = True, success_msg
422        else:
423            success, msg = False, (
424                f"Your installed version of plugin '{self}' ({old_version}) is higher than "
425                + f"attempted version {new_version}."
426            )
427
428        shutil.rmtree(temp_dir)
429        os.chdir(old_cwd)
430
431        ### Reload the plugin's module.
432        sync_plugins_symlinks(debug=debug)
433        if '_module' in self.__dict__:
434            del self.__dict__['_module']
435        init_venv(venv=self.name, force=True, debug=debug)
436        reload_meerschaum(debug=debug)
437
438        ### if we've already failed, return here
439        if not success or abort:
440            _ongoing_installations.remove(self.full_name)
441            return success, msg
442
443        ### attempt to install dependencies
444        dependencies_installed = skip_deps or self.install_dependencies(force=force, debug=debug)
445        if not dependencies_installed:
446            _ongoing_installations.remove(self.full_name)
447            return False, f"Failed to install dependencies for plugin '{self}'."
448
449        ### handling success tuple, bool, or other (typically None)
450        setup_tuple = self.setup(debug=debug)
451        if isinstance(setup_tuple, tuple):
452            if not setup_tuple[0]:
453                success, msg = setup_tuple
454        elif isinstance(setup_tuple, bool):
455            if not setup_tuple:
456                success, msg = False, (
457                    f"Failed to run post-install setup for plugin '{self}'." + '\n' +
458                    f"Check `setup()` in '{self.__file__}' for more information " +
459                    "(no error message provided)."
460                )
461            else:
462                success, msg = True, success_msg
463        elif setup_tuple is None:
464            success = True
465            msg = (
466                f"Post-install for plugin '{self}' returned None. " +
467                "Assuming plugin successfully installed."
468            )
469            warn(msg)
470        else:
471            success = False
472            msg = (
473                f"Post-install for plugin '{self}' returned unexpected value " +
474                f"of type '{type(setup_tuple)}': {setup_tuple}"
475            )
476
477        _ongoing_installations.remove(self.full_name)
478        _ = self.module
479        return success, msg
480
481
482    def remove_archive(
483        self,        
484        debug: bool = False
485    ) -> SuccessTuple:
486        """Remove a plugin's archive file."""
487        if not self.archive_path.exists():
488            return True, f"Archive file for plugin '{self}' does not exist."
489        try:
490            self.archive_path.unlink()
491        except Exception as e:
492            return False, f"Failed to remove archive for plugin '{self}':\n{e}"
493        return True, "Success"
494
495
496    def remove_venv(
497        self,        
498        debug: bool = False
499    ) -> SuccessTuple:
500        """Remove a plugin's virtual environment."""
501        if not self.venv_path.exists():
502            return True, f"Virtual environment for plugin '{self}' does not exist."
503        try:
504            shutil.rmtree(self.venv_path)
505        except Exception as e:
506            return False, f"Failed to remove virtual environment for plugin '{self}':\n{e}"
507        return True, "Success"
508
509
510    def uninstall(self, debug: bool = False) -> SuccessTuple:
511        """
512        Remove a plugin, its virtual environment, and archive file.
513        """
514        from meerschaum.utils.packages import reload_meerschaum
515        from meerschaum.plugins import sync_plugins_symlinks
516        from meerschaum.utils.warnings import warn, info
517        warnings_thrown_count: int = 0
518        max_warnings: int = 3
519
520        if not self.is_installed():
521            info(
522                f"Plugin '{self.name}' doesn't seem to be installed.\n    "
523                + "Checking for artifacts...",
524                stack = False,
525            )
526        else:
527            real_path = pathlib.Path(os.path.realpath(self.__file__))
528            try:
529                if real_path.name == '__init__.py':
530                    shutil.rmtree(real_path.parent)
531                else:
532                    real_path.unlink()
533            except Exception as e:
534                warn(f"Could not remove source files for plugin '{self.name}':\n{e}", stack=False)
535                warnings_thrown_count += 1
536            else:
537                info(f"Removed source files for plugin '{self.name}'.")
538
539        if self.venv_path.exists():
540            success, msg = self.remove_venv(debug=debug)
541            if not success:
542                warn(msg, stack=False)
543                warnings_thrown_count += 1
544            else:
545                info(f"Removed virtual environment from plugin '{self.name}'.")
546
547        success = warnings_thrown_count < max_warnings
548        sync_plugins_symlinks(debug=debug)
549        self.deactivate_venv(force=True, debug=debug)
550        reload_meerschaum(debug=debug)
551        return success, (
552            f"Successfully uninstalled plugin '{self}'." if success
553            else f"Failed to uninstall plugin '{self}'."
554        )
555
556
557    def setup(self, *args: str, debug: bool = False, **kw: Any) -> Union[SuccessTuple, bool]:
558        """
559        If it exists, run the plugin's `setup()` function.
560
561        Parameters
562        ----------
563        *args: str
564            The positional arguments passed to the `setup()` function.
565            
566        debug: bool, default False
567            Verbosity toggle.
568
569        **kw: Any
570            The keyword arguments passed to the `setup()` function.
571
572        Returns
573        -------
574        A `SuccessTuple` or `bool` indicating success.
575
576        """
577        from meerschaum.utils.debug import dprint
578        import inspect
579        _setup = None
580        for name, fp in inspect.getmembers(self.module):
581            if name == 'setup' and inspect.isfunction(fp):
582                _setup = fp
583                break
584
585        ### assume success if no setup() is found (not necessary)
586        if _setup is None:
587            return True
588
589        sig = inspect.signature(_setup)
590        has_debug, has_kw = ('debug' in sig.parameters), False
591        for k, v in sig.parameters.items():
592            if '**' in str(v):
593                has_kw = True
594                break
595
596        _kw = {}
597        if has_kw:
598            _kw.update(kw)
599        if has_debug:
600            _kw['debug'] = debug
601
602        if debug:
603            dprint(f"Running setup for plugin '{self}'...")
604        try:
605            self.activate_venv(debug=debug)
606            return_tuple = _setup(*args, **_kw)
607            self.deactivate_venv(debug=debug)
608        except Exception as e:
609            return False, str(e)
610
611        if isinstance(return_tuple, tuple):
612            return return_tuple
613        if isinstance(return_tuple, bool):
614            return return_tuple, f"Setup for Plugin '{self.name}' did not return a message."
615        if return_tuple is None:
616            return False, f"Setup for Plugin '{self.name}' returned None."
617        return False, f"Unknown return value from setup for Plugin '{self.name}': {return_tuple}"
618
619
620    def get_dependencies(
621        self,
622        debug: bool = False,
623    ) -> List[str]:
624        """
625        If the Plugin has specified dependencies in a list called `required`, return the list.
626        
627        **NOTE:** Dependencies which start with `'plugin:'` are Meerschaum plugins, not pip packages.
628        Meerschaum plugins may also specify connector keys for a repo after `'@'`.
629
630        Parameters
631        ----------
632        debug: bool, default False
633            Verbosity toggle.
634
635        Returns
636        -------
637        A list of required packages and plugins (str).
638
639        """
640        if '_required' in self.__dict__:
641            return self._required
642
643        ### If the plugin has not yet been imported,
644        ### infer the dependencies from the source text.
645        ### This is not super robust, and it doesn't feel right
646        ### having multiple versions of the logic.
647        ### This is necessary when determining the activation order
648        ### without having to import the module.
649        ### For consistency's sake, the module-less method does not cache the requirements.
650        if self.__dict__.get('_module', None) is None:
651            file_path = self.__file__
652            if file_path is None:
653                return []
654            with open(file_path, 'r', encoding='utf-8') as f:
655                text = f.read()
656
657            if 'required' not in text:
658                return []
659
660            ### This has some limitations:
661            ### It relies on `required` being manually declared.
662            ### We lose the ability to dynamically alter the `required` list,
663            ### which is why we've kept the module-reliant method below.
664            import ast, re
665            ### NOTE: This technically would break 
666            ### if `required` was the very first line of the file.
667            req_start_match = re.search(r'\nrequired(:\s*)?.*=', text)
668            if not req_start_match:
669                return []
670            req_start = req_start_match.start()
671            equals_sign = req_start + text[req_start:].find('=')
672
673            ### Dependencies may have brackets within the strings, so push back the index.
674            first_opening_brace = equals_sign + 1 + text[equals_sign:].find('[')
675            if first_opening_brace == equals_sign:  ### `find()` returned -1 (no '[').
676                return []
677
678            next_closing_brace = equals_sign + 1 + text[equals_sign:].find(']')
679            if next_closing_brace == equals_sign:  ### `find()` returned -1 (no ']').
680                return []
681
682            start_ix = first_opening_brace + 1
683            end_ix = next_closing_brace
684
685            num_braces = 0
686            while True:
687                if '[' not in text[start_ix:end_ix]:
688                    break
689                num_braces += 1
690                start_ix = end_ix
691                end_ix += text[end_ix + 1:].find(']') + 1
692
693            req_end = end_ix + 1
694            req_text = (
695                text[(first_opening_brace-1):req_end]
696                .lstrip()
697                .replace('=', '', 1)
698                .lstrip()
699                .rstrip()
700            )
701            try:
702                required = ast.literal_eval(req_text)
703            except Exception as e:
704                warn(
705                    f"Unable to determine requirements for plugin '{self.name}' "
706                    + "without importing the module.\n"
707                    + "    This may be due to dynamically setting the global `required` list.\n"
708                    + f"    {e}"
709                )
710                return []
711            return required
712
713        import inspect
714        self.activate_venv(dependencies=False, debug=debug)
715        required = []
716        for name, val in inspect.getmembers(self.module):
717            if name == 'required':
718                required = val
719                break
720        self._required = required
721        self.deactivate_venv(dependencies=False, debug=debug)
722        return required
723
724
725    def get_required_plugins(self, debug: bool=False) -> List[mrsm.plugins.Plugin]:
726        """
727        Return a list of required Plugin objects.
728        """
729        from meerschaum.utils.warnings import warn
730        from meerschaum.config import get_config
731        from meerschaum._internal.static import STATIC_CONFIG
732        from meerschaum.connectors.parse import is_valid_connector_keys
733        plugins = []
734        _deps = self.get_dependencies(debug=debug)
735        sep = STATIC_CONFIG['plugins']['repo_separator']
736        plugin_names = [
737            _d[len('plugin:'):] for _d in _deps
738            if _d.startswith('plugin:') and len(_d) > len('plugin:')
739        ]
740        default_repo_keys = get_config('meerschaum', 'repository')
741        skipped_repo_keys = set()
742
743        for _plugin_name in plugin_names:
744            if sep in _plugin_name:
745                try:
746                    _plugin_name, _repo_keys = _plugin_name.split(sep)
747                except Exception:
748                    _repo_keys = default_repo_keys
749                    warn(
750                        f"Invalid repo keys for required plugin '{_plugin_name}'.\n    "
751                        + f"Will try to use '{_repo_keys}' instead.",
752                        stack = False,
753                    )
754            else:
755                _repo_keys = default_repo_keys
756
757            if _repo_keys in skipped_repo_keys:
758                continue
759
760            if not is_valid_connector_keys(_repo_keys):
761                warn(
762                    f"Invalid connector '{_repo_keys}'.\n"
763                    f"    Skipping required plugins from repository '{_repo_keys}'",
764                    stack=False,
765                )
766                continue
767
768            plugins.append(Plugin(_plugin_name, repo=_repo_keys))
769
770        return plugins
771
772
773    def get_required_packages(self, debug: bool=False) -> List[str]:
774        """
775        Return the required package names (excluding plugins).
776        """
777        _deps = self.get_dependencies(debug=debug)
778        return [_d for _d in _deps if not _d.startswith('plugin:')]
779
780
781    def activate_venv(
782        self,
783        dependencies: bool = True,
784        init_if_not_exists: bool = True,
785        debug: bool = False,
786        **kw
787    ) -> bool:
788        """
789        Activate the virtual environments for the plugin and its dependencies.
790
791        Parameters
792        ----------
793        dependencies: bool, default True
794            If `True`, activate the virtual environments for required plugins.
795
796        Returns
797        -------
798        A bool indicating success.
799        """
800        import meerschaum.config.paths as paths
801        from meerschaum.utils.venv import venv_target_path
802        from meerschaum.utils.packages import activate_venv
803        from meerschaum.utils.misc import make_symlink, is_symlink
804
805        if dependencies:
806            for plugin in self.get_required_plugins(debug=debug):
807                plugin.activate_venv(debug=debug, init_if_not_exists=init_if_not_exists, **kw)
808
809        vtp = venv_target_path(self.name, debug=debug, allow_nonexistent=True)
810        venv_meerschaum_path = vtp / 'meerschaum'
811
812        try:
813            success, msg = True, "Success"
814            if is_symlink(venv_meerschaum_path):
815                if pathlib.Path(os.path.realpath(venv_meerschaum_path)) != paths.PACKAGE_ROOT_PATH:
816                    venv_meerschaum_path.unlink()
817                    success, msg = make_symlink(venv_meerschaum_path, paths.PACKAGE_ROOT_PATH)
818        except Exception as e:
819            success, msg = False, str(e)
820        if not success:
821            warn(
822                f"Unable to create symlink {venv_meerschaum_path} to {paths.PACKAGE_ROOT_PATH}:\n"
823                f"{msg}"
824            )
825
826        return activate_venv(self.name, init_if_not_exists=init_if_not_exists, debug=debug, **kw)
827
828
829    def deactivate_venv(self, dependencies: bool=True, debug: bool = False, **kw) -> bool:
830        """
831        Deactivate the virtual environments for the plugin and its dependencies.
832
833        Parameters
834        ----------
835        dependencies: bool, default True
836            If `True`, deactivate the virtual environments for required plugins.
837
838        Returns
839        -------
840        A bool indicating success.
841        """
842        from meerschaum.utils.packages import deactivate_venv
843        success = deactivate_venv(self.name, debug=debug, **kw)
844        if dependencies:
845            for plugin in self.get_required_plugins(debug=debug):
846                plugin.deactivate_venv(debug=debug, **kw)
847        return success
848
849
850    def install_dependencies(
851        self,
852        force: bool = False,
853        debug: bool = False,
854    ) -> bool:
855        """
856        If specified, install dependencies.
857        
858        **NOTE:** Dependencies that start with `'plugin:'` will be installed as
859        Meerschaum plugins from the same repository as this Plugin.
860        To install from a different repository, add the repo keys after `'@'`
861        (e.g. `'plugin:foo@api:bar'`).
862
863        Parameters
864        ----------
865        force: bool, default False
866            If `True`, continue with the installation, even if some
867            required packages fail to install.
868
869        debug: bool, default False
870            Verbosity toggle.
871
872        Returns
873        -------
874        A bool indicating success.
875        """
876        from meerschaum.utils.packages import pip_install, venv_contains_package
877        from meerschaum.utils.warnings import warn, info
878        _deps = self.get_dependencies(debug=debug)
879        if not _deps and self.requirements_file_path is None:
880            return True
881
882        plugins = self.get_required_plugins(debug=debug)
883        for _plugin in plugins:
884            if _plugin.name == self.name:
885                warn(f"Plugin '{self.name}' cannot depend on itself! Skipping...", stack=False)
886                continue
887            _success, _msg = _plugin.repo_connector.install_plugin(
888                _plugin.name, debug=debug, force=force
889            )
890            if not _success:
891                warn(
892                    f"Failed to install required plugin '{_plugin}' from '{_plugin.repo_connector}'"
893                    + f" for plugin '{self.name}':\n" + _msg,
894                    stack = False,
895                )
896                if not force:
897                    warn(
898                        "Try installing with the `--force` flag to continue anyway.",
899                        stack = False,
900                    )
901                    return False
902                info(
903                    "Continuing with installation despite the failure "
904                    + "(careful, things might be broken!)...",
905                    icon = False
906                )
907
908
909        ### First step: parse `requirements.txt` if it exists.
910        if self.requirements_file_path is not None:
911            if not pip_install(
912                requirements_file_path=self.requirements_file_path,
913                venv=self.name, debug=debug
914            ):
915                warn(
916                    f"Failed to resolve 'requirements.txt' for plugin '{self.name}'.",
917                    stack = False,
918                )
919                if not force:
920                    warn(
921                        "Try installing with `--force` to continue anyway.",
922                        stack = False,
923                    )
924                    return False
925                info(
926                    "Continuing with installation despite the failure "
927                    + "(careful, things might be broken!)...",
928                    icon = False
929                )
930
931
932        ### Don't reinstall packages that are already included in required plugins.
933        packages = []
934        _packages = self.get_required_packages(debug=debug)
935        accounted_for_packages = set()
936        for package_name in _packages:
937            for plugin in plugins:
938                if venv_contains_package(package_name, plugin.name):
939                    accounted_for_packages.add(package_name)
940                    break
941        packages = [pkg for pkg in _packages if pkg not in accounted_for_packages]
942
943        ### Attempt pip packages installation.
944        if packages:
945            for package in packages:
946                if not pip_install(package, venv=self.name, debug=debug):
947                    warn(
948                        f"Failed to install required package '{package}'"
949                        + f" for plugin '{self.name}'.",
950                        stack = False,
951                    )
952                    if not force:
953                        warn(
954                            "Try installing with `--force` to continue anyway.",
955                            stack = False,
956                        )
957                        return False
958                    info(
959                        "Continuing with installation despite the failure "
960                        + "(careful, things might be broken!)...",
961                        icon = False
962                    )
963        return True
964
965
966    @property
967    def full_name(self) -> str:
968        """
969        Include the repo keys with the plugin's name.
970        """
971        from meerschaum._internal.static import STATIC_CONFIG
972        sep = STATIC_CONFIG['plugins']['repo_separator']
973        return self.name + sep + str(self.repo_connector)
974
975
976    def __str__(self):
977        return self.name
978
979
980    def __repr__(self):
981        return f"Plugin('{self.name}', repo='{self.repo_connector}')"
982
983
984    def __del__(self):
985        pass

Handle packaging of Meerschaum plugins.

Plugin( name: str, version: Optional[str] = None, user_id: Optional[int] = None, required: Optional[List[str]] = None, attributes: Optional[Dict[str, Any]] = None, archive_path: Optional[pathlib.Path] = None, venv_path: Optional[pathlib.Path] = None, repo_connector: Optional[meerschaum.connectors.APIConnector] = None, repo: Union[meerschaum.connectors.APIConnector, str, NoneType] = None)
33    def __init__(
34        self,
35        name: str,
36        version: Optional[str] = None,
37        user_id: Optional[int] = None,
38        required: Optional[List[str]] = None,
39        attributes: Optional[Dict[str, Any]] = None,
40        archive_path: Optional[pathlib.Path] = None,
41        venv_path: Optional[pathlib.Path] = None,
42        repo_connector: Optional['mrsm.connectors.api.APIConnector'] = None,
43        repo: Union['mrsm.connectors.api.APIConnector', str, None] = None,
44    ):
45        import meerschaum.config.paths as paths
46        from meerschaum._internal.static import STATIC_CONFIG
47        sep = STATIC_CONFIG['plugins']['repo_separator']
48        _repo = None
49        if sep in name:
50            try:
51                name, _repo = name.split(sep)
52            except Exception as e:
53                error(f"Invalid plugin name: '{name}'")
54        self._repo_in_name = _repo
55
56        if attributes is None:
57            attributes = {}
58        self.name = name
59        self.attributes = attributes
60        self.user_id = user_id
61        self._version = version
62        if required:
63            self._required = required
64        self.archive_path = (
65            archive_path if archive_path is not None
66            else paths.PLUGINS_ARCHIVES_RESOURCES_PATH / f"{self.name}.tar.gz"
67        )
68        self.venv_path = (
69            venv_path if venv_path is not None
70            else paths.VIRTENV_RESOURCES_PATH / self.name
71        )
72        self._repo_connector = repo_connector
73        self._repo_keys = repo
name
attributes
user_id
archive_path
venv_path
repo_connector
    @property
    def repo_connector(self):
        """
        Return the repository connector for this plugin.
        NOTE: This imports the `connectors` module, which imports certain plugin modules.
        """
        if self._repo_connector is None:
            from meerschaum.connectors.parse import parse_repo_keys

            repo_keys = self._repo_keys or self._repo_in_name
            if self._repo_in_name and self._repo_keys and self._repo_keys != self._repo_in_name:
                error(
                    f"Received inconsistent repos: '{self._repo_in_name}' and '{self._repo_keys}'."
                )
            repo_connector = parse_repo_keys(repo_keys)
            self._repo_connector = repo_connector
        return self._repo_connector

Return the repository connector for this plugin. NOTE: This imports the connectors module, which imports certain plugin modules.
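
The lookup above draws the repository keys from two places: the `repo` argument and a repository embedded in the plugin's name. It raises an error when the two disagree. The following is a minimal standard-library sketch of that resolution step only. It assumes `'@'` is the separator (as in the `'plugin:foo@api:bar'` example), and `split_plugin_name` and `resolve_repo_keys` are hypothetical helper names, not part of the Meerschaum API. Building the actual connector (`parse_repo_keys`) is not reproduced here.

```python
from typing import Optional, Tuple

# Assumption: '@' stands in for STATIC_CONFIG['plugins']['repo_separator'].
REPO_SEPARATOR = '@'


def split_plugin_name(name: str) -> Tuple[str, Optional[str]]:
    """Split 'foo@api:bar' into ('foo', 'api:bar'); plain names carry no repo."""
    if REPO_SEPARATOR not in name:
        return name, None
    # Unpacking raises ValueError if the separator appears more than once.
    plugin_name, repo = name.split(REPO_SEPARATOR)
    return plugin_name, repo


def resolve_repo_keys(
    repo_in_name: Optional[str],
    repo_keys: Optional[str],
) -> Optional[str]:
    """Prefer the explicit keys, fall back to the name, and reject conflicts."""
    if repo_in_name and repo_keys and repo_in_name != repo_keys:
        raise ValueError(
            f"Received inconsistent repos: '{repo_in_name}' and '{repo_keys}'."
        )
    return repo_keys or repo_in_name


name, repo_in_name = split_plugin_name('foo@api:bar')
resolved = resolve_repo_keys(repo_in_name, None)
```

Here `resolved` falls back to the repository found in the name because no explicit keys were given.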

version
    @property
    def version(self):
        """
        Return the plugin's module version (`__version__`) if it's defined.
        """
        if self._version is None:
            try:
                self._version = self.module.__version__
            except Exception as e:
                self._version = None
        return self._version

Return the plugin's module version (__version__) if it's defined.

module
    @property
    def module(self):
        """
        Return the Python module of the underlying plugin.
        """
        if '_module' not in self.__dict__ or self.__dict__.get('_module', None) is None:
            if self.__file__ is None:
                return None

            from meerschaum.plugins import import_plugins
            self._module = import_plugins(str(self), warn=False)

        return self._module

Return the Python module of the underlying plugin.

requirements_file_path: Optional[pathlib.Path]
    @property
    def requirements_file_path(self) -> Union[pathlib.Path, None]:
        """
        If a file named `requirements.txt` exists, return its path.
        """
        if self.__file__ is None:
            return None
        path = pathlib.Path(self.__file__).parent / 'requirements.txt'
        if not path.exists():
            return None
        return path

If a file named requirements.txt exists, return its path.
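
As an illustration of this lookup, the sketch below mirrors the same logic with only the standard library: it looks for `requirements.txt` next to the plugin's source file and returns `None` when the file (or the plugin itself) is missing. The function name `find_requirements_file` and the temporary `example` plugin directory are hypothetical and exist only for this demonstration.

```python
import pathlib
import tempfile
from typing import Optional


def find_requirements_file(plugin_file: Optional[str]) -> Optional[pathlib.Path]:
    """Return the sibling `requirements.txt` of a plugin file, if it exists."""
    if plugin_file is None:
        return None
    path = pathlib.Path(plugin_file).parent / 'requirements.txt'
    return path if path.exists() else None


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical directory-style plugin: example/__init__.py
    plugin_dir = pathlib.Path(tmp) / 'example'
    plugin_dir.mkdir()
    init_file = plugin_dir / '__init__.py'
    init_file.write_text("__version__ = '0.1.0'\n", encoding='utf-8')

    # No requirements file yet, so the lookup yields None.
    missing = find_requirements_file(str(init_file))

    (plugin_dir / 'requirements.txt').write_text("requests\n", encoding='utf-8')
    found = find_requirements_file(str(init_file))
    found_name = found.name if found is not None else None
```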

def is_installed(self, **kw) -> bool:
    def is_installed(self, **kw) -> bool:
        """
        Check whether a plugin is correctly installed.

        Returns
        -------
        A `bool` indicating whether a plugin exists and is successfully imported.
        """
        return self.__file__ is not None

Check whether a plugin is correctly installed.

Returns
  • A bool indicating whether a plugin exists and is successfully imported.
def make_tar(self, debug: bool = False) -> pathlib.Path:
172    def make_tar(self, debug: bool = False) -> pathlib.Path:
173        """
174        Compress the plugin's source files into a `.tar.gz` archive and return the archive's path.
175
176        Parameters
177        ----------
178        debug: bool, default False
179            Verbosity toggle.
180
181        Returns
182        -------
183        The `pathlib.Path` of the archive file.
184
185        """
186        import tarfile, pathlib, subprocess, fnmatch
187        from meerschaum.utils.debug import dprint
188        from meerschaum.utils.packages import attempt_import
189        pathspec = attempt_import('pathspec', debug=debug)
190
191        if not self.__file__:
192            from meerschaum.utils.warnings import error
193            error(f"Could not find file for plugin '{self}'.")
194        if '__init__.py' in self.__file__ or os.path.isdir(self.__file__):
195            path = self.__file__.replace('__init__.py', '')
196            is_dir = True
197        else:
198            path = self.__file__
199            is_dir = False
200
201        old_cwd = os.getcwd()
202        real_parent_path = pathlib.Path(os.path.realpath(path)).parent
203        os.chdir(real_parent_path)
204
205        default_patterns_to_ignore = [
206            '.pyc',
207            '__pycache__/',
208            'eggs/',
209            '__pypackages__/',
210            '.git',
211        ]
212
213        def parse_gitignore() -> 'Set[str]':
214            gitignore_path = pathlib.Path(path) / '.gitignore'
215            if not gitignore_path.exists():
216                return set(default_patterns_to_ignore)
217            with open(gitignore_path, 'r', encoding='utf-8') as f:
218                gitignore_text = f.read()
219            return set(pathspec.PathSpec.from_lines(
220                pathspec.patterns.GitWildMatchPattern,
221                default_patterns_to_ignore + gitignore_text.splitlines()
222            ).match_tree(path))
223
224        patterns_to_ignore = parse_gitignore() if is_dir else set()
225
226        if debug:
227            dprint(f"Patterns to ignore:\n{patterns_to_ignore}")
228
229        with tarfile.open(self.archive_path, 'w:gz') as tarf:
230            if not is_dir:
231                tarf.add(f"{self.name}.py")
232            else:
233                for root, dirs, files in os.walk(self.name):
234                    for f in files:
235                        good_file = True
236                        fp = os.path.join(root, f)
237                        for pattern in patterns_to_ignore:
238                            if pattern in str(fp) or f.startswith('.'):
239                                good_file = False
240                                break
241                        if good_file:
242                            if debug:
243                                dprint(f"Adding '{fp}'...")
244                            tarf.add(fp)
245
246        ### clean up and change back to old directory
247        os.chdir(old_cwd)
248
249        ### change to 775 to avoid permissions issues with the API in a Docker container
250        self.archive_path.chmod(0o775)
251
252        if debug:
253            dprint(f"Created archive '{self.archive_path}'.")
254        return self.archive_path

Compress the plugin's source files into a .tar.gz archive and return the archive's path.

Parameters
  • debug (bool, default False): Verbosity toggle.
Returns
  • The pathlib.Path of the archive file.
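
To make the archiving step concrete, here is a simplified sketch of building a `.tar.gz` while skipping hidden files and a few default ignore patterns. It is not the method's actual implementation: it passes `arcname` instead of changing the working directory, it uses plain substring matching, and it does not parse `.gitignore` with `pathspec`. The name `make_archive` and the `example` plugin directory are hypothetical.

```python
import os
import pathlib
import tarfile
import tempfile
from typing import List, Sequence

# Assumption: a reduced version of the default ignore patterns.
DEFAULT_IGNORE = ('.pyc', '__pycache__', '.git')


def make_archive(
    src_dir: pathlib.Path,
    archive_path: pathlib.Path,
    ignore: Sequence[str] = DEFAULT_IGNORE,
) -> List[str]:
    """Archive `src_dir` into `archive_path` and return the member names added."""
    src_dir = pathlib.Path(src_dir)
    added = []
    with tarfile.open(str(archive_path), 'w:gz') as tarf:
        for root, _dirs, files in os.walk(src_dir):
            for file_name in files:
                file_path = pathlib.Path(root) / file_name
                # Store paths relative to the parent so the archive
                # contains a single top-level directory.
                arcname = file_path.relative_to(src_dir.parent).as_posix()
                if file_name.startswith('.') or any(pat in arcname for pat in ignore):
                    continue
                tarf.add(str(file_path), arcname=arcname)
                added.append(arcname)
    return sorted(added)


with tempfile.TemporaryDirectory() as tmp:
    plugin_dir = pathlib.Path(tmp) / 'example'
    (plugin_dir / '__pycache__').mkdir(parents=True)
    (plugin_dir / '__init__.py').write_text("__version__ = '0.1.0'\n", encoding='utf-8')
    (plugin_dir / '__pycache__' / 'x.pyc').write_bytes(b'')
    (plugin_dir / '.env').write_text("SECRET=1\n", encoding='utf-8')

    archive = pathlib.Path(tmp) / 'example.tar.gz'
    added = make_archive(plugin_dir, archive)
    with tarfile.open(str(archive), 'r:gz') as tarf:
        members = sorted(tarf.getnames())
```

Only `example/__init__.py` ends up in the archive; the cached bytecode and the hidden file are skipped.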
def install( self, skip_deps: bool = False, force: bool = False, debug: bool = False) -> Tuple[bool, str]:
257    def install(
258        self,
259        skip_deps: bool = False,
260        force: bool = False,
261        debug: bool = False,
262    ) -> SuccessTuple:
263        """
264        Extract a plugin's tar archive to the plugins directory.
265        
266        This function checks whether the plugin is already installed and whether the new
267        version is equal to or greater than the installed version.
268
269        Parameters
270        ----------
271        skip_deps: bool, default False
272            If `True`, do not install dependencies.
273
274        force: bool, default False
275            If `True`, continue with installation, even if required packages fail to install.
276
277        debug: bool, default False
278            Verbosity toggle.
279
280        Returns
281        -------
282        A `SuccessTuple` of success (bool) and a message (str).
283
284        """
285        if self.full_name in _ongoing_installations:
286            return True, f"Already installing plugin '{self}'."
287        _ongoing_installations.add(self.full_name)
288
289        import meerschaum.config.paths as paths
290        from meerschaum.utils.warnings import warn, error
291        if debug:
292            from meerschaum.utils.debug import dprint
293        import tarfile
294        import re
295        import ast
296        from meerschaum.plugins import sync_plugins_symlinks
297        from meerschaum.utils.packages import attempt_import, reload_meerschaum
298        from meerschaum.utils.venv import init_venv
299        from meerschaum.utils.misc import safely_extract_tar
300        old_cwd = os.getcwd()
301        old_version = ''
302        new_version = ''
303        temp_dir = paths.PLUGINS_TEMP_RESOURCES_PATH / self.name
304        temp_dir.mkdir(exist_ok=True)
305
306        if not self.archive_path.exists():
307            return False, f"Missing archive file for plugin '{self}'."
308        if self.version is not None:
309            old_version = self.version
310            if debug:
311                dprint(f"Found existing version '{old_version}' for plugin '{self}'.")
312
313        if debug:
314            dprint(f"Extracting '{self.archive_path}' to '{temp_dir}'...")
315
316        try:
317            with tarfile.open(self.archive_path, 'r:gz') as tarf:
318                safely_extract_tar(tarf, temp_dir)
319        except Exception as e:
320            warn(e)
321            return False, f"Failed to extract plugin '{self.name}'."
322
323        ### search for version information
324        files = os.listdir(temp_dir)
325        
326        if str(files[0]) == self.name:
327            is_dir = True
328        elif str(files[0]) == self.name + '.py':
329            is_dir = False
330        else:
331            error(f"Unknown format encountered for plugin '{self}'.")
332
333        fpath = temp_dir / files[0]
334        if is_dir:
335            fpath = fpath / '__init__.py'
336
337        init_venv(self.name, debug=debug)
338        with open(fpath, 'r', encoding='utf-8') as f:
339            init_lines = f.readlines()
340        new_version = None
341        for line in init_lines:
342            if '__version__' not in line:
343                continue
344            version_match = re.search(r'__version__(\s?)=', line.lstrip().rstrip())
345            if not version_match:
346                continue
347            new_version = ast.literal_eval(line.split('=')[1].lstrip().rstrip())
348            break
349        if not new_version:
350            warn(
351                f"No `__version__` defined for plugin '{self}'. "
352                + "Assuming new version...",
353                stack = False,
354            )
355
356        packaging_version = attempt_import('packaging.version')
357        try:
358            is_new_version = (not new_version and not old_version) or (
359                packaging_version.parse(old_version) < packaging_version.parse(new_version)
360            )
361            is_same_version = new_version and old_version and (
362                packaging_version.parse(old_version) == packaging_version.parse(new_version)
363            )
364        except Exception:
365            is_new_version, is_same_version = True, False
366
367        ### Determine where to permanently store the new plugin.
368        plugin_installation_dir_path = paths.PLUGINS_DIR_PATHS[0]
369        for path in paths.PLUGINS_DIR_PATHS:
370            if not path.exists():
371                warn(f"Plugins path does not exist: {path}", stack=False)
372                continue
373
374            files_in_plugins_dir = os.listdir(path)
375            if (
376                self.name in files_in_plugins_dir
377                or
378                (self.name + '.py') in files_in_plugins_dir
379            ):
380                plugin_installation_dir_path = path
381                break
382
383        success_msg = (
384            f"Successfully installed plugin '{self}'"
385            + ("\n    (skipped dependencies)" if skip_deps else "")
386            + "."
387        )
388        success, abort = None, None
389
390        if is_same_version and not force:
391            success, msg = True, (
392                f"Plugin '{self}' is up-to-date (version {old_version}).\n" +
393                "    Install again with `-f` or `--force` to reinstall."
394            )
395            abort = True
396        elif is_new_version or force:
397            for src_dir, dirs, files in os.walk(temp_dir):
398                if success is not None:
399                    break
400                dst_dir = str(src_dir).replace(str(temp_dir), str(plugin_installation_dir_path))
401                if not os.path.exists(dst_dir):
402                    os.mkdir(dst_dir)
403                for f in files:
404                    src_file = os.path.join(src_dir, f)
405                    dst_file = os.path.join(dst_dir, f)
406                    if os.path.exists(dst_file):
407                        os.remove(dst_file)
408
409                    if debug:
410                        dprint(f"Moving '{src_file}' to '{dst_dir}'...")
411                    try:
412                        shutil.move(src_file, dst_dir)
413                    except Exception:
414                        success, msg = False, (
415                            f"Failed to install plugin '{self}': " +
416                            f"Could not move file '{src_file}' to '{dst_dir}'"
417                        )
418                        print(msg)
419                        break
420            if success is None:
421                success, msg = True, success_msg
422        else:
423            success, msg = False, (
424                f"Your installed version of plugin '{self}' ({old_version}) is higher than "
425                + f"attempted version {new_version}."
426            )
427
428        shutil.rmtree(temp_dir)
429        os.chdir(old_cwd)
430
431        ### Reload the plugin's module.
432        sync_plugins_symlinks(debug=debug)
433        if '_module' in self.__dict__:
434            del self.__dict__['_module']
435        init_venv(venv=self.name, force=True, debug=debug)
436        reload_meerschaum(debug=debug)
437
438        ### if we've already failed, return here
439        if not success or abort:
440            _ongoing_installations.remove(self.full_name)
441            return success, msg
442
443        ### attempt to install dependencies
444        dependencies_installed = skip_deps or self.install_dependencies(force=force, debug=debug)
445        if not dependencies_installed:
446            _ongoing_installations.remove(self.full_name)
447            return False, f"Failed to install dependencies for plugin '{self}'."
448
449        ### handling success tuple, bool, or other (typically None)
450        setup_tuple = self.setup(debug=debug)
451        if isinstance(setup_tuple, tuple):
452            if not setup_tuple[0]:
453                success, msg = setup_tuple
454        elif isinstance(setup_tuple, bool):
455            if not setup_tuple:
456                success, msg = False, (
457                    f"Failed to run post-install setup for plugin '{self}'." + '\n' +
458                    f"Check `setup()` in '{self.__file__}' for more information " +
459                    "(no error message provided)."
460                )
461            else:
462                success, msg = True, success_msg
463        elif setup_tuple is None:
464            success = True
465            msg = (
466                f"Post-install for plugin '{self}' returned None. " +
467                "Assuming plugin successfully installed."
468            )
469            warn(msg)
470        else:
471            success = False
472            msg = (
473                f"Post-install for plugin '{self}' returned unexpected value " +
474                f"of type '{type(setup_tuple)}': {setup_tuple}"
475            )
476
477        _ongoing_installations.remove(self.full_name)
478        _ = self.module
479        return success, msg

Extract a plugin's tar archive to the plugins directory.

This function checks whether the plugin is already installed and whether the new version is equal to or greater than the installed version.
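
The version check can be sketched as follows. This is a rough illustration under stated assumptions, not the library's implementation: `read_version`, `version_key`, and `decide` are hypothetical names, and `version_key` is a deliberately simplistic stand-in for `packaging.version.parse` that handles only dotted integers such as `'1.2.0'`.

```python
import ast
import re
from typing import Iterable, Optional, Tuple


def read_version(lines: Iterable[str]) -> Optional[str]:
    """Return the first `__version__ = '...'` literal found in source lines."""
    for line in lines:
        if not re.search(r'__version__\s?=', line.strip()):
            continue
        return ast.literal_eval(line.split('=', 1)[1].strip())
    return None


def version_key(version: str) -> Tuple[int, ...]:
    """Assumption: versions are dotted integers (no pre-release tags)."""
    return tuple(int(part) for part in version.split('.'))


def decide(old_version: Optional[str], new_version: Optional[str], force: bool = False) -> str:
    """Return 'install', 'up-to-date', or 'refuse' for an installation attempt."""
    # A missing version on either side is treated as a new version.
    if not old_version or not new_version:
        return 'install'
    old_key, new_key = version_key(old_version), version_key(new_version)
    if old_key == new_key and not force:
        return 'up-to-date'
    if old_key < new_key or force:
        return 'install'
    return 'refuse'


new_version = read_version(["import os\n", "__version__ = '1.2.0'\n"])
outcome = decide('1.1.0', new_version)
```

In this example the archive's version is newer than the installed one, so `outcome` is `'install'`. An identical version yields `'up-to-date'` unless `force` is set, and an older archive is refused.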

Parameters
  • skip_deps (bool, default False): If True, do not install dependencies.
  • force (bool, default False): If True, continue with installation, even if required packages fail to install.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A SuccessTuple of success (bool) and a message (str).
def remove_archive(self, debug: bool = False) -> Tuple[bool, str]:
    def remove_archive(
        self,
        debug: bool = False
    ) -> SuccessTuple:
        """Remove a plugin's archive file."""
        if not self.archive_path.exists():
            return True, f"Archive file for plugin '{self}' does not exist."
        try:
            self.archive_path.unlink()
        except Exception as e:
            return False, f"Failed to remove archive for plugin '{self}':\n{e}"
        return True, "Success"

Remove a plugin's archive file.

def remove_venv(self, debug: bool = False) -> Tuple[bool, str]:
    def remove_venv(
        self,
        debug: bool = False
    ) -> SuccessTuple:
        """Remove a plugin's virtual environment."""
        if not self.venv_path.exists():
            return True, f"Virtual environment for plugin '{self}' does not exist."
        try:
            shutil.rmtree(self.venv_path)
        except Exception as e:
            return False, f"Failed to remove virtual environment for plugin '{self}':\n{e}"
        return True, "Success"

Remove a plugin's virtual environment.
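
Both removal methods follow the same idempotent pattern: a missing target counts as success, and failures are reported through the returned tuple rather than raised. The sketch below shows that pattern for a directory using only the standard library. `remove_directory` and the `example-venv` path are hypothetical names for this illustration.

```python
import pathlib
import shutil
import tempfile
from typing import Tuple


def remove_directory(path: pathlib.Path, label: str) -> Tuple[bool, str]:
    """Remove a directory tree and report the outcome as (success, message)."""
    if not path.exists():
        # Nothing to remove is still a success.
        return True, f"{label} does not exist."
    try:
        shutil.rmtree(path)
    except Exception as e:
        return False, f"Failed to remove {label}:\n{e}"
    return True, "Success"


with tempfile.TemporaryDirectory() as tmp:
    venv_path = pathlib.Path(tmp) / 'example-venv'
    (venv_path / 'lib').mkdir(parents=True)

    first = remove_directory(venv_path, "Virtual environment")
    # Calling it again is safe because the directory is already gone.
    second = remove_directory(venv_path, "Virtual environment")
```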

def uninstall(self, debug: bool = False) -> Tuple[bool, str]:
510    def uninstall(self, debug: bool = False) -> SuccessTuple:
511        """
512        Remove a plugin, its virtual environment, and archive file.
513        """
514        from meerschaum.utils.packages import reload_meerschaum
515        from meerschaum.plugins import sync_plugins_symlinks
516        from meerschaum.utils.warnings import warn, info
517        warnings_thrown_count: int = 0
518        max_warnings: int = 3
519
520        if not self.is_installed():
521            info(
522                f"Plugin '{self.name}' doesn't seem to be installed.\n    "
523                + "Checking for artifacts...",
524                stack = False,
525            )
526        else:
527            real_path = pathlib.Path(os.path.realpath(self.__file__))
528            try:
529                if real_path.name == '__init__.py':
530                    shutil.rmtree(real_path.parent)
531                else:
532                    real_path.unlink()
533            except Exception as e:
534                warn(f"Could not remove source files for plugin '{self.name}':\n{e}", stack=False)
535                warnings_thrown_count += 1
536            else:
537                info(f"Removed source files for plugin '{self.name}'.")
538
539        if self.venv_path.exists():
540            success, msg = self.remove_venv(debug=debug)
541            if not success:
542                warn(msg, stack=False)
543                warnings_thrown_count += 1
544            else:
545                info(f"Removed virtual environment from plugin '{self.name}'.")
546
547        success = warnings_thrown_count < max_warnings
548        sync_plugins_symlinks(debug=debug)
549        self.deactivate_venv(force=True, debug=debug)
550        reload_meerschaum(debug=debug)
551        return success, (
552            f"Successfully uninstalled plugin '{self}'." if success
553            else f"Failed to uninstall plugin '{self}'."
554        )

Remove a plugin, its virtual environment, and archive file.

def setup( self, *args: str, debug: bool = False, **kw: Any) -> Union[Tuple[bool, str], bool]:
557    def setup(self, *args: str, debug: bool = False, **kw: Any) -> Union[SuccessTuple, bool]:
558        """
559        If it exists, run the plugin's `setup()` function.
560
561        Parameters
562        ----------
563        *args: str
564            The positional arguments passed to the `setup()` function.
565            
566        debug: bool, default False
567            Verbosity toggle.
568
569        **kw: Any
570            The keyword arguments passed to the `setup()` function.
571
572        Returns
573        -------
574        A `SuccessTuple` or `bool` indicating success.
575
576        """
577        from meerschaum.utils.debug import dprint
578        import inspect
579        _setup = None
580        for name, fp in inspect.getmembers(self.module):
581            if name == 'setup' and inspect.isfunction(fp):
582                _setup = fp
583                break
584
585        ### assume success if no setup() is found (not necessary)
586        if _setup is None:
587            return True
588
589        sig = inspect.signature(_setup)
590        has_debug, has_kw = ('debug' in sig.parameters), False
591        for k, v in sig.parameters.items():
592            if '**' in str(v):
593                has_kw = True
594                break
595
596        _kw = {}
597        if has_kw:
598            _kw.update(kw)
599        if has_debug:
600            _kw['debug'] = debug
601
602        if debug:
603            dprint(f"Running setup for plugin '{self}'...")
604        try:
605            self.activate_venv(debug=debug)
606            return_tuple = _setup(*args, **_kw)
607            self.deactivate_venv(debug=debug)
608        except Exception as e:
609            return False, str(e)
610
611        if isinstance(return_tuple, tuple):
612            return return_tuple
613        if isinstance(return_tuple, bool):
614            return return_tuple, f"Setup for Plugin '{self.name}' did not return a message."
615        if return_tuple is None:
616            return False, f"Setup for Plugin '{self.name}' returned None."
617        return False, f"Unknown return value from setup for Plugin '{self.name}': {return_tuple}"

If it exists, run the plugin's setup() function.
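
Before calling a plugin's `setup()`, the method inspects its signature so that `debug` and extra keyword arguments are forwarded only when the function accepts them. Here is a minimal sketch of that filtering step. `filter_setup_kwargs` and the three `setup_*` functions are hypothetical, and the sketch checks the parameter kind rather than the parameter's string form.

```python
import inspect
from typing import Any, Callable, Dict


def filter_setup_kwargs(func: Callable[..., Any], debug: bool = False, **kw: Any) -> Dict[str, Any]:
    """Build the keyword arguments that `func` is able to accept."""
    params = inspect.signature(func).parameters
    accepts_var_kw = any(
        param.kind is inspect.Parameter.VAR_KEYWORD
        for param in params.values()
    )
    # Forward arbitrary keywords only if the function declares **kwargs.
    _kw = dict(kw) if accepts_var_kw else {}
    if 'debug' in params:
        _kw['debug'] = debug
    return _kw


def setup_plain():
    return True, "Success"


def setup_debug(debug: bool = False):
    return True, "Success"


def setup_all(debug: bool = False, **kwargs):
    return True, "Success"


plain_kw = filter_setup_kwargs(setup_plain, debug=True, foo=1)
debug_kw = filter_setup_kwargs(setup_debug, debug=True, foo=1)
all_kw = filter_setup_kwargs(setup_all, debug=True, foo=1)
```

With this filtering, a `setup()` that takes no arguments can still be called safely, because nothing is forwarded to it.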

Parameters
  • *args (str): The positional arguments passed to the setup() function.
  • debug (bool, default False): Verbosity toggle.
  • **kw (Any): The keyword arguments passed to the setup() function.
Returns
  • A SuccessTuple or bool indicating success.
def get_dependencies(self, debug: bool = False) -> List[str]:
620    def get_dependencies(
621        self,
622        debug: bool = False,
623    ) -> List[str]:
624        """
625        If the Plugin has specified dependencies in a list called `required`, return the list.
626        
627        **NOTE:** Dependencies which start with `'plugin:'` are Meerschaum plugins, not pip packages.
628        Meerschaum plugins may also specify connector keys for a repo after `'@'`.
629
630        Parameters
631        ----------
632        debug: bool, default False
633            Verbosity toggle.
634
635        Returns
636        -------
637        A list of required packages and plugins (str).
638
639        """
640        if '_required' in self.__dict__:
641            return self._required
642
643        ### If the plugin has not yet been imported,
644        ### infer the dependencies from the source text.
645        ### This is not super robust, and it doesn't feel right
646        ### having multiple versions of the logic.
647        ### This is necessary when determining the activation order
648        ### without having import the module.
649        ### For consistency's sake, the module-less method does not cache the requirements.
650        if self.__dict__.get('_module', None) is None:
651            file_path = self.__file__
652            if file_path is None:
653                return []
654            with open(file_path, 'r', encoding='utf-8') as f:
655                text = f.read()
656
657            if 'required' not in text:
658                return []
659
660            ### This has some limitations:
661            ### It relies on `required` being manually declared.
662            ### We lose the ability to dynamically alter the `required` list,
663            ### which is why we've kept the module-reliant method below.
664            import ast, re
665            ### NOTE: This technically would break 
666            ### if `required` was the very first line of the file.
667            req_start_match = re.search(r'\nrequired(:\s*)?.*=', text)
668            if not req_start_match:
669                return []
670            req_start = req_start_match.start()
671            equals_sign = req_start + text[req_start:].find('=')
672
673            ### Dependencies may have brackets within the strings, so push back the index.
674            first_opening_brace = equals_sign + 1 + text[equals_sign:].find('[')
675            if first_opening_brace == -1:
676                return []
677
678            next_closing_brace = equals_sign + 1 + text[equals_sign:].find(']')
679            if next_closing_brace == -1:
680                return []
681
682            start_ix = first_opening_brace + 1
683            end_ix = next_closing_brace
684
685            num_braces = 0
686            while True:
687                if '[' not in text[start_ix:end_ix]:
688                    break
689                num_braces += 1
690                start_ix = end_ix
691                end_ix += text[end_ix + 1:].find(']') + 1
692
693            req_end = end_ix + 1
694            req_text = (
695                text[(first_opening_brace-1):req_end]
696                .lstrip()
697                .replace('=', '', 1)
698                .lstrip()
699                .rstrip()
700            )
701            try:
702                required = ast.literal_eval(req_text)
703            except Exception as e:
704                warn(
705                    f"Unable to determine requirements for plugin '{self.name}' "
706                    + "without importing the module.\n"
707                    + "    This may be due to dynamically setting the global `required` list.\n"
708                    + f"    {e}"
709                )
710                return []
711            return required
712
713        import inspect
714        self.activate_venv(dependencies=False, debug=debug)
715        required = []
716        for name, val in inspect.getmembers(self.module):
717            if name == 'required':
718                required = val
719                break
720        self._required = required
721        self.deactivate_venv(dependencies=False, debug=debug)
722        return required

If the Plugin has specified dependencies in a list called required, return the list.

NOTE: Dependencies which start with 'plugin:' are Meerschaum plugins, not pip packages. Meerschaum plugins may also specify connector keys for a repo after '@'.

Parameters
  • debug (bool, default False): Verbosity toggle.
Returns
  • A list of required packages and plugins (str).
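As a rough illustration of reading `required` without importing the module, the sketch below walks the syntax tree with `ast.parse` instead of the regex and bracket-matching approach used in the source above. The function name `infer_required` and the sample plugin text are hypothetical, and like the original it only handles a literal top-level `required` list.

```python
import ast
from typing import List


def infer_required(source: str) -> List[str]:
    """Return the literal top-level `required` list from source text, or [] if unavailable."""
    tree = ast.parse(source)
    for node in tree.body:
        # Plain assignment: `required = [...]`
        if isinstance(node, ast.Assign):
            names = [t.id for t in node.targets if isinstance(t, ast.Name)]
            value = node.value
        # Annotated assignment: `required: list[str] = [...]`
        elif isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
            names = [node.target.id]
            value = node.value
        else:
            continue

        if 'required' not in names or value is None:
            continue
        try:
            return list(ast.literal_eval(value))
        except (ValueError, TypeError):
            # The list is built dynamically, so it cannot be read statically.
            return []
    return []


# Hypothetical plugin source text.
plugin_source = '''
__version__ = '0.1.0'
required: list[str] = ['pandas[excel]', 'plugin:noaa', 'plugin:foo@api:bar']
'''
print(infer_required(plugin_source))
# ['pandas[excel]', 'plugin:noaa', 'plugin:foo@api:bar']
```

Either way, a `required` list that is computed at import time cannot be recovered from the text alone, which is why the source falls back to inspecting the imported module.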
def get_required_plugins(self, debug: bool = False) -> List[Plugin]:
    def get_required_plugins(self, debug: bool=False) -> List[mrsm.plugins.Plugin]:
        """
        Return a list of required Plugin objects.
        """
        from meerschaum.utils.warnings import warn
        from meerschaum.config import get_config
        from meerschaum._internal.static import STATIC_CONFIG
        from meerschaum.connectors.parse import is_valid_connector_keys
        plugins = []
        _deps = self.get_dependencies(debug=debug)
        sep = STATIC_CONFIG['plugins']['repo_separator']
        plugin_names = [
            _d[len('plugin:'):] for _d in _deps
            if _d.startswith('plugin:') and len(_d) > len('plugin:')
        ]
        default_repo_keys = get_config('meerschaum', 'repository')
        skipped_repo_keys = set()

        for _plugin_name in plugin_names:
            if sep in _plugin_name:
                try:
                    _plugin_name, _repo_keys = _plugin_name.split(sep)
                except Exception:
                    _repo_keys = default_repo_keys
                    warn(
                        f"Invalid repo keys for required plugin '{_plugin_name}'.\n    "
                        + f"Will try to use '{_repo_keys}' instead.",
                        stack = False,
                    )
            else:
                _repo_keys = default_repo_keys

            if _repo_keys in skipped_repo_keys:
                continue

            if not is_valid_connector_keys(_repo_keys):
                warn(
                    f"Invalid connector '{_repo_keys}'.\n"
                    f"    Skipping required plugins from repository '{_repo_keys}'",
                    stack=False,
                )
                continue

            plugins.append(Plugin(_plugin_name, repo=_repo_keys))

        return plugins

Return a list of required Plugin objects.
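To make the name and repository split concrete, here is a minimal standalone sketch. It assumes the repo separator is `'@'` (as the notes on this page suggest); `split_plugin_dependencies` and the default keys `'api:mrsm'` are hypothetical. Unlike the source, it uses `str.partition` and does not warn on malformed entries or validate the connector keys.

```python
from typing import List, Tuple

PLUGIN_PREFIX = 'plugin:'
REPO_SEPARATOR = '@'  # Assumption: the configured repo separator.


def split_plugin_dependencies(
    dependencies: List[str],
    default_repo_keys: str,
) -> List[Tuple[str, str]]:
    """Return (plugin_name, repo_keys) pairs for the 'plugin:' entries only."""
    pairs = []
    for dep in dependencies:
        # Skip pip packages and a bare 'plugin:' with no name.
        if not dep.startswith(PLUGIN_PREFIX) or len(dep) == len(PLUGIN_PREFIX):
            continue
        name = dep[len(PLUGIN_PREFIX):]
        if REPO_SEPARATOR in name:
            # Everything after the first separator is taken as the repo keys.
            name, _, repo_keys = name.partition(REPO_SEPARATOR)
        else:
            repo_keys = default_repo_keys
        pairs.append((name, repo_keys))
    return pairs


deps = ['pandas', 'plugin:noaa', 'plugin:foo@api:bar']
print(split_plugin_dependencies(deps, default_repo_keys='api:mrsm'))
# [('noaa', 'api:mrsm'), ('foo', 'api:bar')]
```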

def get_required_packages(self, debug: bool = False) -> List[str]:
    def get_required_packages(self, debug: bool=False) -> List[str]:
        """
        Return the required package names (excluding plugins).
        """
        _deps = self.get_dependencies(debug=debug)
        return [_d for _d in _deps if not _d.startswith('plugin:')]

Return the required package names (excluding plugins).

def activate_venv(self, dependencies: bool = True, init_if_not_exists: bool = True, debug: bool = False, **kw) -> bool:
    def activate_venv(
        self,
        dependencies: bool = True,
        init_if_not_exists: bool = True,
        debug: bool = False,
        **kw
    ) -> bool:
        """
        Activate the virtual environments for the plugin and its dependencies.

        Parameters
        ----------
        dependencies: bool, default True
            If `True`, activate the virtual environments for required plugins.

        Returns
        -------
        A bool indicating success.
        """
        import meerschaum.config.paths as paths
        from meerschaum.utils.venv import venv_target_path
        from meerschaum.utils.packages import activate_venv
        from meerschaum.utils.misc import make_symlink, is_symlink

        if dependencies:
            for plugin in self.get_required_plugins(debug=debug):
                plugin.activate_venv(debug=debug, init_if_not_exists=init_if_not_exists, **kw)

        vtp = venv_target_path(self.name, debug=debug, allow_nonexistent=True)
        venv_meerschaum_path = vtp / 'meerschaum'

        try:
            success, msg = True, "Success"
            if is_symlink(venv_meerschaum_path):
                if pathlib.Path(os.path.realpath(venv_meerschaum_path)) != paths.PACKAGE_ROOT_PATH:
                    venv_meerschaum_path.unlink()
                    success, msg = make_symlink(venv_meerschaum_path, paths.PACKAGE_ROOT_PATH)
        except Exception as e:
            success, msg = False, str(e)
        if not success:
            warn(
                f"Unable to create symlink {venv_meerschaum_path} to {paths.PACKAGE_ROOT_PATH}:\n"
                f"{msg}"
            )

        return activate_venv(self.name, init_if_not_exists=init_if_not_exists, debug=debug, **kw)

Activate the virtual environments for the plugin and its dependencies.

Parameters
  • dependencies (bool, default True): If True, activate the virtual environments for required plugins.
Returns
  • A bool indicating success.
def deactivate_venv(self, dependencies: bool = True, debug: bool = False, **kw) -> bool:
    def deactivate_venv(self, dependencies: bool=True, debug: bool = False, **kw) -> bool:
        """
        Deactivate the virtual environments for the plugin and its dependencies.

        Parameters
        ----------
        dependencies: bool, default True
            If `True`, deactivate the virtual environments for required plugins.

        Returns
        -------
        A bool indicating success.
        """
        from meerschaum.utils.packages import deactivate_venv
        success = deactivate_venv(self.name, debug=debug, **kw)
        if dependencies:
            for plugin in self.get_required_plugins(debug=debug):
                plugin.deactivate_venv(debug=debug, **kw)
        return success

Deactivate the virtual environments for the plugin and its dependencies.

Parameters
  • dependencies (bool, default True): If True, deactivate the virtual environments for required plugins.
Returns
  • A bool indicating success.
def install_dependencies(self, force: bool = False, debug: bool = False) -> bool:
    def install_dependencies(
        self,
        force: bool = False,
        debug: bool = False,
    ) -> bool:
        """
        If specified, install dependencies.

        **NOTE:** Dependencies that start with `'plugin:'` will be installed as
        Meerschaum plugins from the same repository as this Plugin.
        To install from a different repository, add the repo keys after `'@'`
        (e.g. `'plugin:foo@api:bar'`).

        Parameters
        ----------
        force: bool, default False
            If `True`, continue with the installation, even if some
            required packages fail to install.

        debug: bool, default False
            Verbosity toggle.

        Returns
        -------
        A bool indicating success.
        """
        from meerschaum.utils.packages import pip_install, venv_contains_package
        from meerschaum.utils.warnings import warn, info
        _deps = self.get_dependencies(debug=debug)
        if not _deps and self.requirements_file_path is None:
            return True

        plugins = self.get_required_plugins(debug=debug)
        for _plugin in plugins:
            if _plugin.name == self.name:
                warn(f"Plugin '{self.name}' cannot depend on itself! Skipping...", stack=False)
                continue
            _success, _msg = _plugin.repo_connector.install_plugin(
                _plugin.name, debug=debug, force=force
            )
            if not _success:
                warn(
                    f"Failed to install required plugin '{_plugin}' from '{_plugin.repo_connector}'"
                    + f" for plugin '{self.name}':\n" + _msg,
                    stack = False,
                )
                if not force:
                    warn(
                        "Try installing with the `--force` flag to continue anyway.",
                        stack = False,
                    )
                    return False
                info(
                    "Continuing with installation despite the failure "
                    + "(careful, things might be broken!)...",
                    icon = False
                )


        ### First step: parse `requirements.txt` if it exists.
        if self.requirements_file_path is not None:
            if not pip_install(
                requirements_file_path=self.requirements_file_path,
                venv=self.name, debug=debug
            ):
                warn(
                    f"Failed to resolve 'requirements.txt' for plugin '{self.name}'.",
                    stack = False,
                )
                if not force:
                    warn(
                        "Try installing with `--force` to continue anyway.",
                        stack = False,
                    )
                    return False
                info(
                    "Continuing with installation despite the failure "
                    + "(careful, things might be broken!)...",
                    icon = False
                )


        ### Don't reinstall packages that are already included in required plugins.
        packages = []
        _packages = self.get_required_packages(debug=debug)
        accounted_for_packages = set()
        for package_name in _packages:
            for plugin in plugins:
                if venv_contains_package(package_name, plugin.name):
                    accounted_for_packages.add(package_name)
                    break
        packages = [pkg for pkg in _packages if pkg not in accounted_for_packages]

        ### Attempt pip packages installation.
        if packages:
            for package in packages:
                if not pip_install(package, venv=self.name, debug=debug):
                    warn(
                        f"Failed to install required package '{package}'"
                        + f" for plugin '{self.name}'.",
                        stack = False,
                    )
                    if not force:
                        warn(
                            "Try installing with `--force` to continue anyway.",
                            stack = False,
                        )
                        return False
                    info(
                        "Continuing with installation despite the failure "
                        + "(careful, things might be broken!)...",
                        icon = False
                    )
        return True

If specified, install dependencies.

NOTE: Dependencies that start with 'plugin:' will be installed as Meerschaum plugins from the same repository as this Plugin. To install from a different repository, add the repo keys after '@' (e.g. 'plugin:foo@api:bar').

Parameters
  • force (bool, default False): If True, continue with the installation, even if some required packages fail to install.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A bool indicating success.
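The effect of `force` can be sketched without pip at all. In the example below, `install_all` and `fake_install` are hypothetical stand-ins for the install loop and for `pip_install`. As in the source, a failure aborts with `False` unless `force` is set, in which case the loop continues and the overall result is still `True`.

```python
from typing import Callable, Iterable


def install_all(
    packages: Iterable[str],
    install_one: Callable[[str], bool],
    force: bool = False,
) -> bool:
    """Install packages one at a time, stopping at the first failure unless `force`."""
    for package in packages:
        if install_one(package):
            continue
        print(f"Failed to install required package '{package}'.")
        if not force:
            print("Try installing with `--force` to continue anyway.")
            return False
        print("Continuing with installation despite the failure...")
    return True


attempted = []

def fake_install(package: str) -> bool:
    """Pretend installer: everything succeeds except 'broken-package'."""
    attempted.append(package)
    return package != 'broken-package'


print(install_all(['pandas', 'broken-package', 'requests'], fake_install))
# False ('requests' is never attempted)
print(install_all(['pandas', 'broken-package', 'requests'], fake_install, force=True))
# True (all three are attempted)
```

Because a forced run reports success even when individual packages failed, callers that need certainty may want to verify the environment afterwards.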
full_name: str
    @property
    def full_name(self) -> str:
        """
        Include the repo keys with the plugin's name.
        """
        from meerschaum._internal.static import STATIC_CONFIG
        sep = STATIC_CONFIG['plugins']['repo_separator']
        return self.name + sep + str(self.repo_connector)

Include the repo keys with the plugin's name.

SuccessTuple = typing.Tuple[bool, str]
class Venv:
class Venv:
    """
    Manage a virtual environment's activation status.

    Examples
    --------
    >>> from meerschaum.plugins import Plugin
    >>> with Venv('mrsm') as venv:
    ...     import pandas
    >>> with Venv(Plugin('noaa')) as venv:
    ...     import requests
    >>> venv = Venv('mrsm')
    >>> venv.activate()
    True
    >>> venv.deactivate()
    True
    """

    def __init__(
        self,
        venv: Union[str, 'mrsm.core.Plugin', None] = 'mrsm',
        init_if_not_exists: bool = True,
        debug: bool = False,
    ) -> None:
        from meerschaum.utils.venv import activate_venv, deactivate_venv, active_venvs
        ### For some weird threading issue,
        ### we can't use `isinstance` here.
        if '_Plugin' in str(type(venv)):
            self._venv = venv.name
            self._activate = venv.activate_venv
            self._deactivate = venv.deactivate_venv
            self._kwargs = {}
        else:
            self._venv = venv
            self._activate = activate_venv
            self._deactivate = deactivate_venv
            self._kwargs = {'venv': venv}
        self._debug = debug
        self._init_if_not_exists = init_if_not_exists
        ### In case someone calls `deactivate()` before `activate()`.
        self._kwargs['previously_active_venvs'] = copy.deepcopy(active_venvs)


    def activate(self, debug: bool = False) -> bool:
        """
        Activate this virtual environment.
        If a `meerschaum.plugins.Plugin` was provided, its dependent virtual environments
        will also be activated.
        """
        from meerschaum.utils.venv import active_venvs, init_venv
        self._kwargs['previously_active_venvs'] = copy.deepcopy(active_venvs)
        try:
            return self._activate(
                debug=(debug or self._debug),
                init_if_not_exists=self._init_if_not_exists,
                **self._kwargs
            )
        except OSError as e:
            if self._init_if_not_exists:
                if not init_venv(self._venv, force=True):
                    raise e
        return self._activate(
            debug=(debug or self._debug),
            init_if_not_exists=self._init_if_not_exists,
            **self._kwargs
        )


    def deactivate(self, debug: bool = False) -> bool:
        """
        Deactivate this virtual environment.
        If a `meerschaum.plugins.Plugin` was provided, its dependent virtual environments
        will also be deactivated.
        """
        return self._deactivate(debug=(debug or self._debug), **self._kwargs)


    @property
    def target_path(self) -> pathlib.Path:
        """
        Return the target site-packages path for this virtual environment.
        A `meerschaum.utils.venv.Venv` may have one virtual environment per minor Python version
        (e.g. Python 3.10 and Python 3.7).
        """
        from meerschaum.utils.venv import venv_target_path
        return venv_target_path(venv=self._venv, allow_nonexistent=True, debug=self._debug)


    @property
    def root_path(self) -> pathlib.Path:
        """
        Return the top-level path for this virtual environment.
        """
        import meerschaum.config.paths as paths
        if self._venv is None:
            return self.target_path.parent
        return paths.VIRTENV_RESOURCES_PATH / self._venv


    def __enter__(self) -> None:
        self.activate(debug=self._debug)


    def __exit__(self, exc_type, exc_value, exc_traceback) -> None:
        self.deactivate(debug=self._debug)


    def __str__(self) -> str:
        quote = "'" if self._venv is not None else ""
        return "Venv(" + quote + str(self._venv) + quote + ")"


    def __repr__(self) -> str:
        return self.__str__()

Manage a virtual environment's activation status.

Examples
>>> from meerschaum.plugins import Plugin
>>> with Venv('mrsm') as venv:
...     import pandas
>>> with Venv(Plugin('noaa')) as venv:
...     import requests
>>> venv = Venv('mrsm')
>>> venv.activate()
True
>>> venv.deactivate()
True
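The activate-on-enter, deactivate-on-exit pattern behind `Venv` can be illustrated with a much smaller stand-in. The sketch below is not how Meerschaum manages its environments; it only snapshots `sys.path`, prepends a directory for the duration of the block, and restores the snapshot afterwards, which is loosely analogous to the `previously_active_venvs` copy taken in the listing above. The name `temporary_sys_path` and the example path are hypothetical.

```python
import sys
from contextlib import contextmanager


@contextmanager
def temporary_sys_path(path: str):
    """Prepend `path` to `sys.path` inside the block, then restore the previous list."""
    previous = list(sys.path)      # Snapshot the state before "activating".
    sys.path.insert(0, path)
    try:
        yield path
    finally:
        sys.path[:] = previous     # "Deactivate" even if the block raised.


example_path = '/tmp/example-venv/site-packages'  # Hypothetical directory.
with temporary_sys_path(example_path) as active:
    print(sys.path[0] == active)
    # True
print(example_path in sys.path)
```

Note that in the listing above `__enter__` returns `None`, so the name bound by `with Venv('mrsm') as venv:` is `None` inside the block; keep a separate reference if the `Venv` object itself is needed.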
Venv(venv: Union[str, Plugin, NoneType] = 'mrsm', init_if_not_exists: bool = True, debug: bool = False)
    def __init__(
        self,
        venv: Union[str, 'mrsm.core.Plugin', None] = 'mrsm',
        init_if_not_exists: bool = True,
        debug: bool = False,
    ) -> None:
        from meerschaum.utils.venv import activate_venv, deactivate_venv, active_venvs
        ### For some weird threading issue,
        ### we can't use `isinstance` here.
        if '_Plugin' in str(type(venv)):
            self._venv = venv.name
            self._activate = venv.activate_venv
            self._deactivate = venv.deactivate_venv
            self._kwargs = {}
        else:
            self._venv = venv
            self._activate = activate_venv
            self._deactivate = deactivate_venv
            self._kwargs = {'venv': venv}
        self._debug = debug
        self._init_if_not_exists = init_if_not_exists
        ### In case someone calls `deactivate()` before `activate()`.
        self._kwargs['previously_active_venvs'] = copy.deepcopy(active_venvs)
def activate(self, debug: bool = False) -> bool:
    def activate(self, debug: bool = False) -> bool:
        """
        Activate this virtual environment.
        If a `meerschaum.plugins.Plugin` was provided, its dependent virtual environments
        will also be activated.
        """
        from meerschaum.utils.venv import active_venvs, init_venv
        self._kwargs['previously_active_venvs'] = copy.deepcopy(active_venvs)
        try:
            return self._activate(
                debug=(debug or self._debug),
                init_if_not_exists=self._init_if_not_exists,
                **self._kwargs
            )
        except OSError as e:
            if self._init_if_not_exists:
                if not init_venv(self._venv, force=True):
                    raise e
        return self._activate(
            debug=(debug or self._debug),
            init_if_not_exists=self._init_if_not_exists,
            **self._kwargs
        )

Activate this virtual environment. If a meerschaum.plugins.Plugin was provided, its dependent virtual environments will also be activated.
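The retry logic in `activate()` follows a common try, repair, retry shape. The sketch below reproduces that control flow with plain callables; `activate_with_retry`, `flaky_activate`, and `init` are hypothetical names standing in for the real activation and `init_venv` calls.

```python
from typing import Callable


def activate_with_retry(
    activate: Callable[[], bool],
    init: Callable[[], bool],
    init_if_not_exists: bool = True,
) -> bool:
    """Try to activate; on OSError, optionally (re)initialize, then try once more."""
    try:
        return activate()
    except OSError as e:
        # Only re-raise when initialization was requested and it failed.
        if init_if_not_exists and not init():
            raise e
    # Second and final attempt; any error here propagates to the caller.
    return activate()


calls = {'activate': 0, 'init': 0}

def flaky_activate() -> bool:
    """Fail with OSError until `init()` has been called at least once."""
    calls['activate'] += 1
    if calls['init'] == 0:
        raise OSError("venv directory is missing")
    return True

def init() -> bool:
    calls['init'] += 1
    return True


print(activate_with_retry(flaky_activate, init))
# True
print(calls)
# {'activate': 2, 'init': 1}
```

As written in the source, the second attempt is made even when `init_if_not_exists` is false, so a persistent `OSError` surfaces from the retry rather than from the first call.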

def deactivate(self, debug: bool = False) -> bool:
    def deactivate(self, debug: bool = False) -> bool:
        """
        Deactivate this virtual environment.
        If a `meerschaum.plugins.Plugin` was provided, its dependent virtual environments
        will also be deactivated.
        """
        return self._deactivate(debug=(debug or self._debug), **self._kwargs)

Deactivate this virtual environment. If a meerschaum.plugins.Plugin was provided, its dependent virtual environments will also be deactivated.

target_path: pathlib.Path
    @property
    def target_path(self) -> pathlib.Path:
        """
        Return the target site-packages path for this virtual environment.
        A `meerschaum.utils.venv.Venv` may have one virtual environment per minor Python version
        (e.g. Python 3.10 and Python 3.7).
        """
        from meerschaum.utils.venv import venv_target_path
        return venv_target_path(venv=self._venv, allow_nonexistent=True, debug=self._debug)

Return the target site-packages path for this virtual environment. A meerschaum.utils.venv.Venv may have one virtual environment per minor Python version (e.g. Python 3.10 and Python 3.7).

root_path: pathlib.Path
    @property
    def root_path(self) -> pathlib.Path:
        """
        Return the top-level path for this virtual environment.
        """
        import meerschaum.config.paths as paths
        if self._venv is None:
            return self.target_path.parent
        return paths.VIRTENV_RESOURCES_PATH / self._venv

Return the top-level path for this virtual environment.

class Job:
  70class Job:
  71    """
  72    Manage a `meerschaum.utils.daemon.Daemon`, locally or remotely via the API.
  73    """
  74
  75    def __init__(
  76        self,
  77        name: str,
  78        sysargs: Union[List[str], str, None] = None,
  79        env: Optional[Dict[str, str]] = None,
  80        executor_keys: Optional[str] = None,
  81        delete_after_completion: bool = False,
  82        refresh_seconds: Union[int, float, None] = None,
  83        _properties: Optional[Dict[str, Any]] = None,
  84        _rotating_log=None,
  85        _stdin_file=None,
  86        _status_hook: Optional[Callable[[], str]] = None,
  87        _result_hook: Optional[Callable[[], SuccessTuple]] = None,
  88        _externally_managed: bool = False,
  89    ):
  90        """
  91        Create a new job to manage a `meerschaum.utils.daemon.Daemon`.
  92
  93        Parameters
  94        ----------
  95        name: str
  96            The name of the job to be created.
  97            This will also be used as the Daemon ID.
  98
  99        sysargs: Union[List[str], str, None], default None
 100            The sysargs of the command to be executed, e.g. 'start api'.
 101
 102        env: Optional[Dict[str, str]], default None
 103            If provided, set these environment variables in the job's process.
 104
 105        executor_keys: Optional[str], default None
 106            If provided, execute the job remotely on an API instance, e.g. 'api:main'.
 107
 108        delete_after_completion: bool, default False
 109            If `True`, delete this job when it has finished executing.
 110
 111        refresh_seconds: Union[int, float, None], default None
 112            The number of seconds to sleep between refreshes.
 113            Defaults to the configured value `system.cli.refresh_seconds`.
 114
 115        _properties: Optional[Dict[str, Any]], default None
 116            If provided, use this to patch the daemon's properties.
 117        """
 118        from meerschaum.utils.daemon import Daemon
 119        for char in BANNED_CHARS:
 120            if char in name:
 121                raise ValueError(f"Invalid name: ({char}) is not allowed.")
 122
 123        if isinstance(sysargs, str):
 124            sysargs = shlex.split(sysargs)
 125
 126        and_key = STATIC_CONFIG['system']['arguments']['and_key']
 127        escaped_and_key = STATIC_CONFIG['system']['arguments']['escaped_and_key']
 128        if sysargs:
 129            sysargs = [
 130                (arg if arg != escaped_and_key else and_key)
 131                for arg in sysargs
 132            ]
 133
 134        ### NOTE: 'local' and 'systemd' executors are being coalesced.
 135        if executor_keys is None:
 136            from meerschaum.jobs import get_executor_keys_from_context
 137            executor_keys = get_executor_keys_from_context()
 138
 139        self.executor_keys = executor_keys
 140        self.name = name
 141        self.refresh_seconds = (
 142            refresh_seconds
 143            if refresh_seconds is not None
 144            else mrsm.get_config('system', 'cli', 'refresh_seconds')
 145        )
 146        try:
 147            self._daemon = (
 148                Daemon(daemon_id=name)
 149                if executor_keys == 'local'
 150                else None
 151            )
 152        except Exception:
 153            self._daemon = None
 154
 155        ### Handle any injected dependencies.
 156        if _rotating_log is not None:
 157            self._rotating_log = _rotating_log
 158            if self._daemon is not None:
 159                self._daemon._rotating_log = _rotating_log
 160
 161        if _stdin_file is not None:
 162            self._stdin_file = _stdin_file
 163            if self._daemon is not None:
 164                self._daemon._stdin_file = _stdin_file
 165                self._daemon._blocking_stdin_file_path = _stdin_file.blocking_file_path
 166
 167        if _status_hook is not None:
 168            self._status_hook = _status_hook
 169
 170        if _result_hook is not None:
 171            self._result_hook = _result_hook
 172
 173        self._externally_managed = _externally_managed
 174        self._properties_patch = _properties or {}
 175        if _externally_managed:
 176            self._properties_patch.update({'externally_managed': _externally_managed})
 177
 178        if env:
 179            self._properties_patch.update({'env': env})
 180
 181        if delete_after_completion:
 182            self._properties_patch.update({'delete_after_completion': delete_after_completion})
 183
 184        daemon_sysargs = (
 185            self._daemon.properties.get('target', {}).get('args', [None])[0]
 186            if self._daemon is not None
 187            else None
 188        )
 189
 190        if daemon_sysargs and sysargs and daemon_sysargs != sysargs:
 191            warn("Given sysargs differ from existing sysargs.")
 192
 193        self._sysargs = [
 194            arg
 195            for arg in (daemon_sysargs or sysargs or [])
 196            if arg not in ('-d', '--daemon')
 197        ]
 198        for restart_flag in RESTART_FLAGS:
 199            if restart_flag in self._sysargs:
 200                self._properties_patch.update({'restart': True})
 201                break
 202
 203    @staticmethod
 204    def from_pid(pid: int, executor_keys: Optional[str] = None) -> Job:
 205        """
 206        Build a `Job` from the PID of a running Meerschaum process.
 207
 208        Parameters
 209        ----------
 210        pid: int
 211            The PID of the process.
 212
 213        executor_keys: Optional[str], default None
 214            The executor keys to assign to the job.
 215        """
 216        psutil = mrsm.attempt_import('psutil')
 217        try:
 218            process = psutil.Process(pid)
 219        except psutil.NoSuchProcess as e:
 220            warn(f"Process with PID {pid} does not exist.", stack=False)
 221            raise e
 222
 223        command_args = process.cmdline()
 224        is_daemon = command_args[1] == '-c'
 225
 226        if is_daemon:
 227            daemon_id = command_args[-1].split('daemon_id=')[-1].split(')')[0].replace("'", '')
 228            root_dir = process.environ().get(STATIC_CONFIG['environment']['root'], None)
 229            if root_dir is None:
 230                root_dir = paths.ROOT_DIR_PATH
 231            else:
 232                root_dir = pathlib.Path(root_dir)
 233            jobs_dir = root_dir / paths.DAEMON_RESOURCES_PATH.name
 234            daemon_dir = jobs_dir / daemon_id
 235            pid_file = daemon_dir / 'process.pid'
 236
 237            if pid_file.exists():
 238                with open(pid_file, 'r', encoding='utf-8') as f:
 239                    daemon_pid = int(f.read())
 240
 241                if pid != daemon_pid:
 242                    raise EnvironmentError(f"Differing PIDs: {pid=}, {daemon_pid=}")
 243            else:
 244                raise EnvironmentError(f"Is job '{daemon_id}' running?")
 245
 246            return Job(daemon_id, executor_keys=executor_keys)
 247
 248        from meerschaum._internal.arguments._parse_arguments import parse_arguments
 249        from meerschaum.utils.daemon import get_new_daemon_name
 250
 251        mrsm_ix = 0
 252        for i, arg in enumerate(command_args):
 253            if 'mrsm' in arg or 'meerschaum' in arg.lower():
 254                mrsm_ix = i
 255                break
 256
 257        sysargs = command_args[mrsm_ix+1:]
 258        kwargs = parse_arguments(sysargs)
 259        name = kwargs.get('name', get_new_daemon_name())
 260        return Job(name, sysargs, executor_keys=executor_keys)
 261
 262    def start(self, debug: bool = False) -> SuccessTuple:
 263        """
 264        Start the job's daemon.
 265        """
 266        if self.executor is not None:
 267            if not self.exists(debug=debug):
 268                return self.executor.create_job(
 269                    self.name,
 270                    self.sysargs,
 271                    properties=self.daemon.properties,
 272                    debug=debug,
 273                )
 274            return self.executor.start_job(self.name, debug=debug)
 275
 276        if self.is_running():
 277            return True, f"{self} is already running."
 278
 279        success, msg = self.daemon.run(
 280            keep_daemon_output=(not self.delete_after_completion),
 281            allow_dirty_run=True,
 282        )
 283        if not success:
 284            return success, msg
 285
 286        return success, f"Started {self}."
 287
 288    def stop(
 289        self,
 290        timeout_seconds: Union[int, float, None] = None,
 291        debug: bool = False,
 292    ) -> SuccessTuple:
 293        """
 294        Stop the job's daemon.
 295        """
 296        if self.executor is not None:
 297            return self.executor.stop_job(self.name, debug=debug)
 298
 299        if self.daemon.status == 'stopped':
 300            if not self.restart:
 301                return True, f"{self} is not running."
 302            elif self.stop_time is not None:
 303                return True, f"{self} will not restart until manually started."
 304
 305        quit_success, quit_msg = self.daemon.quit(timeout=timeout_seconds)
 306        if quit_success:
 307            return quit_success, f"Stopped {self}."
 308
 309        warn(
 310            f"Failed to gracefully quit {self}.",
 311            stack=False,
 312        )
 313        kill_success, kill_msg = self.daemon.kill(timeout=timeout_seconds)
 314        if not kill_success:
 315            return kill_success, kill_msg
 316
 317        return kill_success, f"Killed {self}."
 318
 319    def pause(
 320        self,
 321        timeout_seconds: Union[int, float, None] = None,
 322        debug: bool = False,
 323    ) -> SuccessTuple:
 324        """
 325        Pause the job's daemon.
 326        """
 327        if self.executor is not None:
 328            return self.executor.pause_job(self.name, debug=debug)
 329
 330        pause_success, pause_msg = self.daemon.pause(timeout=timeout_seconds)
 331        if not pause_success:
 332            return pause_success, pause_msg
 333
 334        return pause_success, f"Paused {self}."
 335
 336    def delete(self, debug: bool = False) -> SuccessTuple:
 337        """
 338        Delete the job and its daemon.
 339        """
 340        if self.executor is not None:
 341            return self.executor.delete_job(self.name, debug=debug)
 342
 343        if self.is_running():
 344            stop_success, stop_msg = self.stop()
 345            if not stop_success:
 346                return stop_success, stop_msg
 347
 348        cleanup_success, cleanup_msg = self.daemon.cleanup()
 349        if not cleanup_success:
 350            return cleanup_success, cleanup_msg
 351
 352        _ = self.daemon._properties.pop('result', None)
 353        return cleanup_success, f"Deleted {self}."
 354
 355    def is_running(self) -> bool:
 356        """
 357        Determine whether the job's daemon is running.
 358        """
 359        return self.status == 'running'
 360
 361    def exists(self, debug: bool = False) -> bool:
 362        """
 363        Determine whether the job exists.
 364        """
 365        if self.executor is not None:
 366            return self.executor.get_job_exists(self.name, debug=debug)
 367
 368        return self.daemon.path.exists()
 369
 370    def get_logs(self) -> Union[str, None]:
 371        """
 372        Return the output text of the job's daemon.
 373        """
 374        if self.executor is not None:
 375            return self.executor.get_logs(self.name)
 376
 377        return self.daemon.log_text
 378
 379    def monitor_logs(
 380        self,
 381        callback_function: Callable[[str], None] = _default_stdout_callback,
 382        input_callback_function: Optional[Callable[[], str]] = None,
 383        stop_callback_function: Optional[Callable[[SuccessTuple], None]] = None,
 384        stop_event: Optional[asyncio.Event] = None,
 385        stop_on_exit: bool = False,
 386        strip_timestamps: bool = False,
 387        accept_input: bool = True,
 388        debug: bool = False,
 389        _logs_path: Optional[pathlib.Path] = None,
 390        _log=None,
 391        _stdin_file=None,
 392        _wait_if_stopped: bool = True,
 393    ):
 394        """
 395        Monitor the job's log files and execute a callback on new lines.
 396
 397        Parameters
 398        ----------
 399        callback_function: Callable[[str], None], default _default_stdout_callback
 400            The callback to execute as new data comes in.
 401            Defaults to printing the output directly to `stdout`.
 402
 403        input_callback_function: Optional[Callable[[], str]], default None
 404            If provided, execute this callback when the daemon is blocking on stdin.
 405            Defaults to `sys.stdin.readline()`.
 406
 407        stop_callback_function: Optional[Callable[[SuccessTuple], None]], default None
 408            If provided, execute this callback when the daemon stops.
 409            The job's SuccessTuple will be passed to the callback.
 410
 411        stop_event: Optional[asyncio.Event], default None
 412            If provided, stop monitoring when this event is set.
 413            You may instead raise `meerschaum.jobs.StopMonitoringLogs`
 414            from within `callback_function` to stop monitoring.
 415
 416        stop_on_exit: bool, default False
 417            If `True`, stop monitoring when the job stops.
 418
 419        strip_timestamps: bool, default False
 420            If `True`, remove leading timestamps from lines.
 421
 422        accept_input: bool, default True
 423            If `True`, accept input when the daemon blocks on stdin.
 424        """
 425        if self.executor is not None:
 426            self.executor.monitor_logs(
 427                self.name,
 428                callback_function,
 429                input_callback_function=input_callback_function,
 430                stop_callback_function=stop_callback_function,
 431                stop_on_exit=stop_on_exit,
 432                accept_input=accept_input,
 433                strip_timestamps=strip_timestamps,
 434                debug=debug,
 435            )
 436            return
 437
 438        monitor_logs_coroutine = self.monitor_logs_async(
 439            callback_function=callback_function,
 440            input_callback_function=input_callback_function,
 441            stop_callback_function=stop_callback_function,
 442            stop_event=stop_event,
 443            stop_on_exit=stop_on_exit,
 444            strip_timestamps=strip_timestamps,
 445            accept_input=accept_input,
 446            debug=debug,
 447            _logs_path=_logs_path,
 448            _log=_log,
 449            _stdin_file=_stdin_file,
 450            _wait_if_stopped=_wait_if_stopped,
 451        )
 452        return asyncio.run(monitor_logs_coroutine)
 453
 454    async def monitor_logs_async(
 455        self,
 456        callback_function: Callable[[str], None] = _default_stdout_callback,
 457        input_callback_function: Optional[Callable[[], str]] = None,
 458        stop_callback_function: Optional[Callable[[SuccessTuple], None]] = None,
 459        stop_event: Optional[asyncio.Event] = None,
 460        stop_on_exit: bool = False,
 461        strip_timestamps: bool = False,
 462        accept_input: bool = True,
 463        debug: bool = False,
 464        _logs_path: Optional[pathlib.Path] = None,
 465        _log=None,
 466        _stdin_file=None,
 467        _wait_if_stopped: bool = True,
 468    ):
 469        """
 470        Monitor the job's log files and await a callback on new lines.
 471
 472        Parameters
 473        ----------
 474        callback_function: Callable[[str], None], default _default_stdout_callback
 475            The callback to execute as new data comes in.
 476            Defaults to printing the output directly to `stdout`.
 477
 478        input_callback_function: Optional[Callable[[], str]], default None
 479            If provided, execute this callback when the daemon is blocking on stdin.
 480            Defaults to `sys.stdin.readline()`.
 481
 482        stop_callback_function: Optional[Callable[[SuccessTuple], None]], default None
 483            If provided, execute this callback when the daemon stops.
 484            The job's SuccessTuple will be passed to the callback.
 485
 486        stop_event: Optional[asyncio.Event], default None
 487            If provided, stop monitoring when this event is set.
 488            You may instead raise `meerschaum.jobs.StopMonitoringLogs`
 489            from within `callback_function` to stop monitoring.
 490
 491        stop_on_exit: bool, default False
 492            If `True`, stop monitoring when the job stops.
 493
 494        strip_timestamps: bool, default False
 495            If `True`, remove leading timestamps from lines.
 496
 497        accept_input: bool, default True
 498            If `True`, accept input when the daemon blocks on stdin.
 499        """
 500        from meerschaum.utils.prompt import prompt
 501
 502        def default_input_callback_function():
 503            prompt_kwargs = self.get_prompt_kwargs(debug=debug)
 504            if prompt_kwargs:
 505                answer = prompt(**prompt_kwargs)
 506                return answer + '\n'
 507            return sys.stdin.readline()
 508
 509        if input_callback_function is None:
 510            input_callback_function = default_input_callback_function
 511
 512        if self.executor is not None:
 513            await self.executor.monitor_logs_async(
 514                self.name,
 515                callback_function,
 516                input_callback_function=input_callback_function,
 517                stop_callback_function=stop_callback_function,
 518                stop_on_exit=stop_on_exit,
 519                strip_timestamps=strip_timestamps,
 520                accept_input=accept_input,
 521                debug=debug,
 522            )
 523            return
 524
 525        from meerschaum.utils.formatting._jobs import strip_timestamp_from_line
 526
 527        events = {
 528            'user': stop_event,
 529            'stopped': asyncio.Event(),
 530            'stop_token': asyncio.Event(),
 531            'stop_exception': asyncio.Event(),
 532            'stopped_timeout': asyncio.Event(),
 533        }
 534        combined_event = asyncio.Event()
 535        emitted_text = False
 536        stdin_file = _stdin_file if _stdin_file is not None else self.daemon.stdin_file
 537
 538        async def check_job_status():
 539            if not stop_on_exit:
 540                return
 541
 542            nonlocal emitted_text
 543
 544            sleep_time = 0.1
 545            while sleep_time < 0.2:
 546                if self.status == 'stopped':
 547                    if not emitted_text and _wait_if_stopped:
 548                        await asyncio.sleep(sleep_time)
 549                        sleep_time = round(sleep_time * 1.1, 3)
 550                        continue
 551
 552                    if stop_callback_function is not None:
 553                        try:
 554                            if asyncio.iscoroutinefunction(stop_callback_function):
 555                                await stop_callback_function(self.result)
 556                            else:
 557                                stop_callback_function(self.result)
 558                        except asyncio.exceptions.CancelledError:
 559                            break
 560                        except Exception:
 561                            warn(traceback.format_exc())
 562
 563                    if stop_on_exit:
 564                        events['stopped'].set()
 565
 566                    break
 567                await asyncio.sleep(0.1)
 568
 569            events['stopped_timeout'].set()
 570
 571        async def check_blocking_on_input():
 572            while True:
 573                if not emitted_text or not self.is_blocking_on_stdin():
 574                    try:
 575                        await asyncio.sleep(self.refresh_seconds)
 576                    except asyncio.exceptions.CancelledError:
 577                        break
 578                    continue
 579
 580                if not self.is_running():
 581                    break
 582
 583                await emit_latest_lines()
 584
 585                try:
 586                    print('', end='', flush=True)
 587                    if asyncio.iscoroutinefunction(input_callback_function):
 588                        data = await input_callback_function()
 589                    else:
 590                        loop = asyncio.get_running_loop()
 591                        data = await loop.run_in_executor(None, input_callback_function)
 592                except KeyboardInterrupt:
 593                    break
 594                #  if not data.endswith('\n'):
 595                    #  data += '\n'
 596
 597                stdin_file.write(data)
 598                await asyncio.sleep(self.refresh_seconds)
 599
 600        async def combine_events():
 601            event_tasks = [
 602                asyncio.create_task(event.wait())
 603                for event in events.values()
 604                if event is not None
 605            ]
 606            if not event_tasks:
 607                return
 608
 609            try:
 610                done, pending = await asyncio.wait(
 611                    event_tasks,
 612                    return_when=asyncio.FIRST_COMPLETED,
 613                )
 614                for task in pending:
 615                    task.cancel()
 616            except asyncio.exceptions.CancelledError:
 617                pass
 618            finally:
 619                combined_event.set()
 620
 621        check_job_status_task = asyncio.create_task(check_job_status())
 622        check_blocking_on_input_task = asyncio.create_task(check_blocking_on_input())
 623        combine_events_task = asyncio.create_task(combine_events())
 624
 625        log = _log if _log is not None else self.daemon.rotating_log
 626        lines_to_show = (
 627            self.daemon.properties.get(
 628                'logs', {}
 629            ).get(
 630                'lines_to_show', get_config('jobs', 'logs', 'lines_to_show')
 631            )
 632        )
 633
 634        async def emit_latest_lines():
 635            nonlocal emitted_text
 636            nonlocal stop_event
 637            lines = log.readlines()
 638            for line in lines[(-1 * lines_to_show):]:
 639                if stop_event is not None and stop_event.is_set():
 640                    return
 641
 642                line_stripped_extra = strip_timestamp_from_line(line.strip())
 643                line_stripped = strip_timestamp_from_line(line)
 644
 645                if line_stripped_extra == STOP_TOKEN:
 646                    events['stop_token'].set()
 647                    return
 648
 649                if line_stripped_extra == CLEAR_TOKEN:
 650                    clear_screen(debug=debug)
 651                    continue
 652
 653                if line_stripped_extra == FLUSH_TOKEN.strip():
 654                    line_stripped = ''
 655                    line = ''
 656
 657                if strip_timestamps:
 658                    line = line_stripped
 659
 660                try:
 661                    if asyncio.iscoroutinefunction(callback_function):
 662                        await callback_function(line)
 663                    else:
 664                        callback_function(line)
 665                    emitted_text = True
 666                except StopMonitoringLogs:
 667                    events['stop_exception'].set()
 668                    return
 669                except Exception:
 670                    warn(f"Error in logs callback:\n{traceback.format_exc()}")
 671
 672        await emit_latest_lines()
 673
 674        tasks = (
 675            [check_job_status_task]
 676            + ([check_blocking_on_input_task] if accept_input else [])
 677            + [combine_events_task]
 678        )
 679        try:
 680            _ = asyncio.gather(*tasks, return_exceptions=True)
 681        except asyncio.exceptions.CancelledError:
 682            raise
 683        except Exception:
 684            warn(f"Failed to run async checks:\n{traceback.format_exc()}")
 685
 686        watchfiles = mrsm.attempt_import('watchfiles', lazy=False)
 687        dir_path_to_monitor = (
 688            _logs_path
 689            or (log.file_path.parent if log else None)
 690            or paths.LOGS_RESOURCES_PATH
 691        )
 692        async for changes in watchfiles.awatch(
 693            dir_path_to_monitor,
 694            stop_event=combined_event,
 695        ):
 696            for change in changes:
 697                file_path_str = change[1]
 698                file_path = pathlib.Path(file_path_str)
 699                latest_subfile_path = log.get_latest_subfile_path()
 700                if latest_subfile_path != file_path:
 701                    continue
 702
 703                await emit_latest_lines()
 704
 705        await emit_latest_lines()
 706
 707    def is_blocking_on_stdin(self, debug: bool = False) -> bool:
 708        """
 709        Return whether a job's daemon is blocking on stdin.
 710        """
 711        if self.executor is not None:
 712            return self.executor.get_job_is_blocking_on_stdin(self.name, debug=debug)
 713
 714        return self.is_running() and self.daemon.blocking_stdin_file_path.exists()
 715
 716    def get_prompt_kwargs(self, debug: bool = False) -> Dict[str, Any]:
 717        """
 718        Return the kwargs to the blocking `prompt()`, if available.
 719        """
 720        if self.executor is not None:
 721            return self.executor.get_job_prompt_kwargs(self.name, debug=debug)
 722
 723        if not self.daemon.prompt_kwargs_file_path.exists():
 724            return {}
 725
 726        try:
 727            with open(self.daemon.prompt_kwargs_file_path, 'r', encoding='utf-8') as f:
 728                prompt_kwargs = json.load(f)
 729
 730            return prompt_kwargs
 731        
 732        except Exception:
 733            import traceback
 734            traceback.print_exc()
 735            return {}
 736
 737    def write_stdin(self, data):
 738        """
 739        Write to a job's daemon's `stdin`.
 740        """
 741        self.daemon.stdin_file.write(data)
 742
 743    @property
 744    def executor(self) -> Union[Executor, None]:
 745        """
 746        If the job is remote, return the connector to the remote API instance.
 747        """
 748        return (
 749            mrsm.get_connector(self.executor_keys)
 750            if self.executor_keys != 'local'
 751            else None
 752        )
 753
 754    @property
 755    def status(self) -> str:
 756        """
 757        Return the running status of the job's daemon.
 758        """
 759        if '_status_hook' in self.__dict__:
 760            return self._status_hook()
 761
 762        if self.executor is not None:
 763            return self.executor.get_job_status(self.name)
 764
 765        return self.daemon.status
 766
 767    @property
 768    def pid(self) -> Union[int, None]:
 769        """
 770        Return the PID of the job's daemon.
 771        """
 772        if self.executor is not None:
 773            return self.executor.get_job_metadata(self.name).get('daemon', {}).get('pid', None)
 774
 775        return self.daemon.pid
 776
 777    @property
 778    def restart(self) -> bool:
 779        """
 780        Return whether to restart a stopped job.
 781        """
 782        if self.executor is not None:
 783            return self.executor.get_job_metadata(self.name).get('restart', False)
 784
 785        return self.daemon.properties.get('restart', False)
 786
 787    @property
 788    def result(self) -> SuccessTuple:
 789        """
 790        Return the `SuccessTuple` when the job has terminated.
 791        """
 792        if self.is_running():
 793            return True, f"{self} is running."
 794
 795        if '_result_hook' in self.__dict__:
 796            return self._result_hook()
 797
 798        if self.executor is not None:
 799            return (
 800                self.executor.get_job_metadata(self.name)
 801                .get('result', (False, "No result available."))
 802            )
 803
 804        _result = self.daemon.properties.get('result', None)
 805        if _result is None:
 806            from meerschaum.utils.daemon.Daemon import _results
 807            return _results.get(self.daemon.daemon_id, (False, "No result available."))
 808
 809        return tuple(_result)
 810
 811    @property
 812    def sysargs(self) -> List[str]:
 813        """
 814        Return the sysargs to use for the Daemon.
 815        """
 816        if self._sysargs:
 817            return self._sysargs
 818
 819        if self.executor is not None:
 820            return self.executor.get_job_metadata(self.name).get('sysargs', [])
 821
 822        target_args = self.daemon.target_args
 823        if target_args is None:
 824            return []
 825        self._sysargs = target_args[0] if len(target_args) > 0 else []
 826        return self._sysargs
 827
 828    def get_daemon_properties(self) -> Dict[str, Any]:
 829        """
 830        Return the `properties` dictionary for the job's daemon.
 831        """
 832        remote_properties = (
 833            {}
 834            if self.executor is None
 835            else self.executor.get_job_properties(self.name)
 836        )
 837        return {
 838            **remote_properties,
 839            **self._properties_patch
 840        }
 841
 842    @property
 843    def daemon(self) -> 'Daemon':
 844        """
 845        Return the daemon which this job manages.
 846        """
 847        from meerschaum.utils.daemon import Daemon
 848        if self._daemon is not None and self.executor is None and self._sysargs:
 849            return self._daemon
 850
 851        self._daemon = Daemon(
 852            target=entry,
 853            target_args=[self._sysargs],
 854            target_kw={},
 855            daemon_id=self.name,
 856            label=shlex.join(self._sysargs),
 857            properties=self.get_daemon_properties(),
 858        )
 859        if '_rotating_log' in self.__dict__:
 860            self._daemon._rotating_log = self._rotating_log
 861
 862        if '_stdin_file' in self.__dict__:
 863            self._daemon._stdin_file = self._stdin_file
 864            self._daemon._blocking_stdin_file_path = self._stdin_file.blocking_file_path
 865
 866        return self._daemon
 867
 868    @property
 869    def began(self) -> Union[datetime, None]:
 870        """
 871        The datetime when the job began running.
 872        """
 873        if self.executor is not None:
 874            began_str = self.executor.get_job_began(self.name)
 875            if began_str is None:
 876                return None
 877            return (
 878                datetime.fromisoformat(began_str)
 879                .astimezone(timezone.utc)
 880                .replace(tzinfo=None)
 881            )
 882
 883        began_str = self.daemon.properties.get('process', {}).get('began', None)
 884        if began_str is None:
 885            return None
 886
 887        return datetime.fromisoformat(began_str)
 888
 889    @property
 890    def ended(self) -> Union[datetime, None]:
 891        """
 892        The datetime when the job stopped running.
 893        """
 894        if self.executor is not None:
 895            ended_str = self.executor.get_job_ended(self.name)
 896            if ended_str is None:
 897                return None
 898            return (
 899                datetime.fromisoformat(ended_str)
 900                .astimezone(timezone.utc)
 901                .replace(tzinfo=None)
 902            )
 903
 904        ended_str = self.daemon.properties.get('process', {}).get('ended', None)
 905        if ended_str is None:
 906            return None
 907
 908        return datetime.fromisoformat(ended_str)
 909
 910    @property
 911    def paused(self) -> Union[datetime, None]:
 912        """
 913        The datetime when the job was suspended while running.
 914        """
 915        if self.executor is not None:
 916            paused_str = self.executor.get_job_paused(self.name)
 917            if paused_str is None:
 918                return None
 919            return (
 920                datetime.fromisoformat(paused_str)
 921                .astimezone(timezone.utc)
 922                .replace(tzinfo=None)
 923            )
 924
 925        paused_str = self.daemon.properties.get('process', {}).get('paused', None)
 926        if paused_str is None:
 927            return None
 928
 929        return datetime.fromisoformat(paused_str)
 930
 931    @property
 932    def stop_time(self) -> Union[datetime, None]:
 933        """
 934        Return the timestamp when the job was manually stopped.
 935        """
 936        if self.executor is not None:
 937            return self.executor.get_job_stop_time(self.name)
 938
 939        if not self.daemon.stop_path.exists():
 940            return None
 941
 942        stop_data = self.daemon._read_stop_file()
 943        if not stop_data:
 944            return None
 945
 946        stop_time_str = stop_data.get('stop_time', None)
 947        if not stop_time_str:
 948            warn(f"Could not read stop time for {self}.")
 949            return None
 950
 951        return datetime.fromisoformat(stop_time_str)
 952
 953    @property
 954    def hidden(self) -> bool:
 955        """
 956        Return a bool indicating whether this job should be displayed.
 957        """
 958        return (
 959            self.name.startswith('_')
 960            or self.name.startswith('.')
 961            or self._is_externally_managed
 962        )
 963
 964    def check_restart(self) -> SuccessTuple:
 965        """
 966        If `restart` is `True` and the daemon is not running,
 967        restart the job.
 968        Do not restart if the job was manually stopped.
 969        """
 970        if self.is_running():
 971            return True, f"{self} is running."
 972
 973        if not self.restart:
 974            return True, f"{self} does not need to be restarted."
 975
 976        if self.stop_time is not None:
 977            return True, f"{self} was manually stopped."
 978
 979        return self.start()
 980
 981    @property
 982    def label(self) -> str:
 983        """
 984        Return the job's Daemon label (joined sysargs).
 985        """
 986        from meerschaum._internal.arguments import compress_pipeline_sysargs
 987        sysargs = compress_pipeline_sysargs(self.sysargs)
 988        return shlex.join(sysargs).replace(' + ', '\n+ ').replace(' : ', '\n: ').strip()
 989
 990    @property
 991    def _externally_managed_file(self) -> pathlib.Path:
 992        """
 993        Return the path to the externally managed file.
 994        """
 995        return self.daemon.path / '.externally-managed'
 996
 997    def _set_externally_managed(self):
 998        """
 999        Set this job as externally managed.
1000        """
1001        self._externally_managed = True
1002        try:
1003            self._externally_managed_file.parent.mkdir(exist_ok=True, parents=True)
1004            self._externally_managed_file.touch()
1005        except Exception as e:
1006            warn(e)
1007
1008    @property
1009    def _is_externally_managed(self) -> bool:
1010        """
1011        Return whether this job is externally managed.
1012        """
1013        return self.executor_keys in (None, 'local') and (
1014            self._externally_managed or self._externally_managed_file.exists()
1015        )
1016
1017    @property
1018    def env(self) -> Dict[str, str]:
1019        """
1020        Return the environment variables to set for the job's process.
1021        """
1022        if '_env' in self.__dict__:
1023            return self.__dict__['_env']
1024
1025        _env = self.daemon.properties.get('env', {})
1026        default_env = {
1027            'PYTHONUNBUFFERED': '1',
1028            'LINES': str(get_config('jobs', 'terminal', 'lines')),
1029            'COLUMNS': str(get_config('jobs', 'terminal', 'columns')),
1030            STATIC_CONFIG['environment']['noninteractive']: 'true',
1031        }
1032        self._env = {**default_env, **_env}
1033        return self._env
1034
1035    @property
1036    def delete_after_completion(self) -> bool:
1037        """
1038        Return whether this job is configured to delete itself after completion.
1039        """
1040        if '_delete_after_completion' in self.__dict__:
1041            return self.__dict__.get('_delete_after_completion', False)
1042
1043        self._delete_after_completion = self.daemon.properties.get('delete_after_completion', False)
1044        return self._delete_after_completion
1045
1046    def __str__(self) -> str:
1047        sysargs = self.sysargs
1048        sysargs_str = shlex.join(sysargs) if sysargs else ''
1049        job_str = f'Job("{self.name}"'
1050        if sysargs_str:
1051            job_str += f', "{sysargs_str}"'
1052
1053        job_str += ')'
1054        return job_str
1055
1056    def __repr__(self) -> str:
1057        return str(self)
1058
1059    def __hash__(self) -> int:
1060        return hash(self.name)
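
In the listing above, `monitor_logs_async` builds a dictionary of `asyncio.Event` objects (skipping the optional user-supplied one when it is `None`) and stops watching the logs as soon as any of them is set. The following is a minimal standalone sketch of that "first event wins" pattern using only the standard library. The names `wait_for_any` and `demo` are hypothetical and are not part of Meerschaum; unlike the original, this sketch returns the key of the event that fired rather than setting a combined event.

```python
import asyncio
from typing import Dict, Optional


async def wait_for_any(events: Dict[str, Optional[asyncio.Event]]) -> Optional[str]:
    """Wait until one event is set and return its key (None if there are no events)."""
    # Map each waiter task back to the name of its event, skipping missing events.
    tasks = {
        asyncio.create_task(event.wait()): name
        for name, event in events.items()
        if event is not None
    }
    if not tasks:
        return None

    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)

    # Cancel the waiters that lost the race and let them finish cleanly.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)

    return tasks[next(iter(done))]


async def demo() -> Optional[str]:
    events = {
        'user': None,                    # no user-supplied stop event
        'stopped': asyncio.Event(),      # never set in this demo
        'stop_token': asyncio.Event(),   # set shortly after we start waiting
    }
    loop = asyncio.get_running_loop()
    loop.call_later(0.01, events['stop_token'].set)
    return await wait_for_any(events)


first = asyncio.run(demo())
print(first)
```

Waiting on tasks rather than polling each event keeps the monitor responsive without a busy loop, and cancelling the pending waiters avoids leaving orphaned tasks behind when monitoring ends.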

Manage a meerschaum.utils.daemon.Daemon, locally or remotely via the API.
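
Most `Job` methods report their outcome as a `SuccessTuple`, a `(bool, str)` pair, and `Job.stop()` in the source listing first asks the daemon to quit and only falls back to killing it when that fails. The sketch below illustrates that escalation in isolation, assuming a daemon-like object with `quit()` and `kill()` methods that each return a `SuccessTuple`. `StubDaemon` and `stop_with_fallback` are hypothetical stand-ins for illustration only, not Meerschaum classes or functions.

```python
from typing import List, Optional, Tuple

SuccessTuple = Tuple[bool, str]


class StubDaemon:
    """Hypothetical stand-in for a daemon; records which methods were called."""

    def __init__(self, quits_gracefully: bool):
        self.quits_gracefully = quits_gracefully
        self.calls: List[str] = []

    def quit(self, timeout: Optional[float] = None) -> SuccessTuple:
        self.calls.append('quit')
        if self.quits_gracefully:
            return True, "Success"
        return False, "Timed out while waiting to quit."

    def kill(self, timeout: Optional[float] = None) -> SuccessTuple:
        self.calls.append('kill')
        return True, "Success"


def stop_with_fallback(
    daemon: StubDaemon,
    timeout_seconds: Optional[float] = None,
) -> SuccessTuple:
    """Try a graceful quit first; escalate to kill only if the quit fails."""
    quit_success, _quit_msg = daemon.quit(timeout=timeout_seconds)
    if quit_success:
        return True, "Stopped."

    kill_success, kill_msg = daemon.kill(timeout=timeout_seconds)
    if not kill_success:
        # Surface the daemon's own message when even the kill fails.
        return kill_success, kill_msg

    return True, "Killed."


print(stop_with_fallback(StubDaemon(quits_gracefully=True)))
print(stop_with_fallback(StubDaemon(quits_gracefully=False)))
```

Returning a `(bool, str)` pair instead of raising lets callers chain lifecycle steps and pass the message straight through to the user, which is how `delete()` reuses the result of `stop()` in the listing.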

Job( name: str, sysargs: Union[List[str], str, NoneType] = None, env: Optional[Dict[str, str]] = None, executor_keys: Optional[str] = None, delete_after_completion: bool = False, refresh_seconds: Union[int, float, NoneType] = None, _properties: Optional[Dict[str, Any]] = None, _rotating_log=None, _stdin_file=None, _status_hook: Optional[Callable[[], str]] = None, _result_hook: Optional[Callable[[], Tuple[bool, str]]] = None, _externally_managed: bool = False)
 75    def __init__(
 76        self,
 77        name: str,
 78        sysargs: Union[List[str], str, None] = None,
 79        env: Optional[Dict[str, str]] = None,
 80        executor_keys: Optional[str] = None,
 81        delete_after_completion: bool = False,
 82        refresh_seconds: Union[int, float, None] = None,
 83        _properties: Optional[Dict[str, Any]] = None,
 84        _rotating_log=None,
 85        _stdin_file=None,
 86        _status_hook: Optional[Callable[[], str]] = None,
 87        _result_hook: Optional[Callable[[], SuccessTuple]] = None,
 88        _externally_managed: bool = False,
 89    ):
 90        """
 91        Create a new job to manage a `meerschaum.utils.daemon.Daemon`.
 92
 93        Parameters
 94        ----------
 95        name: str
 96            The name of the job to be created.
 97            This will also be used as the Daemon ID.
 98
 99        sysargs: Union[List[str], str, None], default None
100            The sysargs of the command to be executed, e.g. 'start api'.
101
102        env: Optional[Dict[str, str]], default None
103            If provided, set these environment variables in the job's process.
104
105        executor_keys: Optional[str], default None
106            If provided, execute the job remotely on an API instance, e.g. 'api:main'.
107
108        delete_after_completion: bool, default False
109            If `True`, delete this job when it has finished executing.
110
111        refresh_seconds: Union[int, float, None], default None
112            The number of seconds to sleep between refreshes.
113            Defaults to the configured value `system.cli.refresh_seconds`.
114
115        _properties: Optional[Dict[str, Any]], default None
116            If provided, use this to patch the daemon's properties.
117        """
118        from meerschaum.utils.daemon import Daemon
119        for char in BANNED_CHARS:
120            if char in name:
121                raise ValueError(f"Invalid name: ({char}) is not allowed.")
122
123        if isinstance(sysargs, str):
124            sysargs = shlex.split(sysargs)
125
126        and_key = STATIC_CONFIG['system']['arguments']['and_key']
127        escaped_and_key = STATIC_CONFIG['system']['arguments']['escaped_and_key']
128        if sysargs:
129            sysargs = [
130                (arg if arg != escaped_and_key else and_key)
131                for arg in sysargs
132            ]
133
134        ### NOTE: 'local' and 'systemd' executors are being coalesced.
135        if executor_keys is None:
136            from meerschaum.jobs import get_executor_keys_from_context
137            executor_keys = get_executor_keys_from_context()
138
139        self.executor_keys = executor_keys
140        self.name = name
141        self.refresh_seconds = (
142            refresh_seconds
143            if refresh_seconds is not None
144            else mrsm.get_config('system', 'cli', 'refresh_seconds')
145        )
146        try:
147            self._daemon = (
148                Daemon(daemon_id=name)
149                if executor_keys == 'local'
150                else None
151            )
152        except Exception:
153            self._daemon = None
154
155        ### Handle any injected dependencies.
156        if _rotating_log is not None:
157            self._rotating_log = _rotating_log
158            if self._daemon is not None:
159                self._daemon._rotating_log = _rotating_log
160
161        if _stdin_file is not None:
162            self._stdin_file = _stdin_file
163            if self._daemon is not None:
164                self._daemon._stdin_file = _stdin_file
165                self._daemon._blocking_stdin_file_path = _stdin_file.blocking_file_path
166
167        if _status_hook is not None:
168            self._status_hook = _status_hook
169
170        if _result_hook is not None:
171            self._result_hook = _result_hook
172
173        self._externally_managed = _externally_managed
174        self._properties_patch = _properties or {}
175        if _externally_managed:
176            self._properties_patch.update({'externally_managed': _externally_managed})
177
178        if env:
179            self._properties_patch.update({'env': env})
180
181        if delete_after_completion:
182            self._properties_patch.update({'delete_after_completion': delete_after_completion})
183
184        daemon_sysargs = (
185            self._daemon.properties.get('target', {}).get('args', [None])[0]
186            if self._daemon is not None
187            else None
188        )
189
190        if daemon_sysargs and sysargs and daemon_sysargs != sysargs:
191            warn("Given sysargs differ from existing sysargs.")
192
193        self._sysargs = [
194            arg
195            for arg in (daemon_sysargs or sysargs or [])
196            if arg not in ('-d', '--daemon')
197        ]
198        for restart_flag in RESTART_FLAGS:
199            if restart_flag in self._sysargs:
200                self._properties_patch.update({'restart': True})
201                break

Create a new job to manage a meerschaum.utils.daemon.Daemon.

Parameters
  • name (str): The name of the job to be created. This will also be used as the Daemon ID.
  • sysargs (Union[List[str], str, None], default None): The sysargs of the command to be executed, e.g. 'start api'. A string is split into arguments with shlex.split().
  • env (Optional[Dict[str, str]], default None): If provided, set these environment variables in the job's process.
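  • executor_keys (Optional[str], default None): If provided, execute the job remotely on an API instance, e.g. 'api:main'. If omitted, the executor keys are determined from the current context; the value 'local' manages the job through a local daemon.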
  • delete_after_completion (bool, default False): If True, delete this job when it has finished executing.
  • refresh_seconds (Union[int, float, None], default None): The number of seconds to sleep between refreshes. Defaults to the configured value system.cli.refresh_seconds.
  • _properties (Optional[Dict[str, Any]], default None): If provided, use this to patch the daemon's properties.
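As a rough illustration of how the constructor prepares sysargs, the standalone sketch below mirrors three steps from the source: splitting a string, unescaping the "and" key, and dropping the daemon flags. The values '+' and '++' are assumptions standing in for the and_key and escaped_and_key entries of STATIC_CONFIG, and normalize_sysargs is a hypothetical helper, not part of the meerschaum API.

```python
import shlex
from typing import List, Union

# Assumed placeholder values; the real ones are read from
# STATIC_CONFIG['system']['arguments'] and may differ.
AND_KEY = '+'
ESCAPED_AND_KEY = '++'


def normalize_sysargs(sysargs: Union[List[str], str, None]) -> List[str]:
    """Hypothetical helper mirroring the constructor's sysargs handling."""
    if isinstance(sysargs, str):
        # A string is split the way a POSIX shell would split it.
        sysargs = shlex.split(sysargs)

    # Escaped "and" keys are turned back into regular "and" keys.
    unescaped = [
        (AND_KEY if arg == ESCAPED_AND_KEY else arg)
        for arg in (sysargs or [])
    ]

    # The daemon flags are filtered out of the stored sysargs.
    return [arg for arg in unescaped if arg not in ('-d', '--daemon')]


print(normalize_sysargs("sync pipes -c 'plugin:foo' -d"))
```

The sketch ignores the case where an existing daemon already stores sysargs; in the source, those stored sysargs take precedence over the ones passed in.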
executor_keys
name
refresh_seconds
@staticmethod
def from_pid( pid: int, executor_keys: Optional[str] = None) -> Job:
203    @staticmethod
204    def from_pid(pid: int, executor_keys: Optional[str] = None) -> Job:
205        """
206        Build a `Job` from the PID of a running Meerschaum process.
207
208        Parameters
209        ----------
210        pid: int
211            The PID of the process.
212
213        executor_keys: Optional[str], default None
214            The executor keys to assign to the job.
215        """
216        psutil = mrsm.attempt_import('psutil')
217        try:
218            process = psutil.Process(pid)
219        except psutil.NoSuchProcess as e:
220            warn(f"Process with PID {pid} does not exist.", stack=False)
221            raise e
222
223        command_args = process.cmdline()
224        is_daemon = command_args[1] == '-c'
225
226        if is_daemon:
227            daemon_id = command_args[-1].split('daemon_id=')[-1].split(')')[0].replace("'", '')
228            root_dir = process.environ().get(STATIC_CONFIG['environment']['root'], None)
229            if root_dir is None:
230                root_dir = paths.ROOT_DIR_PATH
231            else:
232                root_dir = pathlib.Path(root_dir)
233            jobs_dir = root_dir / paths.DAEMON_RESOURCES_PATH.name
234            daemon_dir = jobs_dir / daemon_id
235            pid_file = daemon_dir / 'process.pid'
236
237            if pid_file.exists():
238                with open(pid_file, 'r', encoding='utf-8') as f:
239                    daemon_pid = int(f.read())
240
241                if pid != daemon_pid:
242                    raise EnvironmentError(f"Differing PIDs: {pid=}, {daemon_pid=}")
243            else:
244                raise EnvironmentError(f"Is job '{daemon_id}' running?")
245
246            return Job(daemon_id, executor_keys=executor_keys)
247
248        from meerschaum._internal.arguments._parse_arguments import parse_arguments
249        from meerschaum.utils.daemon import get_new_daemon_name
250
251        mrsm_ix = 0
252        for i, arg in enumerate(command_args):
253            if 'mrsm' in arg or 'meerschaum' in arg.lower():
254                mrsm_ix = i
255                break
256
257        sysargs = command_args[mrsm_ix+1:]
258        kwargs = parse_arguments(sysargs)
259        name = kwargs.get('name', get_new_daemon_name())
260        return Job(name, sysargs, executor_keys=executor_keys)

Build a Job from the PID of a running Meerschaum process.

Parameters
  • pid (int): The PID of the process.
  • executor_keys (Optional[str], default None): The executor keys to assign to the job.
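The two parsing steps in from_pid can be tried in isolation. The sketch below reuses the string operations from the source; parse_daemon_id and split_sysargs are hypothetical names, and the example command lines are made up for illustration rather than copied from a real process.

```python
from typing import List


def parse_daemon_id(last_arg: str) -> str:
    """Extract the daemon ID from the final argument of a daemon command."""
    return last_arg.split('daemon_id=')[-1].split(')')[0].replace("'", '')


def split_sysargs(command_args: List[str]) -> List[str]:
    """Return everything after the first argument that mentions Meerschaum."""
    mrsm_ix = 0
    for i, arg in enumerate(command_args):
        if 'mrsm' in arg or 'meerschaum' in arg.lower():
            mrsm_ix = i
            break
    return command_args[mrsm_ix + 1:]


# Made-up final argument of a daemonized `python -c ...` command line.
example_arg = "from example import Daemon; Daemon(daemon_id='my-job').run()"
print(parse_daemon_id(example_arg))

# Made-up command line of a foreground Meerschaum process.
print(split_sysargs(['/usr/bin/python3', '/usr/local/bin/mrsm', 'sync', 'pipes']))
```

Note that when no argument matches, the index stays at 0, so only the first argument is dropped.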
def start(self, debug: bool = False) -> Tuple[bool, str]:
262    def start(self, debug: bool = False) -> SuccessTuple:
263        """
264        Start the job's daemon.
265        """
266        if self.executor is not None:
267            if not self.exists(debug=debug):
268                return self.executor.create_job(
269                    self.name,
270                    self.sysargs,
271                    properties=self.daemon.properties,
272                    debug=debug,
273                )
274            return self.executor.start_job(self.name, debug=debug)
275
276        if self.is_running():
277            return True, f"{self} is already running."
278
279        success, msg = self.daemon.run(
280            keep_daemon_output=(not self.delete_after_completion),
281            allow_dirty_run=True,
282        )
283        if not success:
284            return success, msg
285
286        return success, f"Started {self}."

Start the job's daemon.

def stop( self, timeout_seconds: Union[int, float, NoneType] = None, debug: bool = False) -> Tuple[bool, str]:
288    def stop(
289        self,
290        timeout_seconds: Union[int, float, None] = None,
291        debug: bool = False,
292    ) -> SuccessTuple:
293        """
294        Stop the job's daemon.
295        """
296        if self.executor is not None:
297            return self.executor.stop_job(self.name, debug=debug)
298
299        if self.daemon.status == 'stopped':
300            if not self.restart:
301                return True, f"{self} is not running."
302            elif self.stop_time is not None:
303                return True, f"{self} will not restart until manually started."
304
305        quit_success, quit_msg = self.daemon.quit(timeout=timeout_seconds)
306        if quit_success:
307            return quit_success, f"Stopped {self}."
308
309        warn(
310            f"Failed to gracefully quit {self}.",
311            stack=False,
312        )
313        kill_success, kill_msg = self.daemon.kill(timeout=timeout_seconds)
314        if not kill_success:
315            return kill_success, kill_msg
316
317        return kill_success, f"Killed {self}."

Stop the job's daemon.
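For local jobs, stop() first asks the daemon to quit and only kills it if that fails. A minimal sketch of that fallback follows, using a hypothetical FakeDaemon in place of meerschaum.utils.daemon.Daemon; the messages returned by the stub are invented.

```python
from typing import Optional, Tuple

SuccessTuple = Tuple[bool, str]


class FakeDaemon:
    """Hypothetical stand-in for a daemon with quit() and kill() methods."""

    def __init__(self, quits_gracefully: bool):
        self.quits_gracefully = quits_gracefully

    def quit(self, timeout: Optional[float] = None) -> SuccessTuple:
        if self.quits_gracefully:
            return True, "Quit."
        return False, "Did not quit in time."

    def kill(self, timeout: Optional[float] = None) -> SuccessTuple:
        return True, "Killed."


def stop_daemon(
    daemon: FakeDaemon,
    label: str,
    timeout_seconds: Optional[float] = None,
) -> SuccessTuple:
    """Try a graceful quit first and fall back to a kill."""
    quit_success, _ = daemon.quit(timeout=timeout_seconds)
    if quit_success:
        return True, f"Stopped {label}."

    kill_success, kill_msg = daemon.kill(timeout=timeout_seconds)
    if not kill_success:
        return False, kill_msg

    return True, f"Killed {label}."


print(stop_daemon(FakeDaemon(quits_gracefully=True), 'job'))
print(stop_daemon(FakeDaemon(quits_gracefully=False), 'job'))
```

The message in the returned tuple tells the caller which path was taken, which is the same convention the other lifecycle methods follow.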

def pause( self, timeout_seconds: Union[int, float, NoneType] = None, debug: bool = False) -> Tuple[bool, str]:
319    def pause(
320        self,
321        timeout_seconds: Union[int, float, None] = None,
322        debug: bool = False,
323    ) -> SuccessTuple:
324        """
325        Pause the job's daemon.
326        """
327        if self.executor is not None:
328            return self.executor.pause_job(self.name, debug=debug)
329
330        pause_success, pause_msg = self.daemon.pause(timeout=timeout_seconds)
331        if not pause_success:
332            return pause_success, pause_msg
333
334        return pause_success, f"Paused {self}."

Pause the job's daemon.

def delete(self, debug: bool = False) -> Tuple[bool, str]:
336    def delete(self, debug: bool = False) -> SuccessTuple:
337        """
338        Delete the job and its daemon.
339        """
340        if self.executor is not None:
341            return self.executor.delete_job(self.name, debug=debug)
342
343        if self.is_running():
344            stop_success, stop_msg = self.stop()
345            if not stop_success:
346                return stop_success, stop_msg
347
348        cleanup_success, cleanup_msg = self.daemon.cleanup()
349        if not cleanup_success:
350            return cleanup_success, cleanup_msg
351
352        _ = self.daemon._properties.pop('result', None)
353        return cleanup_success, f"Deleted {self}."

Delete the job and its daemon.

def is_running(self) -> bool:
355    def is_running(self) -> bool:
356        """
357        Determine whether the job's daemon is running.
358        """
359        return self.status == 'running'

Determine whether the job's daemon is running.

def exists(self, debug: bool = False) -> bool:
361    def exists(self, debug: bool = False) -> bool:
362        """
363        Determine whether the job exists.
364        """
365        if self.executor is not None:
366            return self.executor.get_job_exists(self.name, debug=debug)
367
368        return self.daemon.path.exists()

Determine whether the job exists.

def get_logs(self) -> Optional[str]:
370    def get_logs(self) -> Union[str, None]:
371        """
372        Return the output text of the job's daemon.
373        """
374        if self.executor is not None:
375            return self.executor.get_logs(self.name)
376
377        return self.daemon.log_text

Return the output text of the job's daemon.

def monitor_logs( self, callback_function: Callable[[str], NoneType] = <function _default_stdout_callback>, input_callback_function: Optional[Callable[[], str]] = None, stop_callback_function: Optional[Callable[[Tuple[bool, str]], NoneType]] = None, stop_event: Optional[asyncio.locks.Event] = None, stop_on_exit: bool = False, strip_timestamps: bool = False, accept_input: bool = True, debug: bool = False, _logs_path: Optional[pathlib.Path] = None, _log=None, _stdin_file=None, _wait_if_stopped: bool = True):
379    def monitor_logs(
380        self,
381        callback_function: Callable[[str], None] = _default_stdout_callback,
382        input_callback_function: Optional[Callable[[], str]] = None,
383        stop_callback_function: Optional[Callable[[SuccessTuple], None]] = None,
384        stop_event: Optional[asyncio.Event] = None,
385        stop_on_exit: bool = False,
386        strip_timestamps: bool = False,
387        accept_input: bool = True,
388        debug: bool = False,
389        _logs_path: Optional[pathlib.Path] = None,
390        _log=None,
391        _stdin_file=None,
392        _wait_if_stopped: bool = True,
393    ):
394        """
395        Monitor the job's log files and execute a callback on new lines.
396
397        Parameters
398        ----------
399        callback_function: Callable[[str], None], default _default_stdout_callback
400            The callback to execute as new data comes in.
401            Defaults to printing the output directly to `stdout`.
402
403        input_callback_function: Optional[Callable[[], str]], default None
404            If provided, execute this callback when the daemon is blocking on stdin.
405            Defaults to `sys.stdin.readline()`.
406
407        stop_callback_function: Optional[Callable[[SuccessTuple], None]], default None
408            If provided, execute this callback when the daemon stops.
409            The job's SuccessTuple will be passed to the callback.
410
411        stop_event: Optional[asyncio.Event], default None
412            If provided, stop monitoring when this event is set.
413            You may instead raise `meerschaum.jobs.StopMonitoringLogs`
414            from within `callback_function` to stop monitoring.
415
416        stop_on_exit: bool, default False
417            If `True`, stop monitoring when the job stops.
418
419        strip_timestamps: bool, default False
420            If `True`, remove leading timestamps from lines.
421
422        accept_input: bool, default True
423            If `True`, accept input when the daemon blocks on stdin.
424        """
425        if self.executor is not None:
426            self.executor.monitor_logs(
427                self.name,
428                callback_function,
429                input_callback_function=input_callback_function,
430                stop_callback_function=stop_callback_function,
431                stop_on_exit=stop_on_exit,
432                accept_input=accept_input,
433                strip_timestamps=strip_timestamps,
434                debug=debug,
435            )
436            return
437
438        monitor_logs_coroutine = self.monitor_logs_async(
439            callback_function=callback_function,
440            input_callback_function=input_callback_function,
441            stop_callback_function=stop_callback_function,
442            stop_event=stop_event,
443            stop_on_exit=stop_on_exit,
444            strip_timestamps=strip_timestamps,
445            accept_input=accept_input,
446            debug=debug,
447            _logs_path=_logs_path,
448            _log=_log,
449            _stdin_file=_stdin_file,
450            _wait_if_stopped=_wait_if_stopped,
451        )
452        return asyncio.run(monitor_logs_coroutine)

Monitor the job's log files and execute a callback on new lines.

Parameters
  • callback_function (Callable[[str], None], default _default_stdout_callback): The callback to execute as new data comes in. Defaults to printing the output directly to stdout.
  • input_callback_function (Optional[Callable[[], str]], default None): If provided, execute this callback when the daemon is blocking on stdin. Defaults to sys.stdin.readline().
  • stop_callback_function (Optional[Callable[[SuccessTuple], None]], default None): If provided, execute this callback when the daemon stops. The job's SuccessTuple will be passed to the callback.
  • stop_event (Optional[asyncio.Event], default None): If provided, stop monitoring when this event is set. You may instead raise meerschaum.jobs.StopMonitoringLogs from within callback_function to stop monitoring.
  • stop_on_exit (bool, default False): If True, stop monitoring when the job stops.
  • strip_timestamps (bool, default False): If True, remove leading timestamps from lines.
  • accept_input (bool, default True): If True, accept input when the daemon blocks on stdin.
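The callback contract described above (a plain or coroutine function that receives each line and may raise an exception to stop monitoring) can be sketched without a running job. Here StopMonitoring is a hypothetical stand-in for meerschaum.jobs.StopMonitoringLogs, and emit_lines is an illustrative helper rather than the library's implementation.

```python
import asyncio
from typing import Callable, Iterable, List


class StopMonitoring(Exception):
    """Hypothetical stand-in for `meerschaum.jobs.StopMonitoringLogs`."""


async def emit_lines(lines: Iterable[str], callback: Callable) -> int:
    """Pass each line to `callback` and return how many were delivered."""
    delivered = 0
    for line in lines:
        try:
            # Coroutine callbacks are awaited; plain callables are called.
            if asyncio.iscoroutinefunction(callback):
                await callback(line)
            else:
                callback(line)
            delivered += 1
        except StopMonitoring:
            # The callback asked to stop, so the remaining lines are skipped.
            break
    return delivered


seen: List[str] = []


def stop_at_done(line: str) -> None:
    """Collect lines until a line reading 'done' arrives."""
    if line.strip() == 'done':
        raise StopMonitoring
    seen.append(line)


delivered = asyncio.run(emit_lines(['a\n', 'b\n', 'done\n', 'c\n'], stop_at_done))
print(delivered, seen)
```

With a real job, a callback shaped like stop_at_done would be passed as callback_function and would raise StopMonitoringLogs instead of the stand-in exception.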
async def monitor_logs_async( self, callback_function: Callable[[str], NoneType] = <function _default_stdout_callback>, input_callback_function: Optional[Callable[[], str]] = None, stop_callback_function: Optional[Callable[[Tuple[bool, str]], NoneType]] = None, stop_event: Optional[asyncio.locks.Event] = None, stop_on_exit: bool = False, strip_timestamps: bool = False, accept_input: bool = True, debug: bool = False, _logs_path: Optional[pathlib.Path] = None, _log=None, _stdin_file=None, _wait_if_stopped: bool = True):
454    async def monitor_logs_async(
455        self,
456        callback_function: Callable[[str], None] = _default_stdout_callback,
457        input_callback_function: Optional[Callable[[], str]] = None,
458        stop_callback_function: Optional[Callable[[SuccessTuple], None]] = None,
459        stop_event: Optional[asyncio.Event] = None,
460        stop_on_exit: bool = False,
461        strip_timestamps: bool = False,
462        accept_input: bool = True,
463        debug: bool = False,
464        _logs_path: Optional[pathlib.Path] = None,
465        _log=None,
466        _stdin_file=None,
467        _wait_if_stopped: bool = True,
468    ):
469        """
470        Monitor the job's log files and await a callback on new lines.
471
472        Parameters
473        ----------
474        callback_function: Callable[[str], None], default _default_stdout_callback
475            The callback to execute as new data comes in.
476            Defaults to printing the output directly to `stdout`.
477
478        input_callback_function: Optional[Callable[[], str]], default None
479            If provided, execute this callback when the daemon is blocking on stdin.
480            Defaults to `sys.stdin.readline()`.
481
482        stop_callback_function: Optional[Callable[[SuccessTuple], None]], default None
483            If provided, execute this callback when the daemon stops.
484            The job's SuccessTuple will be passed to the callback.
485
486        stop_event: Optional[asyncio.Event], default None
487            If provided, stop monitoring when this event is set.
488            You may instead raise `meerschaum.jobs.StopMonitoringLogs`
489            from within `callback_function` to stop monitoring.
490
491        stop_on_exit: bool, default False
492            If `True`, stop monitoring when the job stops.
493
494        strip_timestamps: bool, default False
495            If `True`, remove leading timestamps from lines.
496
497        accept_input: bool, default True
498            If `True`, accept input when the daemon blocks on stdin.
499        """
500        from meerschaum.utils.prompt import prompt
501
502        def default_input_callback_function():
503            prompt_kwargs = self.get_prompt_kwargs(debug=debug)
504            if prompt_kwargs:
505                answer = prompt(**prompt_kwargs)
506                return answer + '\n'
507            return sys.stdin.readline()
508
509        if input_callback_function is None:
510            input_callback_function = default_input_callback_function
511
512        if self.executor is not None:
513            await self.executor.monitor_logs_async(
514                self.name,
515                callback_function,
516                input_callback_function=input_callback_function,
517                stop_callback_function=stop_callback_function,
518                stop_on_exit=stop_on_exit,
519                strip_timestamps=strip_timestamps,
520                accept_input=accept_input,
521                debug=debug,
522            )
523            return
524
525        from meerschaum.utils.formatting._jobs import strip_timestamp_from_line
526
527        events = {
528            'user': stop_event,
529            'stopped': asyncio.Event(),
530            'stop_token': asyncio.Event(),
531            'stop_exception': asyncio.Event(),
532            'stopped_timeout': asyncio.Event(),
533        }
534        combined_event = asyncio.Event()
535        emitted_text = False
536        stdin_file = _stdin_file if _stdin_file is not None else self.daemon.stdin_file
537
538        async def check_job_status():
539            if not stop_on_exit:
540                return
541
542            nonlocal emitted_text
543
544            sleep_time = 0.1
545            while sleep_time < 0.2:
546                if self.status == 'stopped':
547                    if not emitted_text and _wait_if_stopped:
548                        await asyncio.sleep(sleep_time)
549                        sleep_time = round(sleep_time * 1.1, 3)
550                        continue
551
552                    if stop_callback_function is not None:
553                        try:
554                            if asyncio.iscoroutinefunction(stop_callback_function):
555                                await stop_callback_function(self.result)
556                            else:
557                                stop_callback_function(self.result)
558                        except asyncio.exceptions.CancelledError:
559                            break
560                        except Exception:
561                            warn(traceback.format_exc())
562
563                    if stop_on_exit:
564                        events['stopped'].set()
565
566                    break
567                await asyncio.sleep(0.1)
568
569            events['stopped_timeout'].set()
570
571        async def check_blocking_on_input():
572            while True:
573                if not emitted_text or not self.is_blocking_on_stdin():
574                    try:
575                        await asyncio.sleep(self.refresh_seconds)
576                    except asyncio.exceptions.CancelledError:
577                        break
578                    continue
579
580                if not self.is_running():
581                    break
582
583                await emit_latest_lines()
584
585                try:
586                    print('', end='', flush=True)
587                    if asyncio.iscoroutinefunction(input_callback_function):
588                        data = await input_callback_function()
589                    else:
590                        loop = asyncio.get_running_loop()
591                        data = await loop.run_in_executor(None, input_callback_function)
592                except KeyboardInterrupt:
593                    break
594                #  if not data.endswith('\n'):
595                    #  data += '\n'
596
597                stdin_file.write(data)
598                await asyncio.sleep(self.refresh_seconds)
599
600        async def combine_events():
601            event_tasks = [
602                asyncio.create_task(event.wait())
603                for event in events.values()
604                if event is not None
605            ]
606            if not event_tasks:
607                return
608
609            try:
610                done, pending = await asyncio.wait(
611                    event_tasks,
612                    return_when=asyncio.FIRST_COMPLETED,
613                )
614                for task in pending:
615                    task.cancel()
616            except asyncio.exceptions.CancelledError:
617                pass
618            finally:
619                combined_event.set()
620
621        check_job_status_task = asyncio.create_task(check_job_status())
622        check_blocking_on_input_task = asyncio.create_task(check_blocking_on_input())
623        combine_events_task = asyncio.create_task(combine_events())
624
625        log = _log if _log is not None else self.daemon.rotating_log
626        lines_to_show = (
627            self.daemon.properties.get(
628                'logs', {}
629            ).get(
630                'lines_to_show', get_config('jobs', 'logs', 'lines_to_show')
631            )
632        )
633
634        async def emit_latest_lines():
635            nonlocal emitted_text
636            nonlocal stop_event
637            lines = log.readlines()
638            for line in lines[(-1 * lines_to_show):]:
639                if stop_event is not None and stop_event.is_set():
640                    return
641
642                line_stripped_extra = strip_timestamp_from_line(line.strip())
643                line_stripped = strip_timestamp_from_line(line)
644
645                if line_stripped_extra == STOP_TOKEN:
646                    events['stop_token'].set()
647                    return
648
649                if line_stripped_extra == CLEAR_TOKEN:
650                    clear_screen(debug=debug)
651                    continue
652
653                if line_stripped_extra == FLUSH_TOKEN.strip():
654                    line_stripped = ''
655                    line = ''
656
657                if strip_timestamps:
658                    line = line_stripped
659
660                try:
661                    if asyncio.iscoroutinefunction(callback_function):
662                        await callback_function(line)
663                    else:
664                        callback_function(line)
665                    emitted_text = True
666                except StopMonitoringLogs:
667                    events['stop_exception'].set()
668                    return
669                except Exception:
670                    warn(f"Error in logs callback:\n{traceback.format_exc()}")
671
672        await emit_latest_lines()
673
674        tasks = (
675            [check_job_status_task]
676            + ([check_blocking_on_input_task] if accept_input else [])
677            + [combine_events_task]
678        )
679        try:
680            _ = asyncio.gather(*tasks, return_exceptions=True)
681        except asyncio.exceptions.CancelledError:
682            raise
683        except Exception:
684            warn(f"Failed to run async checks:\n{traceback.format_exc()}")
685
686        watchfiles = mrsm.attempt_import('watchfiles', lazy=False)
687        dir_path_to_monitor = (
688            _logs_path
689            or (log.file_path.parent if log else None)
690            or paths.LOGS_RESOURCES_PATH
691        )
692        async for changes in watchfiles.awatch(
693            dir_path_to_monitor,
694            stop_event=combined_event,
695        ):
696            for change in changes:
697                file_path_str = change[1]
698                file_path = pathlib.Path(file_path_str)
699                latest_subfile_path = log.get_latest_subfile_path()
700                if latest_subfile_path != file_path:
701                    continue
702
703                await emit_latest_lines()
704
705        await emit_latest_lines()

Monitor the job's log files and await a callback on new lines.

Parameters
  • callback_function (Callable[[str], None], default _default_stdout_callback): The callback to execute as new data comes in. Defaults to printing the output directly to stdout.
  • input_callback_function (Optional[Callable[[], str]], default None): If provided, execute this callback when the daemon is blocking on stdin. By default, the user is prompted with the job's prompt kwargs when they are available; otherwise sys.stdin.readline() is used.
  • stop_callback_function (Optional[Callable[[SuccessTuple], None]], default None): If provided, execute this callback when the daemon stops. The job's SuccessTuple will be passed to the callback.
  • stop_event (Optional[asyncio.Event], default None): If provided, stop monitoring when this event is set. You may instead raise meerschaum.jobs.StopMonitoringLogs from within callback_function to stop monitoring.
  • stop_on_exit (bool, default False): If True, stop monitoring when the job stops.
  • strip_timestamps (bool, default False): If True, remove leading timestamps from lines.
  • accept_input (bool, default True): If True, accept input when the daemon blocks on stdin.
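Internally, several stop conditions (the user's stop_event, the job stopping, a stop token in the logs, and so on) are folded into one combined event by waiting for whichever fires first. The following sketch shows that pattern with plain asyncio; wait_for_any is a hypothetical helper and the event names are only examples.

```python
import asyncio
from typing import Dict, Optional


async def wait_for_any(events: Dict[str, Optional[asyncio.Event]]) -> str:
    """Return the name of the first event to be set, skipping `None` entries."""
    tasks = {
        asyncio.create_task(event.wait()): name
        for name, event in events.items()
        if event is not None
    }
    if not tasks:
        return ''

    done, pending = await asyncio.wait(
        tasks,
        return_when=asyncio.FIRST_COMPLETED,
    )
    # The waiters for the other events are no longer needed.
    for task in pending:
        task.cancel()

    return tasks[next(iter(done))]


async def main() -> str:
    events = {
        'user': None,  # No stop_event was provided by the caller.
        'stopped': asyncio.Event(),
        'stop_token': asyncio.Event(),
    }
    # Simulate a stop token appearing in the logs shortly after startup.
    asyncio.get_running_loop().call_later(0.01, events['stop_token'].set)
    return await wait_for_any(events)


first = asyncio.run(main())
print(first)
```

Skipping None entries is what lets stop_event remain optional: when the caller passes nothing, the remaining internal events still end the monitoring loop.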
def is_blocking_on_stdin(self, debug: bool = False) -> bool:
707    def is_blocking_on_stdin(self, debug: bool = False) -> bool:
708        """
709        Return whether a job's daemon is blocking on stdin.
710        """
711        if self.executor is not None:
712            return self.executor.get_job_is_blocking_on_stdin(self.name, debug=debug)
713
714        return self.is_running() and self.daemon.blocking_stdin_file_path.exists()

Return whether a job's daemon is blocking on stdin.

def get_prompt_kwargs(self, debug: bool = False) -> Dict[str, Any]:
716    def get_prompt_kwargs(self, debug: bool = False) -> Dict[str, Any]:
717        """
718        Return the kwargs to the blocking `prompt()`, if available.
719        """
720        if self.executor is not None:
721            return self.executor.get_job_prompt_kwargs(self.name, debug=debug)
722
723        if not self.daemon.prompt_kwargs_file_path.exists():
724            return {}
725
726        try:
727            with open(self.daemon.prompt_kwargs_file_path, 'r', encoding='utf-8') as f:
728                prompt_kwargs = json.load(f)
729
730            return prompt_kwargs
731        
732        except Exception:
733            import traceback
734            traceback.print_exc()
735            return {}

Return the kwargs to the blocking prompt(), if available.
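For local jobs, the prompt kwargs are read from a JSON file and any failure results in an empty dictionary. A minimal sketch of that read-with-fallback pattern is shown below; read_prompt_kwargs and the file name prompt_kwargs.json are assumptions for illustration, and unlike the source this version does not print a traceback on failure.

```python
import json
import pathlib
import tempfile
from typing import Any, Dict


def read_prompt_kwargs(path: pathlib.Path) -> Dict[str, Any]:
    """Return the JSON contents of `path`, or `{}` if missing or invalid."""
    if not path.exists():
        return {}

    try:
        with open(path, 'r', encoding='utf-8') as f:
            return json.load(f)
    except Exception:
        # Malformed or unreadable files are treated as "no prompt".
        return {}


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name, chosen only for this example.
    kwargs_path = pathlib.Path(tmp) / 'prompt_kwargs.json'

    missing = read_prompt_kwargs(kwargs_path)

    kwargs_path.write_text('{"question": "Continue?"}', encoding='utf-8')
    loaded = read_prompt_kwargs(kwargs_path)

    kwargs_path.write_text('{not json', encoding='utf-8')
    broken = read_prompt_kwargs(kwargs_path)

print(missing, loaded, broken)
```

Returning an empty dictionary lets callers use a simple truthiness check to decide between prompting and reading raw stdin.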

def write_stdin(self, data):
737    def write_stdin(self, data):
738        """
739        Write data to the `stdin` of the job's daemon.
740        """
741        self.daemon.stdin_file.write(data)

Write data to the stdin of the job's daemon.

executor: Optional[meerschaum.jobs.Executor]
743    @property
744    def executor(self) -> Union[Executor, None]:
745        """
746        If the job is remote, return the connector to the remote API instance.
747        """
748        return (
749            mrsm.get_connector(self.executor_keys)
750            if self.executor_keys != 'local'
751            else None
752        )

If the job is remote, return the connector to the remote API instance. For local jobs (executor keys of 'local'), return None.

status: str
754    @property
755    def status(self) -> str:
756        """
757        Return the running status of the job's daemon.
758        """
759        if '_status_hook' in self.__dict__:
760            return self._status_hook()
761
762        if self.executor is not None:
763            return self.executor.get_job_status(self.name)
764
765        return self.daemon.status

Return the running status of the job's daemon.

pid: Optional[int]
767    @property
768    def pid(self) -> Union[int, None]:
769        """
770        Return the PID of the job's daemon.
771        """
772        if self.executor is not None:
773            return self.executor.get_job_metadata(self.name).get('daemon', {}).get('pid', None)
774
775        return self.daemon.pid

Return the PID of the job's daemon.

restart: bool
777    @property
778    def restart(self) -> bool:
779        """
780        Return whether to restart a stopped job.
781        """
782        if self.executor is not None:
783            return self.executor.get_job_metadata(self.name).get('restart', False)
784
785        return self.daemon.properties.get('restart', False)

Return whether to restart a stopped job.

result: Tuple[bool, str]
    @property
    def result(self) -> SuccessTuple:
        """
        Return the `SuccessTuple` when the job has terminated.
        """
        if self.is_running():
            return True, f"{self} is running."

        if '_result_hook' in self.__dict__:
            return self._result_hook()

        if self.executor is not None:
            return (
                self.executor.get_job_metadata(self.name)
                .get('result', (False, "No result available."))
            )

        _result = self.daemon.properties.get('result', None)
        if _result is None:
            from meerschaum.utils.daemon.Daemon import _results
            return _results.get(self.daemon.daemon_id, (False, "No result available."))

        return tuple(_result)

Return the SuccessTuple when the job has terminated.

sysargs: List[str]
    @property
    def sysargs(self) -> List[str]:
        """
        Return the sysargs to use for the Daemon.
        """
        if self._sysargs:
            return self._sysargs

        if self.executor is not None:
            return self.executor.get_job_metadata(self.name).get('sysargs', [])

        target_args = self.daemon.target_args
        if target_args is None:
            return []
        self._sysargs = target_args[0] if len(target_args) > 0 else []
        return self._sysargs

Return the sysargs to use for the Daemon.

def get_daemon_properties(self) -> Dict[str, Any]:
    def get_daemon_properties(self) -> Dict[str, Any]:
        """
        Return the `properties` dictionary for the job's daemon.
        """
        remote_properties = (
            {}
            if self.executor is None
            else self.executor.get_job_properties(self.name)
        )
        return {
            **remote_properties,
            **self._properties_patch
        }

Return the properties dictionary for the job's daemon.

daemon: 'Daemon'
    @property
    def daemon(self) -> 'Daemon':
        """
        Return the daemon which this job manages.
        """
        from meerschaum.utils.daemon import Daemon
        if self._daemon is not None and self.executor is None and self._sysargs:
            return self._daemon

        self._daemon = Daemon(
            target=entry,
            target_args=[self._sysargs],
            target_kw={},
            daemon_id=self.name,
            label=shlex.join(self._sysargs),
            properties=self.get_daemon_properties(),
        )
        if '_rotating_log' in self.__dict__:
            self._daemon._rotating_log = self._rotating_log

        if '_stdin_file' in self.__dict__:
            self._daemon._stdin_file = self._stdin_file
            self._daemon._blocking_stdin_file_path = self._stdin_file.blocking_file_path

        return self._daemon

Return the daemon which this job manages.

began: Optional[datetime.datetime]
    @property
    def began(self) -> Union[datetime, None]:
        """
        The datetime when the job began running.
        """
        if self.executor is not None:
            began_str = self.executor.get_job_began(self.name)
            if began_str is None:
                return None
            return (
                datetime.fromisoformat(began_str)
                .astimezone(timezone.utc)
                .replace(tzinfo=None)
            )

        began_str = self.daemon.properties.get('process', {}).get('began', None)
        if began_str is None:
            return None

        return datetime.fromisoformat(began_str)

The datetime when the job began running.

ended: Optional[datetime.datetime]
    @property
    def ended(self) -> Union[datetime, None]:
        """
        The datetime when the job stopped running.
        """
        if self.executor is not None:
            ended_str = self.executor.get_job_ended(self.name)
            if ended_str is None:
                return None
            return (
                datetime.fromisoformat(ended_str)
                .astimezone(timezone.utc)
                .replace(tzinfo=None)
            )

        ended_str = self.daemon.properties.get('process', {}).get('ended', None)
        if ended_str is None:
            return None

        return datetime.fromisoformat(ended_str)

The datetime when the job stopped running.

paused: Optional[datetime.datetime]
    @property
    def paused(self) -> Union[datetime, None]:
        """
        The datetime when the job was suspended while running.
        """
        if self.executor is not None:
            paused_str = self.executor.get_job_paused(self.name)
            if paused_str is None:
                return None
            return (
                datetime.fromisoformat(paused_str)
                .astimezone(timezone.utc)
                .replace(tzinfo=None)
            )

        paused_str = self.daemon.properties.get('process', {}).get('paused', None)
        if paused_str is None:
            return None

        return datetime.fromisoformat(paused_str)

The datetime when the job was suspended while running.
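
The began, ended, and paused properties normalize timestamps reported by a remote executor into timezone-naive UTC datetimes. The following is a minimal standalone sketch of that conversion using only the standard library; the helper name to_naive_utc is hypothetical and is not part of Meerschaum.

```python
from datetime import datetime, timezone


def to_naive_utc(timestamp_str: str) -> datetime:
    """
    Hypothetical helper: parse an ISO 8601 string, shift it to UTC,
    and drop the timezone information.
    """
    return (
        datetime.fromisoformat(timestamp_str)
        .astimezone(timezone.utc)
        .replace(tzinfo=None)
    )


# A timestamp two hours ahead of UTC becomes 10:00 in naive UTC.
began = to_naive_utc('2024-01-01T12:00:00+02:00')
print(began)
# 2024-01-01 10:00:00
```

Note that astimezone() treats a string without a UTC offset as local time, so the result for offset-free input depends on the machine's timezone.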

stop_time: Optional[datetime.datetime]
    @property
    def stop_time(self) -> Union[datetime, None]:
        """
        Return the timestamp when the job was manually stopped.
        """
        if self.executor is not None:
            return self.executor.get_job_stop_time(self.name)

        if not self.daemon.stop_path.exists():
            return None

        stop_data = self.daemon._read_stop_file()
        if not stop_data:
            return None

        stop_time_str = stop_data.get('stop_time', None)
        if not stop_time_str:
            warn(f"Could not read stop time for {self}.")
            return None

        return datetime.fromisoformat(stop_time_str)

Return the timestamp when the job was manually stopped.

hidden: bool
    @property
    def hidden(self) -> bool:
        """
        Return a bool indicating whether this job should be hidden from display.
        """
        return (
            self.name.startswith('_')
            or self.name.startswith('.')
            or self._is_externally_managed
        )

Return a bool indicating whether this job should be hidden from display.

def check_restart(self) -> Tuple[bool, str]:
    def check_restart(self) -> SuccessTuple:
        """
        If `restart` is `True` and the daemon is not running,
        restart the job.
        Do not restart if the job was manually stopped.
        """
        if self.is_running():
            return True, f"{self} is running."

        if not self.restart:
            return True, f"{self} does not need to be restarted."

        if self.stop_time is not None:
            return True, f"{self} was manually stopped."

        return self.start()

If restart is True and the daemon is not running, restart the job. Do not restart if the job was manually stopped.
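
The order of checks in check_restart() can be illustrated without a running daemon. The sketch below is an illustration only: FakeJob and should_restart are hypothetical stand-ins, not Meerschaum classes or functions, and they model just the three conditions read from the source above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass
class FakeJob:
    """Hypothetical stand-in for a job (not part of Meerschaum)."""
    running: bool = False
    restart: bool = False
    stop_time: Optional[datetime] = None


def should_restart(job: FakeJob) -> Tuple[bool, str]:
    """Return whether the job would be started again, and why."""
    if job.running:
        return False, "already running"
    if not job.restart:
        return False, "restart is disabled"
    if job.stop_time is not None:
        return False, "manually stopped"
    return True, "stopped unexpectedly"


print(should_restart(FakeJob(restart=True)))
# (True, 'stopped unexpectedly')
print(should_restart(FakeJob(restart=True, stop_time=datetime(2024, 1, 1))))
# (False, 'manually stopped')
```

A manual stop takes precedence over the restart flag, so a job stopped by the user stays stopped.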

label: str
    @property
    def label(self) -> str:
        """
        Return the job's Daemon label (joined sysargs).
        """
        from meerschaum._internal.arguments import compress_pipeline_sysargs
        sysargs = compress_pipeline_sysargs(self.sysargs)
        return shlex.join(sysargs).replace(' + ', '\n+ ').replace(' : ', '\n: ').lstrip().rstrip()

Return the job's Daemon label (joined sysargs).
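
As a rough illustration of how the label is formatted, the sketch below joins a list of arguments with shlex.join() and breaks the line before each pipeline separator. The sysargs value is made up for the example, and the internal compress_pipeline_sysargs() step is deliberately omitted.

```python
import shlex

# Hypothetical arguments; '+' separates the steps of a pipeline.
sysargs = ['sync', 'pipes', '-c', 'sql:main', '+', 'verify', 'pipes']

label = (
    shlex.join(sysargs)
    .replace(' + ', '\n+ ')
    .replace(' : ', '\n: ')
    .strip()
)
print(label)
# sync pipes -c sql:main
# + verify pipes
```

shlex.join() (Python 3.8+) quotes any argument that contains whitespace or shell metacharacters, so the label can be pasted back into a shell.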

env: Dict[str, str]
    @property
    def env(self) -> Dict[str, str]:
        """
        Return the environment variables to set for the job's process.
        """
        if '_env' in self.__dict__:
            return self.__dict__['_env']

        _env = self.daemon.properties.get('env', {})
        default_env = {
            'PYTHONUNBUFFERED': '1',
            'LINES': str(get_config('jobs', 'terminal', 'lines')),
            'COLUMNS': str(get_config('jobs', 'terminal', 'columns')),
            STATIC_CONFIG['environment']['noninteractive']: 'true',
        }
        self._env = {**default_env, **_env}
        return self._env

Return the environment variables to set for the job's process.
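
The env property layers the job's own variables over a set of defaults with dictionary unpacking, so later keys win. A small sketch of that merge order follows; the values are placeholders rather than Meerschaum's configured defaults.

```python
# Placeholder defaults (the real values come from the Meerschaum configuration).
default_env = {
    'PYTHONUNBUFFERED': '1',
    'LINES': '40',
    'COLUMNS': '120',
}

# Hypothetical variables stored in the daemon's properties.
job_env = {
    'COLUMNS': '200',
    'MY_VAR': 'abc',
}

# Keys from the right-hand dictionary override the defaults.
env = {**default_env, **job_env}
print(env)
# {'PYTHONUNBUFFERED': '1', 'LINES': '40', 'COLUMNS': '200', 'MY_VAR': 'abc'}
```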

delete_after_completion: bool
    @property
    def delete_after_completion(self) -> bool:
        """
        Return whether this job is configured to delete itself after completion.
        """
        if '_delete_after_completion' in self.__dict__:
            return self.__dict__.get('_delete_after_completion', False)

        self._delete_after_completion = self.daemon.properties.get('delete_after_completion', False)
        return self._delete_after_completion

Return whether this job is configured to delete itself after completion.

def pprint(*args, detect_password: bool = True, nopretty: bool = False, **kw) -> None:
 10def pprint(
 11    *args,
 12    detect_password: bool = True,
 13    nopretty: bool = False,
 14    **kw
 15) -> None:
 16    """Pretty print an object according to the configured ANSI and UNICODE settings.
 17    If detect_password is True (default), search and replace passwords with '*' characters.
 18    Does not mutate objects.
 19    """
 20    import copy
 21    import json
 22    from meerschaum.utils.packages import attempt_import, import_rich
 23    from meerschaum.utils.formatting import ANSI, get_console, print_tuple
 24    from meerschaum.utils.warnings import error
 25    from meerschaum.utils.misc import replace_password, dict_from_od, filter_keywords
 26    from collections import OrderedDict
 27
 28    if (
 29        len(args) == 1
 30        and
 31        isinstance(args[0], tuple)
 32        and
 33        len(args[0]) == 2
 34        and
 35        isinstance(args[0][0], bool)
 36        and
 37        isinstance(args[0][1], str)
 38    ):
 39        return print_tuple(args[0], **filter_keywords(print_tuple, **kw))
 40
 41    modify = True
 42    rich_pprint = None
 43    if ANSI and not nopretty:
 44        rich = import_rich()
 45        if rich is not None:
 46            rich_pretty = attempt_import('rich.pretty')
 47        if rich_pretty is not None:
 48            def _rich_pprint(*args, **kw):
 49                _console = get_console()
 50                _kw = filter_keywords(_console.print, **kw)
 51                _console.print(*args, **_kw)
 52            rich_pprint = _rich_pprint
 53    elif not nopretty:
 54        pprintpp = attempt_import('pprintpp', warn=False)
 55        try:
 56            _pprint = pprintpp.pprint
 57        except Exception :
 58            import pprint as _pprint_module
 59            _pprint = _pprint_module.pprint
 60
 61    func = (
 62        _pprint if rich_pprint is None else rich_pprint
 63    ) if not nopretty else print
 64
 65    try:
 66        args_copy = copy.deepcopy(args)
 67    except Exception:
 68        args_copy = args
 69        modify = False
 70
 71    _args = []
 72    for a in args:
 73        c = a
 74        ### convert OrderedDict into dict
 75        if isinstance(a, OrderedDict) or issubclass(type(a), OrderedDict):
 76            c = dict_from_od(copy.deepcopy(c))
 77        _args.append(c)
 78    args = _args
 79
 80    _args = list(args)
 81    if detect_password and modify:
 82        _args = []
 83        for a in args:
 84            c = a
 85            if isinstance(c, dict):
 86                c = replace_password(copy.deepcopy(c))
 87            if nopretty:
 88                try:
 89                    c = json.dumps(c)
 90                    is_json = True
 91                except Exception:
 92                    is_json = False
 93                if not is_json:
 94                    try:
 95                        c = str(c)
 96                    except Exception:
 97                        pass
 98            _args.append(c)
 99
100    ### filter out unsupported keywords
101    func_kw = filter_keywords(func, **kw) if not nopretty else {}
102    error_msg = None
103    try:
104        func(*_args, **func_kw)
105    except Exception as e:
106        error_msg = e
107    if error_msg is not None:
108        error(error_msg)

Pretty print an object according to the configured ANSI and UNICODE settings. If detect_password is True (default), search and replace passwords with '*' characters. Does not mutate objects.
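
To make the "does not mutate objects" guarantee concrete, here is a minimal sketch of masking passwords on a deep copy before printing. The mask_passwords function is hypothetical and only approximates the idea; it is not Meerschaum's replace_password implementation, whose detection rules may differ.

```python
import copy
from typing import Any, Dict


def mask_passwords(doc: Dict[str, Any]) -> Dict[str, Any]:
    """
    Hypothetical helper: return a copy of `doc` in which string values
    under keys containing 'password' are replaced with asterisks.
    """
    masked = copy.deepcopy(doc)
    for key, value in masked.items():
        if isinstance(value, dict):
            masked[key] = mask_passwords(value)
        elif 'password' in str(key).lower() and isinstance(value, str):
            masked[key] = '*' * len(value)
    return masked


config = {'username': 'foo', 'password': 'secret', 'nested': {'password': 'abc'}}
print(mask_passwords(config))
# {'username': 'foo', 'password': '******', 'nested': {'password': '***'}}

# The original dictionary is left untouched.
print(config['password'])
# secret
```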

def attempt_import(*names: str, lazy: bool = True, warn: bool = True, install: bool = True, venv: Optional[str] = 'mrsm', precheck: bool = True, split: bool = True, check_update: bool = False, check_pypi: bool = False, check_is_installed: bool = True, allow_outside_venv: bool = True, color: bool = True, debug: bool = False) -> Any:
1250def attempt_import(
1251    *names: str,
1252    lazy: bool = True,
1253    warn: bool = True,
1254    install: bool = True,
1255    venv: Optional[str] = 'mrsm',
1256    precheck: bool = True,
1257    split: bool = True,
1258    check_update: bool = False,
1259    check_pypi: bool = False,
1260    check_is_installed: bool = True,
1261    allow_outside_venv: bool = True,
1262    color: bool = True,
1263    debug: bool = False
1264) -> Any:
1265    """
1266    Raise a warning if packages are not installed; otherwise import and return modules.
1267    If `lazy` is `True`, return lazy-imported modules.
1268    
1269    Returns tuple of modules if multiple names are provided, else returns one module.
1270    
1271    Parameters
1272    ----------
1273    names: List[str]
1274        The packages to be imported.
1275
1276    lazy: bool, default True
1277        If `True`, lazily load packages.
1278
1279    warn: bool, default True
1280        If `True`, raise a warning if a package cannot be imported.
1281
1282    install: bool, default True
1283        If `True`, attempt to install a missing package into the designated virtual environment.
1284        If `check_update` is True, install updates if available.
1285
1286    venv: Optional[str], default 'mrsm'
1287        The virtual environment in which to search for packages and to install packages into.
1288
1289    precheck: bool, default True
1290        If `True`, attempt to find module before importing (necessary for checking if modules exist
1291        and retaining lazy imports), otherwise assume lazy is `False`.
1292
1293    split: bool, default True
1294        If `True`, split packages' names on `'.'`.
1295
1296    check_update: bool, default False
1297        If `True` and `install` is `True`, install updates if the required minimum version
1298        does not match.
1299
1300    check_pypi: bool, default False
1301        If `True` and `check_update` is `True`, check PyPI when determining whether
1302        an update is required.
1303
1304    check_is_installed: bool, default True
1305        If `True`, check if the package is contained in the virtual environment.
1306
1307    allow_outside_venv: bool, default True
1308        If `True`, search outside of the specified virtual environment
1309        if the package cannot be found.
1310        Setting to `False` will reinstall the package into a virtual environment, even if it
1311        is installed outside.
1312
1313    color: bool, default True
1314        If `False`, do not print ANSI colors.
1315
1316    Returns
1317    -------
1318    The specified modules. If they're not available and `install` is `True`, it will first
1319    download them into a virtual environment and return the modules.
1320
1321    Examples
1322    --------
1323    >>> pandas, sqlalchemy = attempt_import('pandas', 'sqlalchemy')
1324    >>> pandas = attempt_import('pandas')
1325
1326    """
1327
1328    import importlib.util
1329
1330    ### to prevent recursion, check if parent Meerschaum package is being imported
1331    if names == ('meerschaum',):
1332        return _import_module('meerschaum')
1333
1334    if venv == 'mrsm' and _import_hook_venv is not None:
1335        if debug:
1336            print(f"Import hook for virtual environment '{_import_hook_venv}' is active.")
1337        venv = _import_hook_venv
1338
1339    _warnings = _import_module('meerschaum.utils.warnings')
1340    warn_function = _warnings.warn
1341
1342    def do_import(_name: str, **kw) -> Union['ModuleType', None]:
1343        with Venv(venv=venv, debug=debug):
1344            ### determine the import method (lazy vs normal)
1345            from meerschaum.utils.misc import filter_keywords
1346            import_method = (
1347                _import_module if not lazy
1348                else lazy_import
1349            )
1350            try:
1351                mod = import_method(_name, **(filter_keywords(import_method, **kw)))
1352            except Exception as e:
1353                if warn:
1354                    import traceback
1355                    traceback.print_exception(type(e), e, e.__traceback__)
1356                    warn_function(
1357                        f"Failed to import module '{_name}'.\nException:\n{e}",
1358                        ImportWarning,
1359                        stacklevel = (5 if lazy else 4),
1360                        color = False,
1361                    )
1362                mod = None
1363        return mod
1364
1365    modules = []
1366    for name in names:
1367        ### Check if package is a declared dependency.
1368        root_name = name.split('.')[0] if split else name
1369        install_name = _import_to_install_name(root_name)
1370
1371        if install_name is None:
1372            install_name = root_name
1373            if warn and root_name != 'plugins':
1374                warn_function(
1375                    f"Package '{root_name}' is not declared in meerschaum.utils.packages.",
1376                    ImportWarning,
1377                    stacklevel = 3,
1378                    color = False
1379                )
1380
1381        ### Determine if the package exists.
1382        if precheck is False:
1383            found_module = (
1384                do_import(
1385                    name, debug=debug, warn=False, venv=venv, color=color,
1386                    check_update=False, check_pypi=False, split=split,
1387                ) is not None
1388            )
1389        else:
1390            if check_is_installed:
1391                with _locks['_is_installed_first_check']:
1392                    if not _is_installed_first_check.get(name, False):
1393                        package_is_installed = is_installed(
1394                            name,
1395                            venv = venv,
1396                            split = split,
1397                            allow_outside_venv = allow_outside_venv,
1398                            debug = debug,
1399                        )
1400                        _is_installed_first_check[name] = package_is_installed
1401                    else:
1402                        package_is_installed = _is_installed_first_check[name]
1403            else:
1404                package_is_installed = _is_installed_first_check.get(
1405                    name,
1406                    venv_contains_package(name, venv=venv, split=split, debug=debug)
1407                )
1408            found_module = package_is_installed
1409
1410        if not found_module:
1411            if install:
1412                if not pip_install(
1413                    install_name,
1414                    venv = venv,
1415                    split = False,
1416                    check_update = check_update,
1417                    color = color,
1418                    debug = debug
1419                ) and warn:
1420                    warn_function(
1421                        f"Failed to install '{install_name}'.",
1422                        ImportWarning,
1423                        stacklevel = 3,
1424                        color = False,
1425                    )
1426            elif warn:
1427                ### Raise a warning if we can't find the package and install = False.
1428                warn_function(
1429                    (f"\n\nMissing package '{name}' from virtual environment '{venv}'; "
1430                     + "some features will not work correctly."
1431                     + "\n\nSet install=True when calling attempt_import.\n"),
1432                    ImportWarning,
1433                    stacklevel = 3,
1434                    color = False,
1435                )
1436
1437        ### Do the import. Will be lazy if lazy=True.
1438        m = do_import(
1439            name, debug=debug, warn=warn, venv=venv, color=color,
1440            check_update=check_update, check_pypi=check_pypi, install=install, split=split,
1441        )
1442        modules.append(m)
1443
1444    modules = tuple(modules)
1445    if len(modules) == 1:
1446        return modules[0]
1447    return modules

Raise a warning if packages are not installed; otherwise import and return modules. If lazy is True, return lazy-imported modules.

Returns tuple of modules if multiple names are provided, else returns one module.

Parameters
  • names (List[str]): The packages to be imported.
  • lazy (bool, default True): If True, lazily load packages.
  • warn (bool, default True): If True, raise a warning if a package cannot be imported.
  • install (bool, default True): If True, attempt to install a missing package into the designated virtual environment. If check_update is True, install updates if available.
  • venv (Optional[str], default 'mrsm'): The virtual environment in which to search for packages and to install packages into.
  • precheck (bool, default True): If True, attempt to find module before importing (necessary for checking if modules exist and retaining lazy imports), otherwise assume lazy is False.
  • split (bool, default True): If True, split packages' names on '.'.
  • check_update (bool, default False): If True and install is True, install updates if the required minimum version does not match.
  • check_pypi (bool, default False): If True and check_update is True, check PyPI when determining whether an update is required.
  • check_is_installed (bool, default True): If True, check if the package is contained in the virtual environment.
  • allow_outside_venv (bool, default True): If True, search outside of the specified virtual environment if the package cannot be found. Setting to False will reinstall the package into a virtual environment, even if it is installed outside.
  • color (bool, default True): If False, do not print ANSI colors.
Returns
  • The specified modules. If they're not available and install is True, it will first download them into a virtual environment and return the modules.
Examples
>>> pandas, sqlalchemy = attempt_import('pandas', 'sqlalchemy')
>>> pandas = attempt_import('pandas')
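
The return convention (one module for a single name, a tuple for several, and None for a module that cannot be imported) can be sketched with the standard library alone. The try_import function below is a simplified, hypothetical stand-in: it does not install packages, manage virtual environments, or import lazily as attempt_import() does.

```python
import importlib
import warnings


def try_import(*names: str, warn: bool = True):
    """
    Hypothetical helper: import each name, substituting None on failure.
    Return a single module for one name, otherwise a tuple of modules.
    """
    modules = []
    for name in names:
        try:
            modules.append(importlib.import_module(name))
        except ImportError as e:
            if warn:
                warnings.warn(f"Failed to import module '{name}': {e}", ImportWarning)
            modules.append(None)
    return modules[0] if len(modules) == 1 else tuple(modules)


json_module = try_import('json')
print(json_module.dumps({'a': 1}))
# {"a": 1}

# 'not_a_real_module_xyz' is a made-up name used to show the failure path.
json_module, missing = try_import('json', 'not_a_real_module_xyz', warn=False)
print(missing)
# None
```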
class Connector:
 22class Connector(metaclass=abc.ABCMeta):
 23    """
 24    The base connector class to hold connection attributes.
 25    """
 26
 27    IS_INSTANCE: bool = False
 28
 29    def __init__(
 30        self,
 31        type: Optional[str] = None,
 32        label: Optional[str] = None,
 33        **kw: Any
 34    ):
 35        """
 36        Set the given keyword arguments as attributes.
 37
 38        Parameters
 39        ----------
 40        type: str
 41            The `type` of the connector (e.g. `sql`, `api`, `plugin`).
 42
 43        label: str
 44            The `label` for the connector.
 45
 46
 47        Examples
 48        --------
 49        Run `mrsm edit config` to edit connectors in the YAML file:
 50
 51        ```yaml
 52        meerschaum:
 53            connectors:
 54                {type}:
 55                    {label}:
 56                        ### attributes go here
 57        ```
 58
 59        """
 60        self._original_dict = copy.deepcopy(self.__dict__)
 61        self._set_attributes(type=type, label=label, **kw)
 62
 63        ### NOTE: Override `REQUIRED_ATTRIBUTES` if `uri` is set.
 64        self.verify_attributes(
 65            ['uri']
 66            if 'uri' in self.__dict__
 67            else getattr(self, 'REQUIRED_ATTRIBUTES', None)
 68        )
 69
 70    def _reset_attributes(self):
 71        self.__dict__ = self._original_dict
 72
 73    def _set_attributes(
 74        self,
 75        *args,
 76        inherit_default: bool = True,
 77        **kw: Any
 78    ):
 79        from meerschaum._internal.static import STATIC_CONFIG
 80        from meerschaum.utils.warnings import error
 81
 82        self._attributes = {}
 83
 84        default_label = STATIC_CONFIG['connectors']['default_label']
 85
 86        ### NOTE: Support the legacy method of explicitly passing the type.
 87        label = kw.get('label', None)
 88        if label is None:
 89            if len(args) == 2:
 90                label = args[1]
 91            elif len(args) == 0:
 92                label = None
 93            else:
 94                label = args[0]
 95
 96        if label == 'default':
 97            error(
 98                f"Label cannot be 'default'. Did you mean '{default_label}'?",
 99                InvalidAttributesError,
100            )
101        self.__dict__['label'] = label
102
103        from meerschaum.config import get_config
104        conn_configs = copy.deepcopy(get_config('meerschaum', 'connectors'))
105        connector_config = copy.deepcopy(get_config('system', 'connectors'))
106
107        ### inherit attributes from 'default' if exists
108        if inherit_default:
109            inherit_from = 'default'
110            if self.type in conn_configs and inherit_from in conn_configs[self.type]:
111                _inherit_dict = copy.deepcopy(conn_configs[self.type][inherit_from])
112                self._attributes.update(_inherit_dict)
113
114        ### load user config into self._attributes
115        if self.type in conn_configs and self.label in conn_configs[self.type]:
116            self._attributes.update(conn_configs[self.type][self.label] or {})
117
118        ### load system config into self._sys_config
119        ### (deep copy so future Connectors don't inherit changes)
120        if self.type in connector_config:
121            self._sys_config = copy.deepcopy(connector_config[self.type])
122
123        ### add additional arguments or override configuration
124        self._attributes.update(kw)
125
126        ### finally, update __dict__ with _attributes.
127        self.__dict__.update(self._attributes)
128
129    def verify_attributes(
130        self,
131        required_attributes: Optional[List[str]] = None,
132        debug: bool = False,
133    ) -> None:
134        """
135        Ensure that the required attributes have been met.
136        
137        The Connector base class checks the minimum requirements.
138        Child classes may enforce additional requirements.
139
140        Parameters
141        ----------
142        required_attributes: Optional[List[str]], default None
143            Attributes to be verified. If `None`, default to `['type', 'label']`.
144
145        debug: bool, default False
146            Verbosity toggle.
147
148        Returns
149        -------
150        Don't return anything.
151
152        Raises
153        ------
154        An error if any of the required attributes are missing.
155        """
156        from meerschaum.utils.warnings import error
157        from meerschaum.utils.misc import items_str
158        if required_attributes is None:
159            required_attributes = ['type', 'label']
160
161        missing_attributes = set()
162        for a in required_attributes:
163            if a not in self.__dict__:
164                missing_attributes.add(a)
165        if len(missing_attributes) > 0:
166            error(
167                (
168                    f"Missing {items_str(list(missing_attributes))} "
169                    + f"for connector '{self.type}:{self.label}'."
170                ),
171                InvalidAttributesError,
172                silent=True,
173                stack=False
174            )
175
176
177    def __str__(self):
178        """
179        When cast to a string, return type:label.
180        """
181        return f"{self.type}:{self.label}"
182
183    def __repr__(self):
184        """
185        Represent the connector as type:label.
186        """
187        return str(self)
188
189    @property
190    def meta(self) -> Dict[str, Any]:
191        """
192        Return the keys needed to reconstruct this Connector.
193        """
194        _meta = {
195            key: value
196            for key, value in self.__dict__.items()
197            if not str(key).startswith('_')
198        }
199        _meta.update({
200            'type': self.type,
201            'label': self.label,
202        })
203        return _meta
204
205
206    @property
207    def type(self) -> str:
208        """
209        Return the type for this connector.
210        """
211        _type = self.__dict__.get('type', None)
212        if _type is None:
213            import re
214            is_executor = self.__class__.__name__.lower().endswith('executor')
215            suffix_regex = (
216                r'connector$'
217                if not is_executor
218                else r'executor$'
219            )
220            _type = re.sub(suffix_regex, '', self.__class__.__name__.lower())
221            if not _type or _type.lower() == 'instance':
222                raise ValueError("No type could be determined for this connector.")
223            self.__dict__['type'] = _type
224        return _type
225
226
227    @property
228    def label(self) -> str:
229        """
230        Return the label for this connector.
231        """
232        _label = self.__dict__.get('label', None)
233        if _label is None:
234            from meerschaum._internal.static import STATIC_CONFIG
235            _label = STATIC_CONFIG['connectors']['default_label']
236            self.__dict__['label'] = _label
237        return _label

The base connector class to hold connection attributes.
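
When no type is passed, the type property derives it from the class name by lowercasing it and stripping a trailing "connector" (or "executor") suffix. The sketch below reproduces that rule as a free function; infer_type is a hypothetical name used only for illustration.

```python
import re


def infer_type(class_name: str) -> str:
    """Hypothetical helper mirroring how a connector's type is inferred."""
    name = class_name.lower()
    suffix_regex = r'executor$' if name.endswith('executor') else r'connector$'
    inferred = re.sub(suffix_regex, '', name)
    if not inferred or inferred == 'instance':
        raise ValueError("No type could be determined for this connector.")
    return inferred


print(infer_type('FooConnector'))
# foo
print(infer_type('SQLConnector'))
# sql
```

A class named just Connector yields an empty string and therefore raises ValueError, which is why subclasses need a distinguishing prefix or an explicit type.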

Connector(type: Optional[str] = None, label: Optional[str] = None, **kw: Any)
29    def __init__(
30        self,
31        type: Optional[str] = None,
32        label: Optional[str] = None,
33        **kw: Any
34    ):
35        """
36        Set the given keyword arguments as attributes.
37
38        Parameters
39        ----------
40        type: str
41            The `type` of the connector (e.g. `sql`, `api`, `plugin`).
42
43        label: str
44            The `label` for the connector.
45
46
47        Examples
48        --------
49        Run `mrsm edit config` to edit connectors in the YAML file:
50
51        ```yaml
52        meerschaum:
53            connectors:
54                {type}:
55                    {label}:
56                        ### attributes go here
57        ```
58
59        """
60        self._original_dict = copy.deepcopy(self.__dict__)
61        self._set_attributes(type=type, label=label, **kw)
62
63        ### NOTE: Override `REQUIRED_ATTRIBUTES` if `uri` is set.
64        self.verify_attributes(
65            ['uri']
66            if 'uri' in self.__dict__
67            else getattr(self, 'REQUIRED_ATTRIBUTES', None)
68        )

Set the given keyword arguments as attributes.

Parameters
  • type (str): The type of the connector (e.g. sql, api, plugin).
  • label (str): The label for the connector.
Examples

Run mrsm edit config to edit connectors in the YAML file:

meerschaum:
    connectors:
        {type}:
            {label}:
                ### attributes go here
IS_INSTANCE: bool = False
def verify_attributes(self, required_attributes: Optional[List[str]] = None, debug: bool = False) -> None:
129    def verify_attributes(
130        self,
131        required_attributes: Optional[List[str]] = None,
132        debug: bool = False,
133    ) -> None:
134        """
135        Ensure that the required attributes have been met.
136        
137        The Connector base class checks the minimum requirements.
138        Child classes may enforce additional requirements.
139
140        Parameters
141        ----------
142        required_attributes: Optional[List[str]], default None
143            Attributes to be verified. If `None`, default to `['type', 'label']`.
144
145        debug: bool, default False
146            Verbosity toggle.
147
148        Returns
149        -------
150        None
151
152        Raises
153        ------
154        `InvalidAttributesError` if any of the required attributes are missing.
155        """
156        from meerschaum.utils.warnings import error
157        from meerschaum.utils.misc import items_str
158        if required_attributes is None:
159            required_attributes = ['type', 'label']
160
161        missing_attributes = set()
162        for a in required_attributes:
163            if a not in self.__dict__:
164                missing_attributes.add(a)
165        if len(missing_attributes) > 0:
166            error(
167                (
168                    f"Missing {items_str(list(missing_attributes))} "
169                    + f"for connector '{self.type}:{self.label}'."
170                ),
171                InvalidAttributesError,
172                silent=True,
173                stack=False
174            )

Ensure that the required attributes have been met.

The Connector base class checks the minimum requirements. Child classes may enforce additional requirements.

Parameters
  • required_attributes (Optional[List[str]], default None): Attributes to be verified. If None, default to ['type', 'label'].
  • debug (bool, default False): Verbosity toggle.
Returns
  • None
Raises
  • InvalidAttributesError if any of the required attributes are missing.
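
The attribute check above can be sketched without Meerschaum installed. This is a minimal illustration, not the library's implementation: `MiniConnector` and `MissingAttributesError` are hypothetical stand-ins, and only the logic visible in the listing is mirrored (a `uri` replaces `REQUIRED_ATTRIBUTES`, and `None` falls back to `['type', 'label']`).

```python
from typing import Any, List, Optional


class MissingAttributesError(Exception):
    """Hypothetical stand-in for Meerschaum's InvalidAttributesError."""


class MiniConnector:
    """Hypothetical base class that mirrors only the attribute check."""

    REQUIRED_ATTRIBUTES: Optional[List[str]] = None

    def __init__(self, **attributes: Any):
        # Store every keyword argument as an instance attribute.
        self.__dict__.update(attributes)
        # A `uri` replaces the class-level requirements entirely.
        required = ['uri'] if 'uri' in self.__dict__ else self.REQUIRED_ATTRIBUTES
        self.verify_attributes(required)

    def verify_attributes(self, required_attributes: Optional[List[str]] = None) -> None:
        # With no explicit list, fall back to the base requirements.
        if required_attributes is None:
            required_attributes = ['type', 'label']
        missing = sorted(a for a in required_attributes if a not in self.__dict__)
        if missing:
            raise MissingAttributesError(f"Missing {', '.join(missing)}.")


class FooConnector(MiniConnector):
    REQUIRED_ATTRIBUTES = ['username', 'password']


FooConnector(username='foo', password='bar')  # passes: both attributes are set
FooConnector(uri='foo://user:pass@host')      # passes: only `uri` is checked
try:
    FooConnector(username='foo')
except MissingAttributesError as e:
    print(e)  # Missing password.
```

Because the `uri` check happens before `REQUIRED_ATTRIBUTES` is consulted, a subclass that lists credentials as required can still be built from a single connection string.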
meta: Dict[str, Any]
189    @property
190    def meta(self) -> Dict[str, Any]:
191        """
192        Return the keys needed to reconstruct this Connector.
193        """
194        _meta = {
195            key: value
196            for key, value in self.__dict__.items()
197            if not str(key).startswith('_')
198        }
199        _meta.update({
200            'type': self.type,
201            'label': self.label,
202        })
203        return _meta

Return the keys needed to reconstruct this Connector.
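
As a rough sketch of the filtering that `meta` performs, the helper below drops private keys (those starting with an underscore) and then forces `type` and `label` into the result. `public_meta` is a hypothetical name used only for this illustration.

```python
from typing import Any, Dict


def public_meta(attributes: Dict[str, Any], type_: str, label: str) -> Dict[str, Any]:
    """Keep only public attributes, then always include `type` and `label`."""
    meta = {
        key: value
        for key, value in attributes.items()
        if not str(key).startswith('_')
    }
    meta.update({'type': type_, 'label': label})
    return meta


attrs = {'flavor': 'sqlite', 'database': '/tmp/tmp.db', '_original_dict': {}}
print(public_meta(attrs, 'sql', 'temp'))
# {'flavor': 'sqlite', 'database': '/tmp/tmp.db', 'type': 'sql', 'label': 'temp'}
```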

type: str
206    @property
207    def type(self) -> str:
208        """
209        Return the type for this connector.
210        """
211        _type = self.__dict__.get('type', None)
212        if _type is None:
213            import re
214            is_executor = self.__class__.__name__.lower().endswith('executor')
215            suffix_regex = (
216                r'connector$'
217                if not is_executor
218                else r'executor$'
219            )
220            _type = re.sub(suffix_regex, '', self.__class__.__name__.lower())
221            if not _type or _type.lower() == 'instance':
222                raise ValueError("No type could be determined for this connector.")
223            self.__dict__['type'] = _type
224        return _type

Return the type for this connector.
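
When no `type` attribute was set, the property derives one from the class name. The standalone function below is a sketch of that rule as it appears in the listing; `infer_connector_type` is a hypothetical helper name, not part of the package.

```python
import re


def infer_connector_type(class_name: str) -> str:
    """Strip a trailing 'connector' (or 'executor') from the lowercased class name."""
    name = class_name.lower()
    suffix_regex = r'executor$' if name.endswith('executor') else r'connector$'
    inferred = re.sub(suffix_regex, '', name)
    # An empty result, or the bare base name 'instance', is not a usable type.
    if not inferred or inferred == 'instance':
        raise ValueError("No type could be determined for this connector.")
    return inferred


print(infer_connector_type('FooConnector'))   # foo
print(infer_connector_type('LocalExecutor'))  # local
```

Under this rule a class named `InstanceConnector` or plain `Connector` raises `ValueError`, since stripping the suffix leaves `instance` or an empty string.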

label: str
227    @property
228    def label(self) -> str:
229        """
230        Return the label for this connector.
231        """
232        _label = self.__dict__.get('label', None)
233        if _label is None:
234            from meerschaum._internal.static import STATIC_CONFIG
235            _label = STATIC_CONFIG['connectors']['default_label']
236            self.__dict__['label'] = _label
237        return _label

Return the label for this connector.

class InstanceConnector(meerschaum.Connector):
18class InstanceConnector(Connector):
19    """
20    Instance connectors define the interface for managing pipes and provide methods
21    for management of users, plugins, tokens, and other metadata built atop pipes.
22    """
23
24    IS_INSTANCE: bool = True
25    IS_THREAD_SAFE: bool = False
26
27    from ._users import (
28        get_users_pipe,
29        register_user,
30        get_user_id,
31        get_username,
32        get_users,
33        edit_user,
34        delete_user,
35        get_user_password_hash,
36        get_user_type,
37        get_user_attributes,
38    )
39
40    from ._plugins import (
41        get_plugins_pipe,
42        register_plugin,
43        get_plugin_user_id,
44        delete_plugin,
45        get_plugin_id,
46        get_plugin_version,
47        get_plugins,
48        get_plugin_user_id,
49        get_plugin_username,
50        get_plugin_attributes,
51    )
52
53    from ._tokens import (
54        get_tokens_pipe,
55        register_token,
56        edit_token,
57        invalidate_token,
58        delete_token,
59        get_token,
60        get_tokens,
61        get_token_model,
62        get_token_secret_hash,
63        token_exists,
64        get_token_scopes,
65    )
66
67    from ._pipes import (
68        register_pipe,
69        get_pipe_attributes,
70        get_pipe_id,
71        edit_pipe,
72        delete_pipe,
73        fetch_pipes_keys,
74        pipe_exists,
75        drop_pipe,
76        drop_pipe_indices,
77        sync_pipe,
78        create_pipe_indices,
79        clear_pipe,
80        get_pipe_data,
81        get_pipe_docs,
82        get_sync_time,
83        get_pipe_columns_types,
84        get_pipe_columns_indices,
85    )

Instance connectors define the interface for managing pipes and provide methods for management of users, plugins, tokens, and other metadata built atop pipes.

IS_INSTANCE: bool = True
IS_THREAD_SAFE: bool = False
def get_users_pipe(self) -> Pipe:
18def get_users_pipe(self) -> 'mrsm.Pipe':
19    """
20    Return the pipe used for users registration.
21    """
22    if '_users_pipe' in self.__dict__:
23        return self._users_pipe
24
25    cache_connector = self.__dict__.get('_cache_connector', None)
26    self._users_pipe = mrsm.Pipe(
27        'mrsm', 'users',
28        instance=self,
29        target='mrsm_users',
30        temporary=True,
31        cache=True,
32        cache_connector_keys=cache_connector,
33        static=True,
34        null_indices=False,
35        columns={
36            'primary': 'user_id',
37        },
38        dtypes={
39            'user_id': 'uuid',
40            'username': 'string',
41            'password_hash': 'string',
42            'email': 'string',
43            'user_type': 'string',
44            'attributes': 'json',
45        },
46        indices={
47            'unique': 'username',
48        },
49    )
50    return self._users_pipe

Return the pipe used for users registration.
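
The method builds its pipe once and stores it on the instance, so later calls return the same object. A minimal sketch of that lazy-caching pattern follows; `PipeHolder` is a hypothetical stand-in, and a plain dictionary takes the place of the pipe.

```python
class PipeHolder:
    """Hypothetical stand-in for an instance connector."""

    def __init__(self) -> None:
        self.build_count = 0

    def get_users_pipe(self) -> dict:
        # Reuse the cached object if it has already been built.
        if '_users_pipe' in self.__dict__:
            return self._users_pipe

        self.build_count += 1
        self._users_pipe = {'target': 'mrsm_users'}  # placeholder for a real pipe
        return self._users_pipe


holder = PipeHolder()
first = holder.get_users_pipe()
second = holder.get_users_pipe()
print(first is second, holder.build_count)  # True 1
```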

def register_user( self, user: meerschaum.core.User._User.User, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
53def register_user(
54    self,
55    user: User,
56    debug: bool = False,
57    **kwargs: Any
58) -> mrsm.SuccessTuple:
59    """
60    Register a new user to the users pipe.
61    """
62    users_pipe = self.get_users_pipe()
63    user.user_id = uuid.uuid4()
64    sync_success, sync_msg = users_pipe.sync(
65        [{
66            'user_id': user.user_id,
67            'username': user.username,
68            'email': user.email,
69            'password_hash': user.password_hash,
70            'user_type': user.type,
71            'attributes': user.attributes,
72        }],
73        check_existing=False,
74        debug=debug,
75    )
76    if not sync_success:
77        return False, f"Failed to register user '{user.username}':\n{sync_msg}"
78
79    return True, "Success"

Register a new user to the users pipe.

def get_user_id( self, user: meerschaum.core.User._User.User, debug: bool = False) -> Optional[uuid.UUID]:
82def get_user_id(self, user: User, debug: bool = False) -> Union[uuid.UUID, None]:
83    """
84    Return a user's ID from the username.
85    """
86    users_pipe = self.get_users_pipe()
87    result_df = users_pipe.get_data(['user_id'], params={'username': user.username}, limit=1)
88    if result_df is None or len(result_df) == 0:
89        return None
90    return result_df['user_id'][0]

Return a user's ID from the username.

def get_username(self, user_id: Any, debug: bool = False) -> Any:
93def get_username(self, user_id: Any, debug: bool = False) -> Any:
94    """
95    Return the username from the given ID.
96    """
97    users_pipe = self.get_users_pipe()
98    return users_pipe.get_value('username', {'user_id': user_id}, debug=debug)

Return the username from the given ID.

def get_users(self, debug: bool = False, **kw: Any) -> List[str]:
101def get_users(
102    self,
103    debug: bool = False,
104    **kw: Any
105) -> List[str]:
106    """
107    Get the registered usernames.
108    """
109    users_pipe = self.get_users_pipe()
110    df = users_pipe.get_data()
111    if df is None:
112        return []
113
114    return list(df['username'])

Get the registered usernames.

def edit_user( self, user: meerschaum.core.User._User.User, debug: bool = False) -> Tuple[bool, str]:
117def edit_user(self, user: User, debug: bool = False) -> mrsm.SuccessTuple:
118    """
119    Edit the attributes for an existing user.
120    """
121    users_pipe = self.get_users_pipe()
122    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)
123
124    doc = {'user_id': user_id}
125    if user.email != '':
126        doc['email'] = user.email
127    if user.password_hash != '':
128        doc['password_hash'] = user.password_hash
129    if user.type != '':
130        doc['user_type'] = user.type
131    if user.attributes:
132        doc['attributes'] = user.attributes
133
134    sync_success, sync_msg = users_pipe.sync([doc], debug=debug)
135    if not sync_success:
136        return False, f"Failed to edit user '{user.username}':\n{sync_msg}"
137
138    return True, "Success"

Edit the attributes for an existing user.
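
Only fields that carry a value are written, so an edit leaves the other columns untouched. The sketch below shows how such a partial document might be assembled. `DemoUser` and `build_edit_doc` are hypothetical, and the empty-string defaults are an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class DemoUser:
    """Hypothetical stand-in for a user object; empty strings mean 'unset'."""
    user_id: str
    email: str = ''
    password_hash: str = ''
    type: str = ''
    attributes: Dict[str, Any] = field(default_factory=dict)


def build_edit_doc(user: DemoUser) -> Dict[str, Any]:
    """Include only the fields that were actually provided."""
    doc: Dict[str, Any] = {'user_id': user.user_id}
    if user.email != '':
        doc['email'] = user.email
    if user.password_hash != '':
        doc['password_hash'] = user.password_hash
    if user.type != '':
        doc['user_type'] = user.type
    if user.attributes:
        doc['attributes'] = user.attributes
    return doc


print(build_edit_doc(DemoUser(user_id='abc', email='foo@example.com')))
# {'user_id': 'abc', 'email': 'foo@example.com'}
```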

def delete_user( self, user: meerschaum.core.User._User.User, debug: bool = False) -> Tuple[bool, str]:
141def delete_user(self, user: User, debug: bool = False) -> mrsm.SuccessTuple:
142    """
143    Delete a user from the users table.
144    """
145    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)
146    users_pipe = self.get_users_pipe()
147    clear_success, clear_msg = users_pipe.clear(params={'user_id': user_id}, debug=debug)
148    if not clear_success:
149        return False, f"Failed to delete user '{user}':\n{clear_msg}"
150    return True, "Success"

Delete a user from the users table.

def get_user_password_hash( self, user: meerschaum.core.User._User.User, debug: bool = False) -> Optional[str]:
153def get_user_password_hash(self, user: User, debug: bool = False) -> Union[str, None]:
154    """
155    Get a user's password hash from the users table.
156    """
157    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)
158    users_pipe = self.get_users_pipe()
159    result_df = users_pipe.get_data(['password_hash'], params={'user_id': user_id}, debug=debug)
160    if result_df is None or len(result_df) == 0:
161        return None
162
163    return result_df['password_hash'][0]

Get a user's password hash from the users table.

def get_user_type( self, user: meerschaum.core.User._User.User, debug: bool = False) -> Optional[str]:
166def get_user_type(self, user: User, debug: bool = False) -> Union[str, None]:
167    """
168    Get a user's type from the users table.
169    """
170    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)
171    users_pipe = self.get_users_pipe()
172    result_df = users_pipe.get_data(['user_type'], params={'user_id': user_id}, debug=debug)
173    if result_df is None or len(result_df) == 0:
174        return None
175
176    return result_df['user_type'][0]

Get a user's type from the users table.

def get_user_attributes( self, user: meerschaum.core.User._User.User, debug: bool = False) -> Optional[Dict[str, Any]]:
179def get_user_attributes(self, user: User, debug: bool = False) -> Union[Dict[str, Any], None]:
180    """
181    Get a user's attributes from the users table.
182    """
183    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)
184    users_pipe = self.get_users_pipe()
185    result_df = users_pipe.get_data(['attributes'], params={'user_id': user_id}, debug=debug)
186    if result_df is None or len(result_df) == 0:
187        return None
188
189    return result_df['attributes'][0]

Get a user's attributes from the users table.

def get_plugins_pipe(self) -> Pipe:
16def get_plugins_pipe(self) -> 'mrsm.Pipe':
17    """
18    Return the internal pipe for syncing plugins metadata.
19    """
20    if '_plugins_pipe' in self.__dict__:
21        return self._plugins_pipe
22
23    cache_connector = self.__dict__.get('_cache_connector', None)
24    users_pipe = self.get_users_pipe()
25    user_id_dtype = users_pipe.dtypes.get('user_id', 'uuid')
26
27    self._plugins_pipe = mrsm.Pipe(
28        'mrsm', 'plugins',
29        instance=self,
30        target='mrsm_plugins',
31        temporary=True,
32        cache=True,
33        cache_connector_keys=cache_connector,
34        static=True,
35        null_indices=False,
36        columns={
37            'primary': 'plugin_name',
38            'user_id': 'user_id',
39        },
40        dtypes={
41            'plugin_name': 'string',
42            'user_id': user_id_dtype,
43            'attributes': 'json',
44            'version': 'string',
45        },
46    )
47    return self._plugins_pipe

Return the internal pipe for syncing plugins metadata.

def register_plugin( self, plugin: Plugin, debug: bool = False) -> Tuple[bool, str]:
50def register_plugin(self, plugin: Plugin, debug: bool = False) -> mrsm.SuccessTuple:
51    """
52    Register a new plugin to the plugins table.
53    """
54    plugins_pipe = self.get_plugins_pipe()
55    users_pipe = self.get_users_pipe()
56    user_id = self.get_plugin_user_id(plugin)
57    if user_id is not None:
58        username = self.get_username(user_id, debug=debug)
59        return False, f"{plugin} is already registered to '{username}'."
60
61    doc = {
62        'plugin_name': plugin.name,
63        'version': plugin.version,
64        'attributes': plugin.attributes,
65        'user_id': plugin.user_id,
66    }
67
68    sync_success, sync_msg = plugins_pipe.sync(
69        [doc],
70        check_existing=False,
71        debug=debug,
72    )
73    if not sync_success:
74        return False, f"Failed to register {plugin}:\n{sync_msg}"
75
76    return True, "Success"

Register a new plugin to the plugins table.

def get_plugin_user_id( self, plugin: Plugin, debug: bool = False) -> Optional[uuid.UUID]:
79def get_plugin_user_id(self, plugin: Plugin, debug: bool = False) -> Union[uuid.UUID, None]:
80    """
81    Return the user ID for the plugin's owner.
82    """
83    plugins_pipe = self.get_plugins_pipe() 
84    return plugins_pipe.get_value('user_id', {'plugin_name': plugin.name}, debug=debug)

Return the user ID for the plugin's owner.

def delete_plugin( self, plugin: Plugin, debug: bool = False) -> Tuple[bool, str]:
105def delete_plugin(self, plugin: Plugin, debug: bool = False) -> mrsm.SuccessTuple:
106    """
107    Delete a plugin's registration.
108    """
109    plugin_id = self.get_plugin_id(plugin, debug=debug)
110    if plugin_id is None:
111        return False, f"{plugin} is not registered."
112    
113    plugins_pipe = self.get_plugins_pipe()
114    clear_success, clear_msg = plugins_pipe.clear(params={'plugin_name': plugin.name}, debug=debug)
115    if not clear_success:
116        return False, f"Failed to delete {plugin}:\n{clear_msg}"
117    return True, "Success"

Delete a plugin's registration.

def get_plugin_id( self, plugin: Plugin, debug: bool = False) -> Optional[str]:
 97def get_plugin_id(self, plugin: Plugin, debug: bool = False) -> Union[str, None]:
 98    """
 99    Return a plugin's ID.
100    """
101    user_id = self.get_plugin_user_id(plugin, debug=debug)
102    return plugin.name if user_id is not None else None

Return a plugin's ID.

def get_plugin_version( self, plugin: Plugin, debug: bool = False) -> Optional[str]:
120def get_plugin_version(self, plugin: Plugin, debug: bool = False) -> Union[str, None]:
121    """
122    Return the version for a plugin.
123    """
124    plugins_pipe = self.get_plugins_pipe() 
125    return plugins_pipe.get_value('version', {'plugin_name': plugin.name}, debug=debug)

Return the version for a plugin.

def get_plugins( self, user_id: Optional[int] = None, search_term: Optional[str] = None, debug: bool = False, **kw: Any) -> List[str]:
136def get_plugins(
137    self,
138    user_id: Optional[int] = None,
139    search_term: Optional[str] = None,
140    debug: bool = False,
141    **kw: Any
142) -> List[str]:
143    """
144    Return a list of plugin names.
145    """
146    plugins_pipe = self.get_plugins_pipe()
147    params = {}
148    if user_id:
149        params['user_id'] = user_id
150
151    df = plugins_pipe.get_data(['plugin_name'], params=params, debug=debug)
152    if df is None:
153        return []
154
155    docs = df.to_dict(orient='records')
156    return [
157        plugin_name
158        for doc in docs
159        if (plugin_name := doc['plugin_name']).startswith(search_term or '')
160    ]

Return a list of plugin names.
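
Note that `search_term` is applied as a prefix match, not a substring search. The filter can be sketched in isolation like this; `filter_plugin_names` is a hypothetical helper and the plugin names are sample data.

```python
from typing import Dict, List, Optional


def filter_plugin_names(docs: List[Dict[str, str]], search_term: Optional[str] = None) -> List[str]:
    """Keep names that start with `search_term`; an empty term keeps everything."""
    return [
        plugin_name
        for doc in docs
        if (plugin_name := doc['plugin_name']).startswith(search_term or '')
    ]


docs = [{'plugin_name': 'noaa'}, {'plugin_name': 'color'}, {'plugin_name': 'compose'}]
print(filter_plugin_names(docs, 'co'))  # ['color', 'compose']
print(filter_plugin_names(docs))        # ['noaa', 'color', 'compose']
```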

def get_plugin_username( self, plugin: Plugin, debug: bool = False) -> Optional[str]:
87def get_plugin_username(self, plugin: Plugin, debug: bool = False) -> Union[str, None]:
88    """
89    Return the username for the plugin's owner.
90    """
91    user_id = self.get_plugin_user_id(plugin, debug=debug)
92    if user_id is None:
93        return None
94    return self.get_username(user_id, debug=debug)

Return the username for the plugin's owner.

def get_plugin_attributes( self, plugin: Plugin, debug: bool = False) -> Dict[str, Any]:
128def get_plugin_attributes(self, plugin: Plugin, debug: bool = False) -> Dict[str, Any]:
129    """
130    Return the attributes for a plugin.
131    """
132    plugins_pipe = self.get_plugins_pipe() 
133    return plugins_pipe.get_value('attributes', {'plugin_name': plugin.name}, debug=debug) or {}

Return the attributes for a plugin.

def get_tokens_pipe(self) -> Pipe:
22def get_tokens_pipe(self) -> mrsm.Pipe:
23    """
24    Return the internal pipe for tokens management.
25    """
26    if '_tokens_pipe' in self.__dict__:
27        return self._tokens_pipe
28
29    users_pipe = self.get_users_pipe()
30    user_id_dtype = (
31        users_pipe._attributes.get('parameters', {}).get('dtypes', {}).get('user_id', 'uuid')
32    )
33
34    cache_connector = self.__dict__.get('_cache_connector', None)
35
36    self._tokens_pipe = mrsm.Pipe(
37        'mrsm', 'tokens',
38        instance=self,
39        target='mrsm_tokens',
40        temporary=True,
41        cache=True,
42        cache_connector_keys=cache_connector,
43        static=True,
44        autotime=True,
45        null_indices=False,
46        columns={
47            'datetime': 'creation',
48            'primary': 'id',
49        },
50        indices={
51            'unique': 'label',
52            'user_id': 'user_id',
53        },
54        dtypes={
55            'id': 'uuid',
56            'creation': 'datetime',
57            'expiration': 'datetime',
58            'is_valid': 'bool',
59            'label': 'string',
60            'user_id': user_id_dtype,
61            'scopes': 'json',
62            'secret_hash': 'string',
63        },
64    )
65    return self._tokens_pipe

Return the internal pipe for tokens management.

def register_token( self, token: meerschaum.core.Token._Token.Token, debug: bool = False) -> Tuple[bool, str]:
68def register_token(
69    self,
70    token: Token,
71    debug: bool = False,
72) -> mrsm.SuccessTuple:
73    """
74    Register the new token to the tokens table.
75    """
76    token_id, token_secret = token.generate_credentials()
77    tokens_pipe = self.get_tokens_pipe()
78    user_id = self.get_user_id(token.user) if token.user is not None else None
79    if user_id is None:
80        return False, "Cannot register a token without a user."
81
82    doc = {
83        'id': token_id,
84        'user_id': user_id,
85        'creation': datetime.now(timezone.utc),
86        'expiration': token.expiration,
87        'label': token.label,
88        'is_valid': token.is_valid,
89        'scopes': list(token.scopes) if token.scopes else [],
90        'secret_hash': hash_password(
91            str(token_secret),
92            rounds=STATIC_CONFIG['tokens']['hash_rounds']
93        ),
94    }
95    sync_success, sync_msg = tokens_pipe.sync([doc], check_existing=False, debug=debug)
96    if not sync_success:
97        return False, f"Failed to register token:\n{sync_msg}"
98    return True, "Success"

Register the new token to the tokens table.

def edit_token( self, token: meerschaum.core.Token._Token.Token, debug: bool = False) -> Tuple[bool, str]:
101def edit_token(self, token: Token, debug: bool = False) -> mrsm.SuccessTuple:
102    """
103    Persist the token's in-memory state to the tokens pipe.
104    """
105    if not token.id:
106        return False, "Token ID is not set."
107
108    if not token.exists(debug=debug):
109        return False, f"Token {token.id} does not exist."
110
111    if not token.creation:
112        token_model = self.get_token_model(token.id)
113        token.creation = token_model.creation
114
115    tokens_pipe = self.get_tokens_pipe()
116    doc = {
117        'id': token.id,
118        'creation': token.creation,
119        'expiration': token.expiration,
120        'label': token.label,
121        'is_valid': token.is_valid,
122        'scopes': list(token.scopes) if token.scopes else [],
123    }
124    sync_success, sync_msg = tokens_pipe.sync([doc], debug=debug)
125    if not sync_success:
126        return False, f"Failed to edit token '{token.id}':\n{sync_msg}"
127
128    return True, "Success"

Persist the token's in-memory state to the tokens pipe.

def invalidate_token( self, token: meerschaum.core.Token._Token.Token, debug: bool = False) -> Tuple[bool, str]:
131def invalidate_token(self, token: Token, debug: bool = False) -> mrsm.SuccessTuple:
132    """
133    Set `is_valid` to `False` for the given token.
134    """
135    if not token.id:
136        return False, "Token ID is not set."
137
138    if not token.exists(debug=debug):
139        return False, f"Token {token.id} does not exist."
140
141    if not token.creation:
142        token_model = self.get_token_model(token.id)
143        token.creation = token_model.creation
144
145    token.is_valid = False
146    tokens_pipe = self.get_tokens_pipe()
147    doc = {
148        'id': token.id,
149        'creation': token.creation,
150        'is_valid': False,
151    }
152    sync_success, sync_msg = tokens_pipe.sync([doc], debug=debug)
153    if not sync_success:
154        return False, f"Failed to invalidate token '{token.id}':\n{sync_msg}"
155
156    return True, "Success"

Set is_valid to False for the given token.

def delete_token( self, token: meerschaum.core.Token._Token.Token, debug: bool = False) -> Tuple[bool, str]:
159def delete_token(self, token: Token, debug: bool = False) -> mrsm.SuccessTuple:
160    """
161    Delete the given token from the tokens table.
162    """
163    if not token.id:
164        return False, "Token ID is not set."
165
166    if not token.exists(debug=debug):
167        return False, f"Token {token.id} does not exist."
168
169    if not token.creation:
170        token_model = self.get_token_model(token.id)
171        token.creation = token_model.creation
172
173    token.is_valid = False
174    tokens_pipe = self.get_tokens_pipe()
175    clear_success, clear_msg = tokens_pipe.clear(params={'id': token.id}, debug=debug)
176    if not clear_success:
177        return False, f"Failed to delete token '{token.id}':\n{clear_msg}"
178
179    return True, "Success"

Delete the given token from the tokens table.

def get_token( self, token_id: Union[uuid.UUID, str], debug: bool = False) -> Optional[meerschaum.core.Token._Token.Token]:
235def get_token(self, token_id: Union[uuid.UUID, str], debug: bool = False) -> Union[Token, None]:
236    """
237    Return the `Token` from its ID.
238    """
239    from meerschaum.utils.misc import is_uuid
240    if isinstance(token_id, str):
241        if is_uuid(token_id):
242            token_id = uuid.UUID(token_id)
243        else:
244            raise ValueError("Invalid token ID.")
245    token_model = self.get_token_model(token_id)
246    if token_model is None:
247        return None
248    return Token(**dict(token_model))

Return the Token from its ID.
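
String IDs are converted to `uuid.UUID` before the lookup, and anything that is not a valid UUID is rejected. The sketch below reproduces that coercion with the standard library only. `coerce_token_id` is a hypothetical helper, and it relies on `uuid.UUID` for validation instead of Meerschaum's `is_uuid`, so the set of strings it accepts may differ slightly.

```python
import uuid
from typing import Union


def coerce_token_id(token_id: Union[uuid.UUID, str]) -> uuid.UUID:
    """Return a UUID object, raising ValueError for malformed strings."""
    if isinstance(token_id, str):
        try:
            return uuid.UUID(token_id)
        except ValueError:
            raise ValueError("Invalid token ID.") from None
    return token_id


print(coerce_token_id('12345678-1234-5678-1234-567812345678'))
# 12345678-1234-5678-1234-567812345678
```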

def get_tokens( self, user: Optional[meerschaum.core.User._User.User] = None, labels: Optional[List[str]] = None, ids: Optional[List[uuid.UUID]] = None, debug: bool = False) -> List[meerschaum.core.Token._Token.Token]:
182def get_tokens(
183    self,
184    user: Optional[User] = None,
185    labels: Optional[List[str]] = None,
186    ids: Optional[List[uuid.UUID]] = None,
187    debug: bool = False,
188) -> List[Token]:
189    """
190    Return a list of `Token` objects.
191    """
192    tokens_pipe = self.get_tokens_pipe()
193    user_id = (
194        self.get_user_id(user, debug=debug)
195        if user is not None
196        else None
197    )
198    user_type = self.get_user_type(user, debug=debug) if user is not None else None
199    params = (
200        {
201            'user_id': (
202                user_id
203                if user_type != 'admin'
204                else [user_id, None]
205            )
206        }
207        if user_id is not None
208        else {}
209    )
210    if labels:
211        params['label'] = labels
212    if ids:
213        params['id'] = ids
214        
215    if debug:
216        dprint(f"Getting tokens with {user_id=}, {params=}")
217
218    tokens_df = tokens_pipe.get_data(params=params, debug=debug)
219    if tokens_df is None:
220        return []
221
222    if debug:
223        dprint(f"Retrieved tokens dataframe:\n{tokens_df}")
224
225    tokens_docs = tokens_df.to_dict(orient='records')
226    return [
227        Token(
228            instance=self,
229            **token_doc
230        )
231        for token_doc in reversed(tokens_docs)
232    ]

Return a list of Token objects.
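
The query parameters depend on the user's type: a regular user is limited to their own tokens, while an admin also matches tokens whose `user_id` is null. A standalone sketch of how those parameters are assembled is shown below; `build_token_params` is a hypothetical helper, and the IDs are plain strings for brevity.

```python
from typing import Any, Dict, List, Optional


def build_token_params(
    user_id: Optional[str],
    user_type: Optional[str],
    labels: Optional[List[str]] = None,
    ids: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Mirror the filter construction from the listing above."""
    params: Dict[str, Any] = (
        {'user_id': user_id if user_type != 'admin' else [user_id, None]}
        if user_id is not None
        else {}
    )
    if labels:
        params['label'] = labels
    if ids:
        params['id'] = ids
    return params


print(build_token_params('u1', 'user'))               # {'user_id': 'u1'}
print(build_token_params('u1', 'admin', labels=['ci']))
# {'user_id': ['u1', None], 'label': ['ci']}
print(build_token_params(None, None))                 # {}
```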

def get_token_model( self, token_id: Union[uuid.UUID, meerschaum.core.Token._Token.Token], debug: bool = False) -> Optional[TokenModel]:
251def get_token_model(self, token_id: Union[uuid.UUID, Token], debug: bool = False) -> 'Union[TokenModel, None]':
252    """
253    Return a token's model from the instance.
254    """
255    from meerschaum.models import TokenModel
256    if isinstance(token_id, Token):
257        token_id = token_id.id
258    if not token_id:
259        raise ValueError("Invalid token ID.")
260    tokens_pipe = self.get_tokens_pipe()
261    doc = tokens_pipe.get_doc(
262        params={'id': token_id},
263        debug=debug,
264    )
265    if doc is None:
266        return None
267    return TokenModel(**doc)

Return a token's model from the instance.

def get_token_secret_hash( self, token_id: Union[uuid.UUID, meerschaum.core.Token._Token.Token], debug: bool = False) -> Optional[str]:
270def get_token_secret_hash(self, token_id: Union[uuid.UUID, Token], debug: bool = False) -> Union[str, None]:
271    """
272    Return the secret hash for a given token.
273    """
274    if isinstance(token_id, Token):
275        token_id = token_id.id
276    if not token_id:
277        raise ValueError("Invalid token ID.")
278    tokens_pipe = self.get_tokens_pipe()
279    return tokens_pipe.get_value('secret_hash', params={'id': token_id}, debug=debug)

Return the secret hash for a given token.

def token_exists( self, token_id: Union[uuid.UUID, meerschaum.core.Token._Token.Token], debug: bool = False) -> bool:
308def token_exists(self, token_id: Union[uuid.UUID, Token], debug: bool = False) -> bool:
309    """
310    Return `True` if a token exists in the tokens pipe.
311    """
312    if isinstance(token_id, Token):
313        token_id = token_id.id
314    if not token_id:
315        raise ValueError("Invalid token ID.")
316
317    tokens_pipe = self.get_tokens_pipe()
318    return tokens_pipe.get_value('creation', params={'id': token_id}, debug=debug) is not None

Return True if a token exists in the tokens pipe.

def get_token_scopes( self, token_id: Union[uuid.UUID, meerschaum.core.Token._Token.Token], debug: bool = False) -> List[str]:
295def get_token_scopes(self, token_id: Union[uuid.UUID, Token], debug: bool = False) -> List[str]:
296    """
297    Return the scopes for a token.
298    """
299    if isinstance(token_id, Token):
300        token_id = token_id.id
301    if not token_id:
302        raise ValueError("Invalid token ID.")
303
304    tokens_pipe = self.get_tokens_pipe()
305    return tokens_pipe.get_value('scopes', params={'id': token_id}, debug=debug) or []

Return the scopes for a token.

@abc.abstractmethod
def register_pipe( self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
17@abc.abstractmethod
18def register_pipe(
19    self,
20    pipe: mrsm.Pipe,
21    debug: bool = False,
22    **kwargs: Any
23) -> mrsm.SuccessTuple:
24    """
25    Insert the pipe's attributes into the internal `pipes` table.
26
27    Parameters
28    ----------
29    pipe: mrsm.Pipe
30        The pipe to be registered.
31
32    Returns
33    -------
34    A `SuccessTuple` of the result.
35    """

Insert the pipe's attributes into the internal pipes table.

Parameters
  • pipe (mrsm.Pipe): The pipe to be registered.
Returns
  • A SuccessTuple of the result.
@abc.abstractmethod
def get_pipe_attributes( self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> Dict[str, Any]:
37@abc.abstractmethod
38def get_pipe_attributes(
39    self,
40    pipe: mrsm.Pipe,
41    debug: bool = False,
42    **kwargs: Any
43) -> Dict[str, Any]:
44    """
45    Return the pipe's document from the internal `pipes` table.
46
47    Parameters
48    ----------
49    pipe: mrsm.Pipe
50        The pipe whose attributes should be retrieved.
51
52    Returns
53    -------
54    The document that matches the keys of the pipe.
55    """

Return the pipe's document from the internal pipes table.

Parameters
  • pipe (mrsm.Pipe): The pipe whose attributes should be retrieved.
Returns
  • The document that matches the keys of the pipe.
@abc.abstractmethod
def get_pipe_id( self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> Union[str, int, NoneType]:
57@abc.abstractmethod
58def get_pipe_id(
59    self,
60    pipe: mrsm.Pipe,
61    debug: bool = False,
62    **kwargs: Any
63) -> Union[str, int, None]:
64    """
65    Return the `id` for the pipe if it exists.
66
67    Parameters
68    ----------
69    pipe: mrsm.Pipe
70        The pipe whose `id` to fetch.
71
72    Returns
73    -------
74    The `id` for the pipe's document or `None`.
75    """

Return the id for the pipe if it exists.

Parameters
  • pipe (mrsm.Pipe): The pipe whose id to fetch.
Returns
  • The id for the pipe's document or None.
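
Methods marked `@abc.abstractmethod`, such as `register_pipe`, `get_pipe_attributes`, and `get_pipe_id`, must be overridden, whereas methods like `edit_pipe` and `delete_pipe` have a default body that raises `NotImplementedError`. The sketch below illustrates the difference with hypothetical stand-in classes. It assumes the base class uses `abc.ABC` (or `abc.ABCMeta`), which this excerpt does not show.

```python
import abc
from typing import Dict, Tuple


class MiniInstanceConnector(abc.ABC):
    """Hypothetical stand-in with one abstract and one optional method."""

    @abc.abstractmethod
    def register_pipe(self, pipe: str) -> Tuple[bool, str]:
        """Subclasses must implement this."""

    def edit_pipe(self, pipe: str) -> Tuple[bool, str]:
        # Optional: subclasses may override, otherwise calling it fails.
        raise NotImplementedError


class DictConnector(MiniInstanceConnector):
    """Keeps registrations in a dictionary; pipes are plain strings here."""

    def __init__(self) -> None:
        self.pipes: Dict[str, dict] = {}

    def register_pipe(self, pipe: str) -> Tuple[bool, str]:
        if pipe in self.pipes:
            return False, f"Pipe '{pipe}' is already registered."
        self.pipes[pipe] = {}
        return True, "Success"


conn = DictConnector()
print(conn.register_pipe('foo_demo'))  # (True, 'Success')
print(conn.register_pipe('foo_demo'))  # (False, "Pipe 'foo_demo' is already registered.")
```

Instantiating `MiniInstanceConnector` directly raises `TypeError` because an abstract method is unimplemented, while `DictConnector().edit_pipe(...)` fails only when it is called.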
def edit_pipe( self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
77def edit_pipe(
78    self,
79    pipe: mrsm.Pipe,
80    debug: bool = False,
81    **kwargs: Any
82) -> mrsm.SuccessTuple:
83    """
84    Edit the attributes of the pipe.
85
86    Parameters
87    ----------
88    pipe: mrsm.Pipe
89        The pipe whose in-memory parameters must be persisted.
90
91    Returns
92    -------
93    A `SuccessTuple` indicating success.
94    """
95    raise NotImplementedError

Edit the attributes of the pipe.

Parameters
  • pipe (mrsm.Pipe): The pipe whose in-memory parameters must be persisted.
Returns
  • A SuccessTuple indicating success.
def delete_pipe( self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
def delete_pipe(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Delete a pipe's registration from the `pipes` collection.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe to be deleted.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    raise NotImplementedError

Delete a pipe's registration from the pipes collection.

Parameters
  • pipe (mrsm.Pipe): The pipe to be deleted.
Returns
  • A SuccessTuple indicating success.
@abc.abstractmethod
def fetch_pipes_keys( self, connector_keys: Optional[List[str]] = None, metric_keys: Optional[List[str]] = None, location_keys: Optional[List[str]] = None, tags: Optional[List[str]] = None, debug: bool = False, **kwargs: Any) -> Union[List[Tuple[str, str, str]], List[Tuple[str, str, str, Union[Dict[str, Any], List[str]]]], Dict[Union[int, str], Tuple[str, str, str]], Dict[Union[int, str], Tuple[str, str, str, Union[Dict[str, Any], List[str]]]]]:
@abc.abstractmethod
def fetch_pipes_keys(
    self,
    connector_keys: Optional[List[str]] = None,
    metric_keys: Optional[List[str]] = None,
    location_keys: Optional[List[str]] = None,
    tags: Optional[List[str]] = None,
    debug: bool = False,
    **kwargs: Any
) -> Union[
    List[Tuple[str, str, str]],
    List[Tuple[str, str, str, Union[Dict[str, Any], List[str]]]],
    Dict[Union[int, str], Tuple[str, str, str]],
    Dict[Union[int, str], Tuple[str, str, str, Union[Dict[str, Any], List[str]]]],
]:
    """
    Return registered pipes' keys according to the provided filters.

    May return either a list of key tuples or a dictionary mapping pipe IDs to key tuples.
    When returning a dictionary, the key is the pipe's unique ID (int or str).
    Tuples may be length 3 `(connector_keys, metric_key, location_key)` or length 4
    with parameters or tags appended as the fourth element.

    Parameters
    ----------
    connector_keys: list[str] | None, default None
        The keys passed via `-c`.

    metric_keys: list[str] | None, default None
        The keys passed via `-m`.

    location_keys: list[str] | None, default None
        The keys passed via `-l`.

    tags: List[str] | None, default None
        Tags passed via `--tags` which are stored under `parameters:tags`.

    Returns
    -------
    A list of tuples or a dictionary mapping pipe IDs to tuples.
    You may return the string `"None"` for location keys in place of nulls.

    Examples
    --------
    >>> import meerschaum as mrsm
    >>> conn = mrsm.get_connector('example:demo')
    >>>
    >>> pipe_a = mrsm.Pipe('a', 'demo', tags=['foo'], instance=conn)
    >>> pipe_b = mrsm.Pipe('b', 'demo', tags=['bar'], instance=conn)
    >>> pipe_a.register()
    >>> pipe_b.register()
    >>>
    >>> conn.fetch_pipes_keys(['a', 'b'])
    [('a', 'demo', 'None'), ('b', 'demo', 'None')]
    >>> conn.fetch_pipes_keys(metric_keys=['demo'])
    [('a', 'demo', 'None'), ('b', 'demo', 'None')]
    >>> conn.fetch_pipes_keys(tags=['foo'])
    [('a', 'demo', 'None')]
    >>> conn.fetch_pipes_keys(location_keys=[None])
    [('a', 'demo', 'None'), ('b', 'demo', 'None')]
    """

Return registered pipes' keys according to the provided filters.

May return either a list of key tuples or a dictionary mapping pipe IDs to key tuples. When returning a dictionary, the key is the pipe's unique ID (int or str). Tuples may be length 3 (connector_keys, metric_key, location_key) or length 4 with parameters or tags appended as the fourth element.

Parameters
  • connector_keys (list[str] | None, default None): The keys passed via -c.
  • metric_keys (list[str] | None, default None): The keys passed via -m.
  • location_keys (list[str] | None, default None): The keys passed via -l.
  • tags (List[str] | None, default None): Tags passed via --tags which are stored under parameters:tags.
Returns
  • A list of tuples or a dictionary mapping pipe IDs to tuples.
  • You may return the string "None" for location keys in place of nulls.
Examples
>>> import meerschaum as mrsm
>>> conn = mrsm.get_connector('example:demo')
>>>
>>> pipe_a = mrsm.Pipe('a', 'demo', tags=['foo'], instance=conn)
>>> pipe_b = mrsm.Pipe('b', 'demo', tags=['bar'], instance=conn)
>>> pipe_a.register()
>>> pipe_b.register()
>>>
>>> conn.fetch_pipes_keys(['a', 'b'])
[('a', 'demo', 'None'), ('b', 'demo', 'None')]
>>> conn.fetch_pipes_keys(metric_keys=['demo'])
[('a', 'demo', 'None'), ('b', 'demo', 'None')]
>>> conn.fetch_pipes_keys(tags=['foo'])
[('a', 'demo', 'None')]
>>> conn.fetch_pipes_keys(location_keys=[None])
[('a', 'demo', 'None'), ('b', 'demo', 'None')]
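The filtering behaviour described above can be illustrated without a real instance connector. The following is a minimal sketch, not the library's implementation: the `REGISTRY` dictionary and the `filter_pipes_keys()` helper are hypothetical names, and matching a pipe when it shares any one of the requested tags is an assumption.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory registry (an assumption for illustration):
# (connector_keys, metric_key, location_key) -> tags.
REGISTRY: Dict[Tuple[str, str, Optional[str]], List[str]] = {
    ('a', 'demo', None): ['foo'],
    ('b', 'demo', None): ['bar'],
}


def filter_pipes_keys(
    connector_keys: Optional[List[str]] = None,
    metric_keys: Optional[List[str]] = None,
    location_keys: Optional[List[Optional[str]]] = None,
    tags: Optional[List[str]] = None,
) -> List[Tuple[str, str, str]]:
    """Return the key tuples which satisfy every provided filter."""
    results = []
    for (ck, mk, lk), pipe_tags in REGISTRY.items():
        if connector_keys and ck not in connector_keys:
            continue
        if metric_keys and mk not in metric_keys:
            continue
        # Treat a null location and the string "None" as equivalent.
        if location_keys and str(lk) not in [str(key) for key in location_keys]:
            continue
        # Assumption: a pipe matches if it carries at least one requested tag.
        if tags and not set(tags) & set(pipe_tags):
            continue
        results.append((ck, mk, str(lk)))
    return results


print(filter_pipes_keys(tags=['foo']))
# [('a', 'demo', 'None')]
```

Omitted filters match every pipe, which is why calling the helper with no arguments returns both registered tuples.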
@abc.abstractmethod
def pipe_exists( self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> bool:
@abc.abstractmethod
def pipe_exists(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> bool:
    """
    Check whether a pipe's target table exists.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose target table should be checked.

    Returns
    -------
    A `bool` indicating whether the table exists.
    """

Check whether a pipe's target table exists.

Parameters
  • pipe (mrsm.Pipe): The pipe whose target table should be checked.
Returns
  • A bool indicating whether the table exists.
@abc.abstractmethod
def drop_pipe( self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
@abc.abstractmethod
def drop_pipe(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Drop a pipe's collection if it exists.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe to be dropped.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    raise NotImplementedError

Drop a pipe's collection if it exists.

Parameters
  • pipe (mrsm.Pipe): The pipe to be dropped.
Returns
  • A SuccessTuple indicating success.
def drop_pipe_indices( self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
def drop_pipe_indices(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Drop a pipe's indices.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose indices need to be dropped.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    return False, f"Cannot drop indices for instance connectors of type '{self.type}'."

Drop a pipe's indices.

Parameters
  • pipe (mrsm.Pipe): The pipe whose indices need to be dropped.
Returns
  • A SuccessTuple indicating success.
@abc.abstractmethod
def sync_pipe( self, pipe: Pipe, df: "'pd.DataFrame'" = None, begin: Union[datetime.datetime, int, NoneType] = None, end: Union[datetime.datetime, int, NoneType] = None, chunksize: Optional[int] = -1, check_existing: bool = True, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
@abc.abstractmethod
def sync_pipe(
    self,
    pipe: mrsm.Pipe,
    df: 'pd.DataFrame' = None,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    chunksize: Optional[int] = -1,
    check_existing: bool = True,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Sync a pipe using a database connection.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The Meerschaum Pipe instance into which to sync the data.

    df: Optional[pd.DataFrame]
        An optional DataFrame or equivalent to sync into the pipe.
        Defaults to `None`.

    begin: Union[datetime, int, None], default None
        Optionally specify the earliest datetime to search for data.
        Defaults to `None`.

    end: Union[datetime, int, None], default None
        Optionally specify the latest datetime to search for data.
        Defaults to `None`.

    chunksize: Optional[int], default -1
        Specify the number of rows to sync per chunk.
        If `-1`, resort to system configuration (default is `900`).
        A `chunksize` of `None` will sync all rows in one transaction.
        Defaults to `-1`.

    check_existing: bool, default True
        If `True`, pull and diff with existing data from the pipe. Defaults to `True`.

    debug: bool, default False
        Verbosity toggle. Defaults to False.

    Returns
    -------
    A `SuccessTuple` of success (`bool`) and message (`str`).
    """

Sync a pipe using a database connection.

Parameters
  • pipe (mrsm.Pipe): The Meerschaum Pipe instance into which to sync the data.
  • df (Optional[pd.DataFrame]): An optional DataFrame or equivalent to sync into the pipe. Defaults to None.
  • begin (Union[datetime, int, None], default None): Optionally specify the earliest datetime to search for data. Defaults to None.
  • end (Union[datetime, int, None], default None): Optionally specify the latest datetime to search for data. Defaults to None.
  • chunksize (Optional[int], default -1): Specify the number of rows to sync per chunk. If -1, resort to system configuration (default is 900). A chunksize of None will sync all rows in one transaction. Defaults to -1.
  • check_existing (bool, default True): If True, pull and diff with existing data from the pipe. Defaults to True.
  • debug (bool, default False): Verbosity toggle. Defaults to False.
Returns
  • A SuccessTuple of success (bool) and message (str).
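The three `chunksize` cases documented above (`-1`, `None`, and a positive integer) can be sketched as a small generator. This is an illustration under stated assumptions rather than the library's code: `iter_chunks()` and `DEFAULT_CHUNKSIZE` are hypothetical names, and the constant stands in for the value read from the system configuration.

```python
from typing import Any, Dict, Iterator, List, Optional

# Assumed stand-in for the configured default mentioned in the docstring.
DEFAULT_CHUNKSIZE = 900


def iter_chunks(
    rows: List[Dict[str, Any]],
    chunksize: Optional[int] = -1,
) -> Iterator[List[Dict[str, Any]]]:
    """Yield `rows` in chunks according to the documented `chunksize` rules."""
    if chunksize == -1:
        # Fall back to the (assumed) configured default.
        chunksize = DEFAULT_CHUNKSIZE
    if chunksize is None:
        # `None` means everything is handled in a single chunk.
        yield rows
        return
    for start in range(0, len(rows), chunksize):
        yield rows[start:start + chunksize]


rows = [{'id': i} for i in range(2000)]
print([len(chunk) for chunk in iter_chunks(rows)])
# [900, 900, 200]
```

An implementation of `sync_pipe()` could insert each yielded chunk in its own transaction, which keeps memory use bounded for large syncs.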
def create_pipe_indices( self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
def create_pipe_indices(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Create a pipe's indices.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose indices need to be created.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    return False, f"Cannot create indices for instance connectors of type '{self.type}'."

Create a pipe's indices.

Parameters
  • pipe (mrsm.Pipe): The pipe whose indices need to be created.
Returns
  • A SuccessTuple indicating success.
def clear_pipe( self, pipe: Pipe, begin: Union[datetime.datetime, int, NoneType] = None, end: Union[datetime.datetime, int, NoneType] = None, params: Optional[Dict[str, Any]] = None, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
def clear_pipe(
    self,
    pipe: mrsm.Pipe,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Delete rows within `begin`, `end`, and `params`.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose rows to clear.

    begin: datetime | int | None, default None
        If provided, remove rows >= `begin`.

    end: datetime | int | None, default None
        If provided, remove rows < `end`.

    params: dict[str, Any] | None, default None
        If provided, only remove rows which match the `params` filter.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    raise NotImplementedError

Delete rows within begin, end, and params.

Parameters
  • pipe (mrsm.Pipe): The pipe whose rows to clear.
  • begin (datetime | int | None, default None): If provided, remove rows >= begin.
  • end (datetime | int | None, default None): If provided, remove rows < end.
  • params (dict[str, Any] | None, default None): If provided, only remove rows which match the params filter.
Returns
  • A SuccessTuple indicating success.
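The bounds above describe a half-open interval: `begin` is inclusive and `end` is exclusive. A minimal sketch of selecting the rows to clear from an in-memory list follows. The helper `rows_to_clear()` and its `dt_col` argument are hypothetical, and treating `params` as simple equality checks is an assumption.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional, Union


def rows_to_clear(
    rows: List[Dict[str, Any]],
    dt_col: str,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
) -> List[Dict[str, Any]]:
    """Return the rows which fall within [begin, end) and match `params`."""
    def matches(row: Dict[str, Any]) -> bool:
        if begin is not None and row[dt_col] < begin:
            return False
        if end is not None and row[dt_col] >= end:
            return False
        # Assumption: `params` is a column-to-value equality filter.
        return all(row.get(key) == val for key, val in (params or {}).items())

    return [row for row in rows if matches(row)]


rows = [
    {'ts': datetime(2024, 1, 1), 'id': 1},
    {'ts': datetime(2024, 1, 2), 'id': 1},
    {'ts': datetime(2024, 1, 3), 'id': 2},
]
selected = rows_to_clear(rows, 'ts', begin=datetime(2024, 1, 1), end=datetime(2024, 1, 3))
print([row['ts'].day for row in selected])
# [1, 2]
```

Because `end` is exclusive, consecutive calls with adjacent intervals never clear the same row twice.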
def get_pipe_data( self, pipe: Pipe, select_columns: Optional[List[str]] = None, omit_columns: Optional[List[str]] = None, begin: Union[datetime.datetime, int, NoneType] = None, end: Union[datetime.datetime, int, NoneType] = None, params: Optional[Dict[str, Any]] = None, debug: bool = False, **kwargs: Any) -> "Union['pd.DataFrame', None]":
def get_pipe_data(
    self,
    pipe: mrsm.Pipe,
    select_columns: Optional[List[str]] = None,
    omit_columns: Optional[List[str]] = None,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    debug: bool = False,
    **kwargs: Any
) -> Union['pd.DataFrame', None]:
    """
    Query a pipe's target table and return the DataFrame.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe with the target table from which to read.

    select_columns: list[str] | None, default None
        If provided, only select these given columns.
        Otherwise select all available columns (i.e. `SELECT *`).

    omit_columns: list[str] | None, default None
        If provided, remove these columns from the selection.

    begin: datetime | int | None, default None
        The earliest `datetime` value to search from (inclusive).

    end: datetime | int | None, default None
        The latest `datetime` value to search up to (exclusive).

    params: dict[str, Any] | None, default None
        Additional filters to apply to the query.

    Returns
    -------
    The target table's data as a DataFrame.
    """
    if type(self).get_pipe_docs is get_pipe_docs:
        raise NotImplementedError(
            f"Missing `get_pipe_data()` or `get_pipe_docs()` for {type(self)}."
        )

    docs = self.get_pipe_docs(
        pipe=pipe,
        select_columns=select_columns,
        omit_columns=omit_columns,
        begin=begin,
        end=end,
        params=params,
        debug=debug,
        **kwargs
    )
    if not docs:
        return None

    pd = mrsm.attempt_import('pandas')
    try:
        return pd.DataFrame(docs)
    except Exception as e:
        from meerschaum.utils.warnings import warn
        warn(f"Cannot build DataFrame from pipe docs:\n{e}")

    return None

Query a pipe's target table and return the DataFrame.

Parameters
  • pipe (mrsm.Pipe): The pipe with the target table from which to read.
  • select_columns (list[str] | None, default None): If provided, only select these given columns. Otherwise select all available columns (i.e. SELECT *).
  • omit_columns (list[str] | None, default None): If provided, remove these columns from the selection.
  • begin (datetime | int | None, default None): The earliest datetime value to search from (inclusive).
  • end (datetime | int | None, default None): The latest datetime value to search up to (exclusive).
  • params (dict[str, Any] | None, default None): Additional filters to apply to the query.
Returns
  • The target table's data as a DataFrame.
def get_pipe_docs( self, pipe: Pipe, select_columns: Optional[List[str]] = None, omit_columns: Optional[List[str]] = None, begin: Union[datetime.datetime, int, NoneType] = None, end: Union[datetime.datetime, int, NoneType] = None, params: Optional[Dict[str, Any]] = None, debug: bool = False, **kwargs: Any) -> list[dict[str, typing.Any]]:
def get_pipe_docs(
    self,
    pipe: mrsm.Pipe,
    select_columns: Optional[List[str]] = None,
    omit_columns: Optional[List[str]] = None,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    debug: bool = False,
    **kwargs: Any
) -> list[dict[str, Any]]:
    """
    Return a pipe's data as a list of documents.
    Defaults to `get_pipe_data().to_dict(orient='records')`.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe with the target table from which to read.

    select_columns: list[str] | None, default None
        If provided, only select these given columns.
        Otherwise select all available columns (i.e. `SELECT *`).

    omit_columns: list[str] | None, default None
        If provided, remove these columns from the selection.

    begin: datetime | int | None, default None
        The earliest `datetime` value to search from (inclusive).

    end: datetime | int | None, default None
        The latest `datetime` value to search up to (exclusive).

    params: dict[str, Any] | None, default None
        Additional filters to apply to the query.

    Returns
    -------
    The target table's data as a list of dictionaries.
    """
    df = self.get_pipe_data(
        pipe=pipe,
        select_columns=select_columns,
        omit_columns=omit_columns,
        begin=begin,
        end=end,
        params=params,
        debug=debug,
        **kwargs
    )
    if df is None or df.empty:
        return []
    return df.to_dict(orient='records')

Return a pipe's data as a list of documents. Defaults to get_pipe_data().to_dict(orient='records').

Parameters
  • pipe (mrsm.Pipe): The pipe with the target table from which to read.
  • select_columns (list[str] | None, default None): If provided, only select these given columns. Otherwise select all available columns (i.e. SELECT *).
  • omit_columns (list[str] | None, default None): If provided, remove these columns from the selection.
  • begin (datetime | int | None, default None): The earliest datetime value to search from (inclusive).
  • end (datetime | int | None, default None): The latest datetime value to search up to (exclusive).
  • params (dict[str, Any] | None, default None): Additional filters to apply to the query.
Returns
  • The target table's data as a list of dictionaries.
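The default bodies of `get_pipe_data()` and `get_pipe_docs()` call each other, so a subclass only needs to override one of them. The class-level identity check in `get_pipe_data()` is what prevents infinite recursion when neither is overridden. The sketch below reproduces that pattern with hypothetical classes (`BaseStore`, `DocsOnly`, `DataOnly`) and uses a tuple as a stand-in for the DataFrame so that it runs without pandas.

```python
from typing import Any, Dict, List, Optional, Tuple

Doc = Dict[str, Any]


class BaseStore:
    """Hypothetical base class with two mutually defaulting methods."""

    def get_data(self) -> Optional[Tuple[Doc, ...]]:
        # If `get_docs()` was not overridden, calling it would recurse forever.
        if type(self).get_docs is BaseStore.get_docs:
            raise NotImplementedError(
                f"Missing `get_data()` or `get_docs()` for {type(self)}."
            )
        docs = self.get_docs()
        return tuple(docs) if docs else None

    def get_docs(self) -> List[Doc]:
        data = self.get_data()
        return list(data) if data else []


class DocsOnly(BaseStore):
    def get_docs(self) -> List[Doc]:
        return [{'id': 1}]


class DataOnly(BaseStore):
    def get_data(self) -> Optional[Tuple[Doc, ...]]:
        return ({'id': 1},)


print(DocsOnly().get_data())
# ({'id': 1},)
print(DataOnly().get_docs())
# [{'id': 1}]
```

Calling either method on a bare `BaseStore()` raises `NotImplementedError` instead of recursing.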
@abc.abstractmethod
def get_sync_time( self, pipe: Pipe, params: Optional[Dict[str, Any]] = None, newest: bool = True, debug: bool = False, **kwargs: Any) -> datetime.datetime | int | None:
@abc.abstractmethod
def get_sync_time(
    self,
    pipe: mrsm.Pipe,
    params: Optional[Dict[str, Any]] = None,
    newest: bool = True,
    debug: bool = False,
    **kwargs: Any
) -> datetime | int | None:
    """
    Return the most recent value for the `datetime` axis.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose collection contains documents.

    params: dict[str, Any] | None, default None
        Filter certain parameters when determining the sync time.

    newest: bool, default True
        If `True`, return the maximum value for the column.

    Returns
    -------
    The largest `datetime` or `int` value of the `datetime` axis.
    """

Return the most recent value for the datetime axis.

Parameters
  • pipe (mrsm.Pipe): The pipe whose collection contains documents.
  • params (dict[str, Any] | None, default None): Filter certain parameters when determining the sync time.
  • newest (bool, default True): If True, return the maximum value for the column.
Returns
  • The largest datetime or int value of the datetime axis.
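As a rough illustration of the contract above, the sync time of an in-memory collection is the maximum of the datetime column after applying `params`. The helper `sync_time()` is hypothetical; returning the minimum when `newest=False` and treating `params` as equality filters are both assumptions, since the docstring only specifies the `newest=True` case.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional, Union


def sync_time(
    docs: List[Dict[str, Any]],
    dt_col: str,
    params: Optional[Dict[str, Any]] = None,
    newest: bool = True,
) -> Union[datetime, int, None]:
    """Return the newest (or oldest) value of `dt_col`, or None if no rows match."""
    values = [
        doc[dt_col]
        for doc in docs
        if dt_col in doc
        # Assumption: `params` is a column-to-value equality filter.
        and all(doc.get(key) == val for key, val in (params or {}).items())
    ]
    if not values:
        return None
    # Assumption: `newest=False` returns the smallest value.
    return max(values) if newest else min(values)


docs = [
    {'ts': datetime(2024, 1, 1), 'id': 1},
    {'ts': datetime(2024, 1, 5), 'id': 2},
]
print(sync_time(docs, 'ts'))
# 2024-01-05 00:00:00
```

The same helper works for integer datetime axes because it only relies on ordering comparisons.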
@abc.abstractmethod
def get_pipe_columns_types( self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> Dict[str, str]:
@abc.abstractmethod
def get_pipe_columns_types(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> Dict[str, str]:
    """
    Return the data types for the columns in the target table for data type enforcement.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose target table contains columns and data types.

    Returns
    -------
    A dictionary mapping columns to data types.
    """

Return the data types for the columns in the target table for data type enforcement.

Parameters
  • pipe (mrsm.Pipe): The pipe whose target table contains columns and data types.
Returns
  • A dictionary mapping columns to data types.
def get_pipe_columns_indices(self, debug: bool = False) -> Dict[str, List[Dict[str, str]]]:
def get_pipe_columns_indices(
    self,
    debug: bool = False,
) -> Dict[str, List[Dict[str, str]]]:
    """
    Return a dictionary mapping columns to metadata about related indices.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose target table has related indices.

    Returns
    -------
    A dictionary mapping column names to lists of dictionaries with the keys "type" and "name".

    Examples
    --------
    >>> pipe = mrsm.Pipe('demo', 'shirts', columns={'primary': 'id'}, indices={'size_color': ['color', 'size']})
    >>> pipe.sync([{'color': 'red', 'size': 'M'}])
    >>> pipe.get_columns_indices()
    {'id': [{'name': 'demo_shirts_pkey', 'type': 'PRIMARY KEY'}], 'color': [{'name': 'IX_demo_shirts_color_size', 'type': 'INDEX'}], 'size': [{'name': 'IX_demo_shirts_color_size', 'type': 'INDEX'}]}
    """
    return {}

Return a dictionary mapping columns to metadata about related indices.

Parameters
  • pipe (mrsm.Pipe): The pipe whose target table has related indices.
Returns
  • A dictionary mapping column names to lists of dictionaries with the keys "type" and "name".
Examples
>>> pipe = mrsm.Pipe('demo', 'shirts', columns={'primary': 'id'}, indices={'size_color': ['color', 'size']})
>>> pipe.sync([{'color': 'red', 'size': 'M'}])
>>> pipe.get_columns_indices()
{'id': [{'name': 'demo_shirts_pkey', 'type': 'PRIMARY KEY'}], 'color': [{'name': 'IX_demo_shirts_color_size', 'type': 'INDEX'}], 'size': [{'name': 'IX_demo_shirts_color_size', 'type': 'INDEX'}]}
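The shape of the returned mapping may be easier to see when built by hand. The sketch below assumes a hypothetical list of index definitions (`index_defs`, each with `name`, `type`, and `columns` keys) and inverts it into the column-to-indices dictionary shown in the example. It is not how any particular instance connector gathers its metadata.

```python
from collections import defaultdict
from typing import Any, Dict, List


def columns_indices(index_defs: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, str]]]:
    """Invert index definitions into a mapping of column -> related indices."""
    result: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for index in index_defs:
        for column in index['columns']:
            result[column].append({'name': index['name'], 'type': index['type']})
    return dict(result)


# Hypothetical index metadata matching the example above.
index_defs = [
    {'name': 'demo_shirts_pkey', 'type': 'PRIMARY KEY', 'columns': ['id']},
    {'name': 'IX_demo_shirts_color_size', 'type': 'INDEX', 'columns': ['color', 'size']},
]
print(columns_indices(index_defs)['color'])
# [{'name': 'IX_demo_shirts_color_size', 'type': 'INDEX'}]
```

A composite index appears under every column it covers, which is why `color` and `size` share the same entry.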
def make_connector(cls, _is_executor: bool = False):
def make_connector(cls, _is_executor: bool = False):
    """
    Register a class as a `Connector`.
    The `type` will be the lower case of the class name, without the suffix `connector`.

    Parameters
    ----------
    instance: bool, default False
        If `True`, make this connector type an instance connector.
        This requires implementing the various pipes functions and lots of testing.

    Examples
    --------
    >>> import meerschaum as mrsm
    >>> from meerschaum.connectors import make_connector, Connector
    >>>
    >>> @make_connector
    ... class FooConnector(Connector):
    ...     REQUIRED_ATTRIBUTES: list[str] = ['username', 'password']
    ...
    >>> conn = mrsm.get_connector('foo:bar', username='dog', password='cat')
    >>> print(conn.username, conn.password)
    dog cat
    >>>
    """
    import re
    from meerschaum.plugins import _get_parent_plugin
    suffix_regex = (
        r'connector$'
        if not _is_executor
        else r'executor$'
    )
    plugin_name = _get_parent_plugin(2)
    typ = re.sub(suffix_regex, '', cls.__name__.lower())
    with _locks['types']:
        types[typ] = cls
    with _locks['custom_types']:
        custom_types.add(typ)
    if plugin_name:
        with _locks['plugins_types']:
            if plugin_name not in plugins_types:
                plugins_types[plugin_name] = []
            plugins_types[plugin_name].append(typ)
    with _locks['connectors']:
        if typ not in connectors:
            connectors[typ] = {}
    if getattr(cls, 'IS_INSTANCE', False):
        with _locks['instance_types']:
            if typ not in instance_types:
                instance_types.append(typ)

    return cls

Register a class as a Connector. The type will be the lower case of the class name, without the suffix connector.
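The naming rule can be checked in isolation. The helper below, `connector_type()`, is a hypothetical standalone function that mirrors the `re.sub()` call in the source listing; it is not part of the Meerschaum API.

```python
import re


def connector_type(class_name: str, is_executor: bool = False) -> str:
    """Derive the connector type from a class name by stripping the suffix."""
    suffix_regex = r'executor$' if is_executor else r'connector$'
    return re.sub(suffix_regex, '', class_name.lower())


print(connector_type('FooConnector'))
# foo
print(connector_type('RemoteExecutor', is_executor=True))
# remote
```

Because the pattern is anchored with `$`, only a trailing suffix is removed, and a class name without the suffix is simply lowercased.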

Parameters
  • instance (bool, default False): If True, make this connector type an instance connector. This requires implementing the various pipes functions and lots of testing.
Examples
>>> import meerschaum as mrsm
>>> from meerschaum.connectors import make_connector, Connector
>>>
>>> @make_connector
... class FooConnector(Connector):
...     REQUIRED_ATTRIBUTES: list[str] = ['username', 'password']
...
>>> conn = mrsm.get_connector('foo:bar', username='dog', password='cat')
>>> print(conn.username, conn.password)
dog cat
>>>
def entry( sysargs: Union[List[str], str, NoneType] = None, _patch_args: Optional[Dict[str, Any]] = None, _use_cli_daemon: bool = True, _session_id: Optional[str] = None) -> Tuple[bool, str]:
def entry(
    sysargs: Union[List[str], str, None] = None,
    _patch_args: Optional[Dict[str, Any]] = None,
    _use_cli_daemon: bool = True,
    _session_id: Optional[str] = None,
) -> SuccessTuple:
    """
    Parse arguments and launch a Meerschaum action.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    start = time.perf_counter()
    from meerschaum.config.environment import get_daemon_env_vars
    sysargs_list = shlex.split(sysargs) if isinstance(sysargs, str) else sysargs
    if (
        not _use_cli_daemon
        or (not sysargs or (sysargs[0] and sysargs[0].startswith('-')))
        or '--no-daemon' in sysargs_list
        or '--daemon' in sysargs_list
        or '-d' in sysargs_list
        or get_daemon_env_vars()
        or not mrsm.get_config('system', 'experimental', 'cli_daemon')
    ):
        success, msg = entry_without_daemon(sysargs, _patch_args=_patch_args)
        end = time.perf_counter()
        if '--debug' in sysargs_list:
            print(f"Duration without daemon: {round(end - start, 3)}")
        return success, msg

    from meerschaum._internal.cli.entry import entry_with_daemon
    success, msg = entry_with_daemon(sysargs, _patch_args=_patch_args)
    end = time.perf_counter()
    if '--debug' in sysargs_list:
        print(f"Duration with daemon: {round(end - start, 3)}")
    return success, msg

Parse arguments and launch a Meerschaum action.

Returns