Meerschaum Python API

Welcome to the Meerschaum Python API technical documentation! Here you can find information about the classes and functions provided by the meerschaum package. Visit meerschaum.io for general usage documentation.

Root Module

For your convenience, the following classes and functions may be imported from the root meerschaum namespace: Pipe, Plugin, Venv, Job, Connector, InstanceConnector, SuccessTuple, get_pipes, get_connector, get_config, make_connector, attempt_import, pprint, and entry.

Examples

Build a Connector

Get an existing connector or build a new one in-memory with the meerschaum.get_connector() factory function:

import meerschaum as mrsm

sql_conn = mrsm.get_connector(
    'sql:temp',
    flavor='sqlite',
    database='/tmp/tmp.db',
)
df = sql_conn.read("SELECT 1 AS foo")
print(df)
#    foo
# 0    1

sql_conn.to_sql(df, 'foo')
print(sql_conn.read('foo'))
#    foo
# 0    1
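Connector keys such as 'sql:temp' combine a type and a label separated by a colon. The real parsing lives in meerschaum.connectors.parse; as a rough sketch of the convention (assuming 'main' as the default label, per the get_connector docstring):

```python
def split_connector_keys(keys: str) -> tuple:
    """Split 'type:label' connector keys, defaulting the label to 'main'."""
    type_, _, label = keys.partition(':')
    return type_, (label or 'main')

print(split_connector_keys('sql:temp'))  # ('sql', 'temp')
print(split_connector_keys('sql'))       # ('sql', 'main')
```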

Create a Custom Connector Class

Decorate your connector class with meerschaum.make_connector() to designate it as a custom connector:

from datetime import datetime, timezone
from random import randint
import meerschaum as mrsm
from meerschaum.utils.dtypes import round_time

@mrsm.make_connector
class FooConnector(mrsm.Connector):
    REQUIRED_ATTRIBUTES = ['username', 'password']

    def fetch(
        self,
        begin: datetime | None = None,
        end: datetime | None = None,
    ):
        now = begin or round_time(datetime.now(timezone.utc))
        return [
            {'ts': now, 'id': 1, 'vl': randint(1, 100)},
            {'ts': now, 'id': 2, 'vl': randint(1, 100)},
            {'ts': now, 'id': 3, 'vl': randint(1, 100)},
        ]

foo_conn = mrsm.get_connector(
    'foo:bar',
    username='foo',
    password='bar',
)
docs = foo_conn.fetch()
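The fetch() method above calls round_time() to normalize the current timestamp. A loose stdlib-only stand-in (flooring to the minute; the actual meerschaum.utils.dtypes.round_time is more flexible):

```python
from datetime import datetime, timezone

def floor_to_minute(dt: datetime) -> datetime:
    """Floor a timestamp to the minute by dropping seconds and microseconds."""
    return dt.replace(second=0, microsecond=0)

now = datetime(2024, 1, 1, 12, 30, 45, 999999, tzinfo=timezone.utc)
print(floor_to_minute(now))  # 2024-01-01 12:30:00+00:00
```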

Build a Pipe

Build a meerschaum.Pipe in-memory:

from datetime import datetime
import meerschaum as mrsm

pipe = mrsm.Pipe(
    foo_conn, 'demo',
    instance=sql_conn,
    columns={'datetime': 'ts', 'id': 'id'},
    tags=['production'],
)
pipe.sync(begin=datetime(2024, 1, 1))
df = pipe.get_data()
print(df)
#           ts  id  vl
# 0 2024-01-01   1  97
# 1 2024-01-01   2  18
# 2 2024-01-01   3  96

Add temporary=True to skip registering the pipe in the pipes table.

Get Registered Pipes

The meerschaum.get_pipes() function returns a dictionary hierarchy of pipes by connector, metric, and location:

import meerschaum as mrsm

pipes = mrsm.get_pipes(instance='sql:temp')
pipe = pipes['foo:bar']['demo'][None]

Add as_list=True to flatten the hierarchy:

import meerschaum as mrsm

pipes = mrsm.get_pipes(
    tags=['production'],
    instance=sql_conn,
    as_list=True,
)
print(pipes)
# [Pipe('foo:bar', 'demo', instance='sql:temp')]
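Flattening the hierarchy yourself is a simple nested walk; a pure-Python sketch in which placeholder strings stand in for Pipe objects (this is not Meerschaum's internal flatten_pipes_dict):

```python
def flatten_pipes_dict(pipes: dict) -> list:
    """Flatten {connector: {metric: {location: pipe}}} into a flat list of pipes."""
    return [
        pipe
        for metrics in pipes.values()
        for locations in metrics.values()
        for pipe in locations.values()
    ]

hierarchy = {'foo:bar': {'demo': {None: "Pipe('foo:bar', 'demo')"}}}
print(flatten_pipes_dict(hierarchy))  # ["Pipe('foo:bar', 'demo')"]
```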

Import Plugins

You can import a plugin's module through meerschaum.Plugin.module:

import meerschaum as mrsm

plugin = mrsm.Plugin('noaa')
with mrsm.Venv(plugin):
    noaa = plugin.module

If your plugin has submodules, use meerschaum.plugins.from_plugin_import:

from meerschaum.plugins import from_plugin_import
get_defined_pipes = from_plugin_import('compose.utils.pipes', 'get_defined_pipes')
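from_plugin_import behaves like an import plus an attribute lookup, performed inside the plugin's virtual environment. A stdlib analogue of the lookup itself, using math as a stand-in module (from_module_import is a hypothetical helper, not part of Meerschaum):

```python
import importlib

def from_module_import(module_name: str, *names):
    """Import a module and return the requested attributes."""
    module = importlib.import_module(module_name)
    attrs = tuple(getattr(module, name) for name in names)
    return attrs[0] if len(attrs) == 1 else attrs

sqrt = from_module_import('math', 'sqrt')
print(sqrt(9.0))  # 3.0
```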

Import multiple plugins with meerschaum.plugins.import_plugins:

from meerschaum.plugins import import_plugins
noaa, compose = import_plugins('noaa', 'compose')

Create a Job

Create a meerschaum.Job with name and sysargs:

import meerschaum as mrsm

job = mrsm.Job('syncing-engine', 'sync pipes --loop')
success, msg = job.start()

Pass executor_keys as the connector keys of an API instance to create a remote job:

import meerschaum as mrsm

job = mrsm.Job(
    'foo',
    'sync pipes -s daily',
    executor_keys='api:main',
)

Import from a Virtual Environment

Use the meerschaum.Venv context manager to activate a virtual environment:

import meerschaum as mrsm

with mrsm.Venv('noaa'):
    import requests

print(requests.__file__)
# /home/bmeares/.config/meerschaum/venvs/noaa/lib/python3.12/site-packages/requests/__init__.py
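Conceptually, entering a Venv prepends the environment's site-packages directory to sys.path so imports resolve there first, then removes it on exit. A minimal stdlib sketch of that idea (not Meerschaum's actual implementation):

```python
import sys
from contextlib import contextmanager

@contextmanager
def venv_path(site_packages: str):
    """Temporarily prepend a directory to sys.path."""
    sys.path.insert(0, site_packages)
    try:
        yield
    finally:
        sys.path.remove(site_packages)

with venv_path('/tmp/example-venv/site-packages'):
    assert sys.path[0] == '/tmp/example-venv/site-packages'
assert '/tmp/example-venv/site-packages' not in sys.path
```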

To import packages which may not be installed, use meerschaum.attempt_import():

import meerschaum as mrsm

requests = mrsm.attempt_import('requests', venv='noaa')
print(requests.__file__)
# /home/bmeares/.config/meerschaum/venvs/noaa/lib/python3.12/site-packages/requests/__init__.py

Run Actions

Run sysargs with meerschaum.entry():

import meerschaum as mrsm

success, msg = mrsm.entry('show pipes + show version : x2')

Use meerschaum.actions.get_action() to access an action function directly:

from meerschaum.actions import get_action

show_pipes = get_action(['show', 'pipes'])
success, msg = show_pipes(connector_keys=['plugin:noaa'])

Get a dictionary of available subactions with meerschaum.actions.get_subactions():

from meerschaum.actions import get_subactions

subactions = get_subactions('show')
success, msg = subactions['pipes']()

Create a Plugin

Run bootstrap plugin to create a new plugin:

mrsm bootstrap plugin example

This will create example.py in your plugins directory (default ~/.config/meerschaum/plugins/, Windows: %APPDATA%\Meerschaum\plugins). You may paste the example code from the "Create a Custom Action" example below.

Open your plugin with edit plugin:

mrsm edit plugin example

Paste the example code below to try out the features.

See the writing plugins guide for more in-depth documentation.

Create a Custom Action

Decorate a function with meerschaum.actions.make_action to designate it as an action. Subactions will be automatically detected if not decorated:

from meerschaum.actions import make_action

@make_action
def sing():
    print('What would you like me to sing?')
    return True, "Success"

def sing_tune():
    return False, "I don't know that song!"

def sing_song():
    print('Hello, World!')
    return True, "Success"
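The prefix convention above (sing_tune registers as sing tune) can be pictured as a scan over function names. discover_subactions below is a hypothetical illustration of that convention, not Meerschaum's internal detection logic:

```python
def discover_subactions(namespace: dict, action: str) -> dict:
    """Map functions named '<action>_<sub>' to {'<sub>': function}."""
    prefix = action + '_'
    return {
        name[len(prefix):]: obj
        for name, obj in namespace.items()
        if name.startswith(prefix) and callable(obj)
    }

def sing_tune():
    return False, "I don't know that song!"

def sing_song():
    return True, "Success"

subs = discover_subactions(globals(), 'sing')
print(sorted(subs))  # ['song', 'tune']
```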

Use meerschaum.plugins.add_plugin_argument() to create new parameters for your action:

from meerschaum.plugins import make_action, add_plugin_argument

add_plugin_argument(
    '--song', type=str, help='What song to sing.',
)

@make_action
def sing_melody(action=None, song=None):
    to_sing = action[0] if action else song
    if not to_sing:
        return False, "Please tell me what to sing!"

    return True, f'~I am singing {to_sing}~'

Then invoke your action from the command line:

mrsm sing melody lalala

mrsm sing melody --song do-re-mi

Add a Page to the Web Dashboard

Use the decorators meerschaum.plugins.dash_plugin() and meerschaum.plugins.web_page() to add new pages to the web dashboard:

from meerschaum.plugins import dash_plugin, web_page

@dash_plugin
def init_dash(dash_app):

    import dash.html as html
    import dash_bootstrap_components as dbc
    from dash import Input, Output, no_update

    ### Routes to '/dash/my-page'
    @web_page('/my-page', login_required=False)
    def my_page():
        return dbc.Container([
            html.H1("Hello, World!"),
            dbc.Button("Click me", id='my-button'),
            html.Div(id="my-output-div"),
        ])

    @dash_app.callback(
        Output('my-output-div', 'children'),
        Input('my-button', 'n_clicks'),
    )
    def my_button_click(n_clicks):
        if not n_clicks:
            return no_update
        return html.P(f'You clicked {n_clicks} times!')

Submodules

meerschaum.actions
Access functions for actions and subactions.

meerschaum.config
Read and write the Meerschaum configuration registry.

meerschaum.connectors
Build connectors to interact with databases and fetch data.

meerschaum.jobs
Start background jobs.

meerschaum.plugins
Access plugin modules and other API utilities.

meerschaum.utils
Utility functions are available in several submodules.

Root module source (meerschaum/__init__.py):

#! /usr/bin/env python
# -*- coding: utf-8 -*-
# vim:fenc=utf-8

"""
Copyright 2025 Bennett Meares

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""

import atexit

from meerschaum.utils.typing import SuccessTuple
from meerschaum.utils.packages import attempt_import
from meerschaum.core.Pipe import Pipe
from meerschaum.plugins import Plugin
from meerschaum.utils.venv import Venv
from meerschaum.jobs import Job, make_executor
from meerschaum.connectors import get_connector, Connector, InstanceConnector, make_connector
from meerschaum.utils import get_pipes
from meerschaum.utils.formatting import pprint
from meerschaum._internal.docs import index as __doc__
from meerschaum.config import __version__, get_config
from meerschaum._internal.entry import entry
from meerschaum.__main__ import _close_pools

atexit.register(_close_pools)

__pdoc__ = {'gui': False, 'api': False, 'core': False, '_internal': False}
__all__ = (
    "get_pipes",
    "get_connector",
    "get_config",
    "Pipe",
    "Plugin",
    "SuccessTuple",
    "Venv",
    "Job",
    "pprint",
    "attempt_import",
    "actions",
    "config",
    "connectors",
    "jobs",
    "plugins",
    "utils",
    "Connector",
    "InstanceConnector",
    "make_connector",
    "entry",
)
def get_pipes(
    connector_keys: Union[str, List[str], None] = None,
    metric_keys: Union[str, List[str], None] = None,
    location_keys: Union[str, List[str], None] = None,
    tags: Optional[List[str]] = None,
    params: Optional[Dict[str, Any]] = None,
    mrsm_instance: Union[str, InstanceConnector, None] = None,
    instance: Union[str, InstanceConnector, None] = None,
    as_list: bool = False,
    as_tags_dict: bool = False,
    method: str = 'registered',
    workers: Optional[int] = None,
    debug: bool = False,
    _cache_parameters: bool = True,
    **kw: Any
) -> Union[PipesDict, List[mrsm.Pipe], Dict[str, mrsm.Pipe]]:
    """
    Return a dictionary or list of `meerschaum.Pipe` objects.

    Parameters
    ----------
    connector_keys: Union[str, List[str], None], default None
        String or list of connector keys.
        If omitted or is `'*'`, fetch all possible keys.
        If a string begins with `'_'`, select keys that do NOT match the string.

    metric_keys: Union[str, List[str], None], default None
        String or list of metric keys. See `connector_keys` for formatting.

    location_keys: Union[str, List[str], None], default None
        String or list of location keys. See `connector_keys` for formatting.

    tags: Optional[List[str]], default None
        If provided, only include pipes with these tags.

    params: Optional[Dict[str, Any]], default None
        Dictionary of additional parameters to search by.
        Params are parsed into a SQL WHERE clause.
        E.g. `{'a': 1, 'b': 2}` equates to `'WHERE a = 1 AND b = 2'`

    mrsm_instance: Union[str, InstanceConnector, None], default None
        Connector keys for the Meerschaum instance of the pipes.
        Must be a `meerschaum.connectors.sql.SQLConnector.SQLConnector` or
        `meerschaum.connectors.api.APIConnector.APIConnector`.

    as_list: bool, default False
        If `True`, return pipes in a list instead of a hierarchical dictionary.
        `False` : `{connector_keys: {metric_key: {location_key: Pipe}}}`
        `True`  : `[Pipe]`

    as_tags_dict: bool, default False
        If `True`, return a dictionary mapping tags to pipes.
        Pipes with multiple tags will be repeated.

    method: str, default 'registered'
        Available options: `['registered', 'explicit', 'all']`
        If `'registered'` (default), create pipes based on registered keys in the connector's pipes table
        (API or SQL connector, depends on mrsm_instance).
        If `'explicit'`, create pipes from provided connector_keys, metric_keys, and location_keys
        instead of consulting the pipes table. Useful for creating non-existent pipes.
        If `'all'`, create pipes from predefined metrics and locations. Requires `connector_keys`.
        **NOTE:** Method `'all'` is not implemented!

    workers: Optional[int], default None
        If provided (and `as_tags_dict` is `True`), set the number of workers for the pool
        to fetch tags.
        Only takes effect if the instance connector supports multi-threading.

    **kw: Any
        Keyword arguments to pass to the `meerschaum.Pipe` constructor.

    Returns
    -------
    A dictionary of dictionaries and `meerschaum.Pipe` objects
    in the connector, metric, location hierarchy.
    If `as_list` is `True`, return a list of `meerschaum.Pipe` objects.
    If `as_tags_dict` is `True`, return a dictionary mapping tags to pipes.

    Examples
    --------
    ```
    >>> ### Manual definition:
    >>> pipes = {
    ...     <connector_keys>: {
    ...         <metric_key>: {
    ...             <location_key>: Pipe(
    ...                 <connector_keys>,
    ...                 <metric_key>,
    ...                 <location_key>,
    ...             ),
    ...         },
    ...     },
    ... },
    >>> ### Accessing a single pipe:
    >>> pipes['sql:main']['weather'][None]
    >>> ### Return a list instead:
    >>> get_pipes(as_list=True)
    [Pipe('sql:main', 'weather')]
    >>> get_pipes(as_tags_dict=True)
    {'gvl': Pipe('sql:main', 'weather')}
    ```
    """

    import json
    from collections import defaultdict
    from meerschaum.config import get_config
    from meerschaum.utils.warnings import error
    from meerschaum.utils.misc import filter_keywords
    from meerschaum.utils.pool import get_pool

    if connector_keys is None:
        connector_keys = []
    if metric_keys is None:
        metric_keys = []
    if location_keys is None:
        location_keys = []
    if params is None:
        params = {}
    if tags is None:
        tags = []

    if isinstance(connector_keys, str):
        connector_keys = [connector_keys]
    if isinstance(metric_keys, str):
        metric_keys = [metric_keys]
    if isinstance(location_keys, str):
        location_keys = [location_keys]

    ### Get SQL or API connector (keys come from `connector.fetch_pipes_keys()`).
    if mrsm_instance is None:
        mrsm_instance = instance
    if mrsm_instance is None:
        mrsm_instance = get_config('meerschaum', 'instance', patch=True)
    if isinstance(mrsm_instance, str):
        from meerschaum.connectors.parse import parse_instance_keys
        connector = parse_instance_keys(keys=mrsm_instance, debug=debug)
    else:
        from meerschaum.connectors import instance_types
        valid_connector = False
        if hasattr(mrsm_instance, 'type'):
            if mrsm_instance.type in instance_types:
                valid_connector = True
        if not valid_connector:
            error(f"Invalid instance connector: {mrsm_instance}")
        connector = mrsm_instance
    if debug:
        from meerschaum.utils.debug import dprint
        dprint(f"Using instance connector: {connector}")
    if not connector:
        error(f"Could not create connector from keys: '{mrsm_instance}'")

    ### Get a list of tuples for the keys needed to build pipes.
    result = fetch_pipes_keys(
        method,
        connector,
        connector_keys = connector_keys,
        metric_keys = metric_keys,
        location_keys = location_keys,
        tags = tags,
        params = params,
        workers = workers,
        debug = debug
    )
    if result is None:
        error("Unable to build pipes!")

    ### Populate the `pipes` dictionary with Pipes based on the keys
    ### obtained from the chosen `method`.
    from meerschaum import Pipe
    pipes = {}
    for keys_tuple in result:
        ck, mk, lk = keys_tuple[0], keys_tuple[1], keys_tuple[2]
        pipe_tags_or_parameters = keys_tuple[3] if len(keys_tuple) == 4 else None
        pipe_parameters = (
            pipe_tags_or_parameters
            if isinstance(pipe_tags_or_parameters, (dict, str))
            else None
        )
        if isinstance(pipe_parameters, str):
            pipe_parameters = json.loads(pipe_parameters)
        pipe_tags = (
            pipe_tags_or_parameters
            if isinstance(pipe_tags_or_parameters, list)
            else (
                pipe_tags_or_parameters.get('tags', [])
                if isinstance(pipe_tags_or_parameters, dict)
                else None
            )
        )

        if ck not in pipes:
            pipes[ck] = {}

        if mk not in pipes[ck]:
            pipes[ck][mk] = {}

        pipe = Pipe(
            ck, mk, lk,
            mrsm_instance = connector,
            parameters = pipe_parameters,
            tags = pipe_tags,
            debug = debug,
            **filter_keywords(Pipe, **kw)
        )
        pipe.__dict__['_tags'] = pipe_tags
        pipes[ck][mk][lk] = pipe

    if not as_list and not as_tags_dict:
        return pipes

    from meerschaum.utils.misc import flatten_pipes_dict
    pipes_list = flatten_pipes_dict(pipes)
    if as_list:
        return pipes_list

    pool = get_pool(workers=(workers if connector.IS_THREAD_SAFE else 1))
    def gather_pipe_tags(pipe: mrsm.Pipe) -> Tuple[mrsm.Pipe, List[str]]:
        _tags = pipe.__dict__.get('_tags', None)
        gathered_tags = _tags if _tags is not None else pipe.tags
        return pipe, (gathered_tags or [])

    tags_pipes = defaultdict(lambda: [])
    pipes_tags = dict(pool.map(gather_pipe_tags, pipes_list))
    for pipe, tags in pipes_tags.items():
        for tag in (tags or []):
            tags_pipes[tag].append(pipe)

    return dict(tags_pipes)
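The docstring above notes that params entries are rendered into a SQL WHERE clause, e.g. {'a': 1, 'b': 2} becomes WHERE a = 1 AND b = 2. A simplified sketch of that mapping (the real renderer is considerably more involved, handling quoting and non-scalar values):

```python
def params_to_where(params: dict) -> str:
    """Render a params dictionary as a naive SQL WHERE clause."""
    if not params:
        return ''
    conditions = ' AND '.join(f"{col} = {value!r}" for col, value in params.items())
    return 'WHERE ' + conditions

print(params_to_where({'a': 1, 'b': 2}))  # WHERE a = 1 AND b = 2
```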

def get_connector(
    type: str = None,
    label: str = None,
    refresh: bool = False,
    debug: bool = False,
    _load_plugins: bool = True,
    **kw: Any
) -> Connector:
    """
    Return existing connector or create new connection and store for reuse.

    You can create new connectors if enough parameters are provided for the given type and flavor.

    Parameters
    ----------
    type: Optional[str], default None
        Connector type (sql, api, etc.).
        Defaults to the type of the configured `instance_connector`.

    label: Optional[str], default None
        Connector label (e.g. main). Defaults to `'main'`.

    refresh: bool, default False
        Refresh the Connector instance / construct new object. Defaults to `False`.

    kw: Any
        Other arguments to pass to the Connector constructor.
        If the Connector has already been constructed and new arguments are provided,
        `refresh` is set to `True` and the old Connector is replaced.

    Returns
    -------
    A new Meerschaum connector (e.g. `meerschaum.connectors.api.APIConnector`,
    `meerschaum.connectors.sql.SQLConnector`).

    Examples
    --------
    The following parameters would create a new
    `meerschaum.connectors.sql.SQLConnector` that isn't in the configuration file.

    ```
    >>> conn = get_connector(
    ...     type = 'sql',
    ...     label = 'newlabel',
    ...     flavor = 'sqlite',
    ...     database = '/file/path/to/database.db'
    ... )
    >>>
    ```

    """
    from meerschaum.connectors.parse import parse_instance_keys
    from meerschaum.config import get_config
    from meerschaum._internal.static import STATIC_CONFIG
    from meerschaum.utils.warnings import warn
    global _loaded_plugin_connectors
    if isinstance(type, str) and not label and ':' in type:
        type, label = type.split(':', maxsplit=1)

    if _load_plugins:
        with _locks['_loaded_plugin_connectors']:
            if not _loaded_plugin_connectors:
                load_plugin_connectors()
                _load_builtin_custom_connectors()
                _loaded_plugin_connectors = True

    if type is None and label is None:
        default_instance_keys = get_config('meerschaum', 'instance', patch=True)
        ### recursive call to get_connector
        return parse_instance_keys(default_instance_keys)

    ### NOTE: the default instance connector may not be main.
    ### Only fall back to 'main' if the type is provided but the label is omitted.
    label = label if label is not None else STATIC_CONFIG['connectors']['default_label']

    ### type might actually be a label. Check if so and raise a warning.
    if type not in connectors:
        possibilities, poss_msg = [], ""
        for _type in get_config('meerschaum', 'connectors'):
            if type in get_config('meerschaum', 'connectors', _type):
                possibilities.append(f"{_type}:{type}")
        if len(possibilities) > 0:
            poss_msg = " Did you mean"
            for poss in possibilities[:-1]:
                poss_msg += f" '{poss}',"
            if poss_msg.endswith(','):
                poss_msg = poss_msg[:-1]
            if len(possibilities) > 1:
                poss_msg += " or"
            poss_msg += f" '{possibilities[-1]}'?"

        warn(f"Cannot create Connector of type '{type}'." + poss_msg, stack=False)
        return None

    if 'sql' not in types:
        from meerschaum.connectors.plugin import PluginConnector
        from meerschaum.connectors.valkey import ValkeyConnector
        with _locks['types']:
            types.update({
                'api': APIConnector,
                'sql': SQLConnector,
                'plugin': PluginConnector,
                'valkey': ValkeyConnector,
            })

    ### determine if we need to call the constructor
    if not refresh:
        ### see if any user-supplied arguments differ from the existing instance
        if label in connectors[type]:
            warning_message = None
            for attribute, value in kw.items():
                if attribute not in connectors[type][label].meta:
                    import inspect
                    cls = connectors[type][label].__class__
                    cls_init_signature = inspect.signature(cls)
                    cls_init_params = cls_init_signature.parameters
                    if attribute not in cls_init_params:
                        warning_message = (
                            f"Received new attribute '{attribute}' not present in connector " +
                            f"{connectors[type][label]}.\n"
                        )
                elif connectors[type][label].__dict__[attribute] != value:
                    warning_message = (
                        f"Mismatched values for attribute '{attribute}' in connector "
                        + f"'{connectors[type][label]}'.\n" +
                        f"  - Keyword value: '{value}'\n" +
                        f"  - Existing value: '{connectors[type][label].__dict__[attribute]}'\n"
                    )
            if warning_message is not None:
                warning_message += (
                    "\nSetting `refresh` to True and recreating connector with type:"
                    + f" '{type}' and label '{label}'."
                )
                refresh = True
                warn(warning_message)
        else: ### connector doesn't yet exist
            refresh = True

    ### only create an object if refresh is True
    ### (can be manually specified, otherwise determined above)
    if refresh:
        with _locks['connectors']:
            try:
                ### will raise an error if configuration is incorrect / missing
                conn = types[type](label=label, **kw)
                connectors[type][label] = conn
            except InvalidAttributesError as ie:
                warn(
                    f"Incorrect attributes for connector '{type}:{label}'.\n"
                    + str(ie),
                    stack = False,
                )
                conn = None
            except Exception as e:
                from meerschaum.utils.formatting import get_console
                console = get_console()
                if console:
                    console.print_exception()
                warn(
                    f"Exception when creating connector '{type}:{label}'.\n" + str(e),
                    stack = False,
                )
                conn = None
        if conn is None:
            return None

    return connectors[type][label]
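The store-for-reuse behavior above amounts to a cache keyed by type and label that is rebuilt only when refresh is set. A toy sketch with plain dictionaries standing in for Connector objects (get_connector_sketch is hypothetical, not Meerschaum's implementation):

```python
connectors: dict = {}

def get_connector_sketch(type: str, label: str = 'main', refresh: bool = False, **kw):
    """Return a cached connector for (type, label), rebuilding on refresh."""
    key = (type, label)
    if refresh or key not in connectors:
        connectors[key] = {'type': type, 'label': label, **kw}
    return connectors[key]

a = get_connector_sketch('sql', 'temp', flavor='sqlite')
b = get_connector_sketch('sql', 'temp')
print(a is b)  # True
```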

115def get_config(
116    *keys: str,
117    patch: bool = True,
118    substitute: bool = True,
119    sync_files: bool = True,
120    write_missing: bool = True,
121    as_tuple: bool = False,
122    warn: bool = True,
123    debug: bool = False
124) -> Any:
    """
    Return the Meerschaum configuration dictionary.
    If positional arguments are provided, index by the keys.
    Raises a warning if invalid keys are provided.

    Parameters
    ----------
    keys: str
        Strings with which to index into the configuration dictionary.

    patch: bool, default True
        If `True`, patch missing default keys into the config directory.

    sync_files: bool, default True
        If `True`, sync files if needed.

    write_missing: bool, default True
        If `True`, write default values when the main config files are missing.

    substitute: bool, default True
        If `True`, substitute 'MRSM{}' values.

    as_tuple: bool, default False
        If `True`, return a tuple of the form (success, value).

    Returns
    -------
    The value in the configuration directory, indexed by the provided keys.

    Examples
    --------
    >>> get_config('meerschaum', 'instance')
    'sql:main'
    >>> get_config('does', 'not', 'exist')
    UserWarning: Invalid keys in config: ('does', 'not', 'exist')
    """
    import json

    symlinks_key = STATIC_CONFIG['config']['symlinks_key']
    if debug:
        from meerschaum.utils.debug import dprint
        dprint(f"Indexing keys: {keys}", color=False)

    if len(keys) == 0:
        _rc = _config(
            substitute=substitute,
            sync_files=sync_files,
            write_missing=(write_missing and _allow_write_missing),
        )
        if as_tuple:
            return True, _rc
        return _rc

    ### Weird threading issues, only import if substitute is True.
    if substitute:
        from meerschaum.config._read_config import search_and_substitute_config
    ### Invalidate the cache if it was read before with substitute=False
    ### but there still exist substitutions.
    if (
        config is not None and substitute and keys[0] != symlinks_key
        and 'MRSM{' in json.dumps(config.get(keys[0]))
    ):
        try:
            _subbed = search_and_substitute_config({keys[0]: config[keys[0]]})
        except Exception:
            import traceback
            traceback.print_exc()
            _subbed = {keys[0]: config[keys[0]]}

        config[keys[0]] = _subbed[keys[0]]
        if symlinks_key in _subbed:
            if symlinks_key not in config:
                config[symlinks_key] = {}
            config[symlinks_key] = apply_patch_to_config(
                _subbed.get(symlinks_key, {}),
                config.get(symlinks_key, {}),
            )

    from meerschaum.config._sync import sync_files as _sync_files
    if config is None:
        _config(*keys, sync_files=sync_files)

    invalid_keys = False
    if keys[0] not in config and keys[0] != symlinks_key:
        single_key_config = read_config(
            keys=[keys[0]], substitute=substitute, write_missing=write_missing
        )
        if keys[0] not in single_key_config:
            invalid_keys = True
        else:
            config[keys[0]] = single_key_config.get(keys[0], None)
            if symlinks_key in single_key_config and keys[0] in single_key_config[symlinks_key]:
                if symlinks_key not in config:
                    config[symlinks_key] = {}
                config[symlinks_key][keys[0]] = single_key_config[symlinks_key][keys[0]]

            if sync_files:
                _sync_files(keys=[keys[0]])

    c = config
    if len(keys) > 0:
        for k in keys:
            try:
                c = c[k]
            except Exception:
                invalid_keys = True
                break
        if invalid_keys:
            ### Check if the keys are in the default configuration.
            from meerschaum.config._default import default_config
            in_default = True
            patched_default_config = (
                search_and_substitute_config(default_config)
                if substitute else copy.deepcopy(default_config)
            )
            _c = patched_default_config
            for k in keys:
                try:
                    _c = _c[k]
                except Exception:
                    in_default = False
            if in_default:
                c = _c
                invalid_keys = False
            warning_msg = f"Invalid keys in config: {keys}"
            if not in_default:
                try:
                    if warn:
                        from meerschaum.utils.warnings import warn as _warn
                        _warn(warning_msg, stacklevel=3, color=False)
                except Exception:
                    if warn:
                        print(warning_msg)
                if as_tuple:
                    return False, None
                return None

            ### Don't write keys that we haven't yet loaded into memory.
            not_loaded_keys = [k for k in patched_default_config if k not in config]
            for k in not_loaded_keys:
                patched_default_config.pop(k, None)

            set_config(
                apply_patch_to_config(
                    patched_default_config,
                    config,
                )
            )
            if patch and keys[0] != symlinks_key:
                if write_missing:
                    write_config(config, debug=debug)

    if as_tuple:
        return (not invalid_keys), c
    return c
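The core of the traversal above — walk the cached config dictionary by each key and, on a miss, retry against the patched defaults — can be sketched in isolation. This is a simplified model for illustration, not the actual meerschaum implementation (it omits caching, symlinks, and file syncing):

```python
def index_config(config: dict, default_config: dict, *keys):
    """
    Walk `config` by each key; on a miss, fall back to the defaults.
    Returns (success, value), mirroring get_config(..., as_tuple=True).
    (Simplified sketch; not the real meerschaum internals.)
    """
    for source in (config, default_config):
        c = source
        try:
            for k in keys:
                c = c[k]
        except (KeyError, TypeError):
            continue  # keys not found in this source; try the next one
        return True, c
    return False, None

cfg = {'meerschaum': {'instance': 'sql:main'}}
defaults = {'meerschaum': {'instance': 'sql:main', 'repository': 'api:mrsm'}}
print(index_config(cfg, defaults, 'meerschaum', 'instance'))    # (True, 'sql:main')
print(index_config(cfg, defaults, 'meerschaum', 'repository'))  # (True, 'api:mrsm')
print(index_config(cfg, defaults, 'does', 'not', 'exist'))      # (False, None)
```

The real `get_config()` additionally patches missing defaults back into the on-disk config when `patch` and `write_missing` are set.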

class Pipe:
    """
    Access Meerschaum pipes via Pipe objects.

    Pipes are identified by the following:

    1. Connector keys (e.g. `'sql:main'`)
    2. Metric key (e.g. `'weather'`)
    3. Location (optional; e.g. `None`)

    A pipe's connector keys correspond to a data source, and when the pipe is synced,
    its `fetch` definition is evaluated and executed to produce new data.

    Alternatively, new data may be directly synced via `pipe.sync()`:

    ```
    >>> from meerschaum import Pipe
    >>> pipe = Pipe('csv', 'weather')
    >>>
    >>> import pandas as pd
    >>> df = pd.read_csv('weather.csv')
    >>> pipe.sync(df)
    ```
    """

    from ._fetch import (
        fetch,
        get_backtrack_interval,
    )
    from ._data import (
        get_data,
        get_backtrack_data,
        get_rowcount,
        get_doc,
        get_value,
        _get_data_as_iterator,
        get_chunk_interval,
        get_chunk_bounds,
        get_chunk_bounds_batches,
        parse_date_bounds,
    )
    from ._register import register
    from ._attributes import (
        attributes,
        parameters,
        columns,
        indices,
        indexes,
        dtypes,
        autoincrement,
        autotime,
        upsert,
        static,
        tzinfo,
        enforce,
        null_indices,
        mixed_numerics,
        get_columns,
        get_columns_types,
        get_columns_indices,
        get_indices,
        get_parameters,
        get_dtypes,
        update_parameters,
        tags,
        get_id,
        id,
        get_val_column,
        parents,
        parent,
        children,
        target,
        _target_legacy,
        guess_datetime,
        precision,
        get_precision,
    )
    from ._cache import (
        _get_cache_connector,
        _cache_value,
        _get_cached_value,
        _invalidate_cache,
        _get_cache_dir_path,
        _write_cache_key,
        _write_cache_file,
        _write_cache_conn_key,
        _read_cache_key,
        _read_cache_file,
        _read_cache_conn_key,
        _load_cache_keys,
        _load_cache_files,
        _load_cache_conn_keys,
        _get_cache_keys,
        _get_cache_file_keys,
        _get_cache_conn_keys,
        _clear_cache_key,
        _clear_cache_file,
        _clear_cache_conn_key,
    )
    from ._show import show
    from ._edit import edit, edit_definition, update
    from ._sync import (
        sync,
        get_sync_time,
        exists,
        filter_existing,
        _get_chunk_label,
        get_num_workers,
        _persist_new_special_columns,
    )
    from ._verify import (
        verify,
        get_bound_interval,
        get_bound_time,
    )
    from ._delete import delete
    from ._drop import drop, drop_indices
    from ._index import create_indices
    from ._clear import clear
    from ._deduplicate import deduplicate
    from ._bootstrap import bootstrap
    from ._dtypes import enforce_dtypes, infer_dtypes
    from ._copy import copy_to

    def __init__(
        self,
        connector: str = '',
        metric: str = '',
        location: Optional[str] = None,
        parameters: Optional[Dict[str, Any]] = None,
        columns: Union[Dict[str, str], List[str], None] = None,
        indices: Optional[Dict[str, Union[str, List[str]]]] = None,
        tags: Optional[List[str]] = None,
        target: Optional[str] = None,
        dtypes: Optional[Dict[str, str]] = None,
        instance: Optional[Union[str, InstanceConnector]] = None,
        upsert: Optional[bool] = None,
        autoincrement: Optional[bool] = None,
        autotime: Optional[bool] = None,
        precision: Union[str, Dict[str, Union[str, int]], None] = None,
        static: Optional[bool] = None,
        enforce: Optional[bool] = None,
        null_indices: Optional[bool] = None,
        mixed_numerics: Optional[bool] = None,
        temporary: bool = False,
        cache: Optional[bool] = None,
        cache_connector_keys: Optional[str] = None,
        mrsm_instance: Optional[Union[str, InstanceConnector]] = None,
        connector_keys: Optional[str] = None,
        metric_key: Optional[str] = None,
        location_key: Optional[str] = None,
        instance_keys: Optional[str] = None,
        indexes: Union[Dict[str, str], List[str], None] = None,
        debug: bool = False,
    ):
        """
        Parameters
        ----------
        connector: str
            Keys for the pipe's source connector, e.g. `'sql:main'`.

        metric: str
            Label for the pipe's contents, e.g. `'weather'`.

        location: str, default None
            Label for the pipe's location. Defaults to `None`.

        parameters: Optional[Dict[str, Any]], default None
            Optionally set a pipe's parameters from the constructor,
            e.g. columns and other attributes.
            You can edit these parameters with `edit pipes`.

        columns: Union[Dict[str, str], List[str], None], default None
            Set the `columns` dictionary of `parameters`.
            If `parameters` is also provided, this dictionary is added under the `'columns'` key.

        indices: Optional[Dict[str, Union[str, List[str]]]], default None
            Set the `indices` dictionary of `parameters`.
            If `parameters` is also provided, this dictionary is added under the `'indices'` key.

        tags: Optional[List[str]], default None
            A list of strings to be added under the `'tags'` key of `parameters`.
            You can select pipes with certain tags using `--tags`.

        dtypes: Optional[Dict[str, str]], default None
            Set the `dtypes` dictionary of `parameters`.
            If `parameters` is also provided, this dictionary is added under the `'dtypes'` key.

        mrsm_instance: Optional[Union[str, InstanceConnector]], default None
            Connector for the Meerschaum instance where the pipe resides.
            Defaults to the preconfigured default instance (`'sql:main'`).

        instance: Optional[Union[str, InstanceConnector]], default None
            Alias for `mrsm_instance`. If `mrsm_instance` is supplied, this value is ignored.

        upsert: Optional[bool], default None
            If `True`, set `upsert` to `True` in the parameters.

        autoincrement: Optional[bool], default None
            If `True`, set `autoincrement` in the parameters.

        autotime: Optional[bool], default None
            If `True`, set `autotime` in the parameters.

        precision: Union[str, Dict[str, Union[str, int]], None], default None
            If provided, set `precision` in the parameters.
            This may be either a string (the precision unit) or a dictionary in the form
            `{'unit': <unit>, 'interval': <interval>}`.
            The default is determined by the `datetime` column dtype
            (e.g. `datetime64[us]` implies `microsecond` precision).

        static: Optional[bool], default None
            If `True`, set `static` in the parameters.

        enforce: Optional[bool], default None
            If `False`, skip data type enforcement.
            Default behavior is `True`.

        null_indices: Optional[bool], default None
            Set to `False` if there will be no null values in the index columns.
            Defaults to `True`.

        mixed_numerics: Optional[bool], default None
            If `True`, integer columns will be converted to `numeric` when floats are synced.
            Set to `False` to disable this behavior.
            Defaults to `True`.

        temporary: bool, default False
            If `True`, prevent instance tables (pipes, users, plugins) from being created.

        cache: Optional[bool], default None
            If `True`, cache the pipe's metadata to disk (in addition to in-memory caching).
            If `cache` is not explicitly `True`, it is set to `False` if `temporary` is `True`.
            Defaults to `True` (from `None`).

        cache_connector_keys: Optional[str], default None
            If provided, use the keys to a Valkey connector (e.g. `valkey:main`).
        """
        from meerschaum.utils.warnings import error, warn
        if (not connector and not connector_keys) or (not metric and not metric_key):
            error(
                "Please provide strings for the connector and metric\n    "
                + "(first two positional arguments)."
            )

        ### Fall back to legacy `location_key` just in case.
        if not location:
            location = location_key

        if not connector:
            connector = connector_keys

        if not metric:
            metric = metric_key

        if location in ('[None]', 'None'):
            location = None

        from meerschaum._internal.static import STATIC_CONFIG
        negation_prefix = STATIC_CONFIG['system']['fetch_pipes_keys']['negation_prefix']
        for k in (connector, metric, location, *(tags or [])):
            if str(k).startswith(negation_prefix):
                error(f"A pipe's keys and tags cannot start with the prefix '{negation_prefix}'.")

        self.connector_keys = str(connector)
        self.connector_key = self.connector_keys  ### Alias
        self.metric_key = metric
        self.location_key = location
        self.temporary = temporary
        self.cache = cache if cache is not None else (not temporary)
        self.cache_connector_keys = (
            str(cache_connector_keys)
            if cache_connector_keys is not None
            else None
        )
        self.debug = debug

        self._attributes: Dict[str, Any] = {
            'connector_keys': self.connector_keys,
            'metric_key': self.metric_key,
            'location_key': self.location_key,
            'parameters': {},
        }

        ### Only set parameters if values are provided.
        if isinstance(parameters, dict):
            self._attributes['parameters'] = parameters
        else:
            if parameters is not None:
                warn(f"The provided parameters are of invalid type '{type(parameters)}'.")
            self._attributes['parameters'] = {}

        columns = columns or self._attributes.get('parameters', {}).get('columns', None)
        if isinstance(columns, (list, tuple)):
            columns = {str(col): str(col) for col in columns}
        if isinstance(columns, dict):
            self._attributes['parameters']['columns'] = columns
        elif isinstance(columns, str) and 'Pipe(' in columns:
            pass
        elif columns is not None:
            warn(f"The provided columns are of invalid type '{type(columns)}'.")

        indices = (
            indices
            or indexes
            or self._attributes.get('parameters', {}).get('indices', None)
            or self._attributes.get('parameters', {}).get('indexes', None)
        )
        if isinstance(indices, dict):
            indices_key = (
                'indexes'
                if 'indexes' in self._attributes['parameters']
                else 'indices'
            )
            self._attributes['parameters'][indices_key] = indices

        if isinstance(tags, (list, tuple)):
            self._attributes['parameters']['tags'] = tags
        elif tags is not None:
            warn(f"The provided tags are of invalid type '{type(tags)}'.")

        if isinstance(target, str):
            self._attributes['parameters']['target'] = target
        elif target is not None:
            warn(f"The provided target is of invalid type '{type(target)}'.")

        if isinstance(dtypes, dict):
            self._attributes['parameters']['dtypes'] = dtypes
        elif dtypes is not None:
            warn(f"The provided dtypes are of invalid type '{type(dtypes)}'.")

        if isinstance(upsert, bool):
            self._attributes['parameters']['upsert'] = upsert

        if isinstance(autoincrement, bool):
            self._attributes['parameters']['autoincrement'] = autoincrement

        if isinstance(autotime, bool):
            self._attributes['parameters']['autotime'] = autotime

        if isinstance(precision, dict):
            self._attributes['parameters']['precision'] = precision
        elif isinstance(precision, str):
            self._attributes['parameters']['precision'] = {'unit': precision}

        if isinstance(static, bool):
            self._attributes['parameters']['static'] = static
            self._static = static

        if isinstance(enforce, bool):
            self._attributes['parameters']['enforce'] = enforce

        if isinstance(null_indices, bool):
            self._attributes['parameters']['null_indices'] = null_indices

        if isinstance(mixed_numerics, bool):
            self._attributes['parameters']['mixed_numerics'] = mixed_numerics

        ### NOTE: The parameters dictionary is {} by default.
        ###       A Pipe may be registered without parameters, then edited,
        ###       or a Pipe may be registered with parameters set in-memory first.
        _mrsm_instance = mrsm_instance if mrsm_instance is not None else (instance or instance_keys)
        if _mrsm_instance is None:
            _mrsm_instance = get_config('meerschaum', 'instance', patch=True)

        if not isinstance(_mrsm_instance, str):
            self._instance_connector = _mrsm_instance
            self.instance_keys = str(_mrsm_instance)
        else:
            self.instance_keys = _mrsm_instance

        if self.instance_keys == 'sql:memory':
            self.cache = False

    @property
    def meta(self):
        """
        Return the four keys needed to reconstruct this pipe.
        """
        return {
            'connector_keys': self.connector_keys,
            'metric_key': self.metric_key,
            'location_key': self.location_key,
            'instance_keys': self.instance_keys,
        }

    def keys(self) -> Dict[str, Any]:
        """
        Return the keys dictionary for this pipe.
        """
        return {
            key: val
            for key, val in self.meta.items()
            if key != 'instance'
        }

    @property
    def instance_connector(self) -> Union[InstanceConnector, None]:
        """
        The instance connector on which this pipe resides.
        """
        if '_instance_connector' not in self.__dict__:
            from meerschaum.connectors.parse import parse_instance_keys
            conn = parse_instance_keys(self.instance_keys)
            if conn:
                self._instance_connector = conn
            else:
                return None
        return self._instance_connector

    @property
    def connector(self) -> Union['Connector', None]:
        """
        The connector to the data source.
        """
        if '_connector' not in self.__dict__:
            from meerschaum.connectors.parse import parse_instance_keys
            import warnings
            with warnings.catch_warnings():
                warnings.simplefilter('ignore')
                try:
                    conn = parse_instance_keys(self.connector_keys)
                except Exception:
                    conn = None
            if conn:
                self._connector = conn
            else:
                return None
        return self._connector

    def __str__(self, ansi: bool = False):
        return pipe_repr(self, ansi=ansi)

    def __eq__(self, other):
        try:
            return (
                isinstance(self, type(other))
                and self.connector_keys == other.connector_keys
                and self.metric_key == other.metric_key
                and self.location_key == other.location_key
                and self.instance_keys == other.instance_keys
            )
        except Exception:
            return False

    def __hash__(self):
        ### Using an esoteric separator to avoid collisions.
        sep = "[\"']"
        return hash(
            str(self.connector_keys) + sep
            + str(self.metric_key) + sep
            + str(self.location_key) + sep
            + str(self.instance_keys) + sep
        )

    def __repr__(self, ansi: bool = True, **kw) -> str:
        if not hasattr(sys, 'ps1'):
            ansi = False

        return pipe_repr(self, ansi=ansi, **kw)

    def __pt_repr__(self):
        from meerschaum.utils.packages import attempt_import
        prompt_toolkit_formatted_text = attempt_import('prompt_toolkit.formatted_text', lazy=False)
        return prompt_toolkit_formatted_text.ANSI(pipe_repr(self, ansi=True))

    def __getstate__(self) -> Dict[str, Any]:
        """
        Define the state dictionary (pickling).
        """
        return {
            'connector_keys': self.connector_keys,
            'metric_key': self.metric_key,
            'location_key': self.location_key,
            'parameters': self._attributes.get('parameters', None),
            'instance_keys': self.instance_keys,
        }

    def __setstate__(self, _state: Dict[str, Any]):
        """
        Read the state (unpickling).
        """
        self.__init__(**_state)

    def __getitem__(self, key: str) -> Any:
        """
        Index the pipe's attributes.
        If the `key` cannot be found, return `None`.
        """
        if key in self.attributes:
            return self.attributes.get(key, None)

        aliases = {
            'connector': 'connector_keys',
            'connector_key': 'connector_keys',
            'metric': 'metric_key',
            'location': 'location_key',
        }
        aliased_key = aliases.get(key, None)
        if aliased_key is not None:
            return self.attributes.get(aliased_key, None)

        property_aliases = {
            'instance': 'instance_keys',
            'instance_key': 'instance_keys',
        }
        aliased_key = property_aliases.get(key, None)
        if aliased_key is not None:
            key = aliased_key
        return getattr(self, key, None)

    def __copy__(self):
        """
        Return a shallow copy of the current pipe.
        """
        return mrsm.Pipe(
            self.connector_keys, self.metric_key, self.location_key,
            instance=self.instance_keys,
            parameters=self._attributes.get('parameters', None),
        )

    def __deepcopy__(self, memo):
        """
        Return a deep copy of the current pipe.
        """
        return self.__copy__()
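The `meta`, `__eq__`, and `__hash__` definitions above all reduce a pipe to the same four keys, which is what lets pipes serve as dictionary keys and set members. A minimal model of that identity scheme (illustrative only, not the actual meerschaum classes):

```python
class PipeMeta:
    """Model of pipe identity: equality and hashing over the four keys."""

    def __init__(self, connector_keys, metric_key, location_key=None, instance_keys='sql:main'):
        self.meta = (str(connector_keys), str(metric_key), str(location_key), str(instance_keys))

    def __eq__(self, other):
        return isinstance(other, PipeMeta) and self.meta == other.meta

    def __hash__(self):
        # An unusual separator keeps distinct key tuples from colliding,
        # mirroring the `sep = "[\"']"` trick in Pipe.__hash__ above.
        return hash("[\"']".join(self.meta))

a = PipeMeta('plugin:noaa', 'weather')
b = PipeMeta('plugin:noaa', 'weather')
print(a == b)       # True
print(len({a, b}))  # 1
```

Because equality and hashing agree, two `Pipe` objects built from the same keys deduplicate in sets and index the same dictionary entries.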

Pipe( connector: str = '', metric: str = '', location: Optional[str] = None, parameters: Optional[Dict[str, Any]] = None, columns: Union[Dict[str, str], List[str], NoneType] = None, indices: Optional[Dict[str, Union[str, List[str]]]] = None, tags: Optional[List[str]] = None, target: Optional[str] = None, dtypes: Optional[Dict[str, str]] = None, instance: Union[str, InstanceConnector, NoneType] = None, upsert: Optional[bool] = None, autoincrement: Optional[bool] = None, autotime: Optional[bool] = None, precision: Union[str, Dict[str, Union[str, int]], NoneType] = None, static: Optional[bool] = None, enforce: Optional[bool] = None, null_indices: Optional[bool] = None, mixed_numerics: Optional[bool] = None, temporary: bool = False, cache: Optional[bool] = None, cache_connector_keys: Optional[str] = None, mrsm_instance: Union[str, InstanceConnector, NoneType] = None, connector_keys: Optional[str] = None, metric_key: Optional[str] = None, location_key: Optional[str] = None, instance_keys: Optional[str] = None, indexes: Union[Dict[str, str], List[str], NoneType] = None, debug: bool = False)
189    def __init__(
190        self,
191        connector: str = '',
192        metric: str = '',
193        location: Optional[str] = None,
194        parameters: Optional[Dict[str, Any]] = None,
195        columns: Union[Dict[str, str], List[str], None] = None,
196        indices: Optional[Dict[str, Union[str, List[str]]]] = None,
197        tags: Optional[List[str]] = None,
198        target: Optional[str] = None,
199        dtypes: Optional[Dict[str, str]] = None,
200        instance: Optional[Union[str, InstanceConnector]] = None,
201        upsert: Optional[bool] = None,
202        autoincrement: Optional[bool] = None,
203        autotime: Optional[bool] = None,
204        precision: Union[str, Dict[str, Union[str, int]], None] = None,
205        static: Optional[bool] = None,
206        enforce: Optional[bool] = None,
207        null_indices: Optional[bool] = None,
208        mixed_numerics: Optional[bool] = None,
209        temporary: bool = False,
210        cache: Optional[bool] = None,
211        cache_connector_keys: Optional[str] = None,
212        mrsm_instance: Optional[Union[str, InstanceConnector]] = None,
213        connector_keys: Optional[str] = None,
214        metric_key: Optional[str] = None,
215        location_key: Optional[str] = None,
216        instance_keys: Optional[str] = None,
217        indexes: Union[Dict[str, str], List[str], None] = None,
218        debug: bool = False,
219    ):
220        """
221        Parameters
222        ----------
223        connector: str
224            Keys for the pipe's source connector, e.g. `'sql:main'`.
225
226        metric: str
227            Label for the pipe's contents, e.g. `'weather'`.
228
229        location: str, default None
230            Label for the pipe's location. Defaults to `None`.
231
232        parameters: Optional[Dict[str, Any]], default None
233            Optionally set a pipe's parameters from the constructor,
234            e.g. columns and other attributes.
235            You can edit these parameters with `edit pipes`.
236
237        columns: Union[Dict[str, str], List[str], None], default None
238            Set the `columns` dictionary of `parameters`.
239            If `parameters` is also provided, this dictionary is added under the `'columns'` key.
240
241        indices: Optional[Dict[str, Union[str, List[str]]]], default None
242            Set the `indices` dictionary of `parameters`.
243            If `parameters` is also provided, this dictionary is added under the `'indices'` key.
244
245        tags: Optional[List[str]], default None
246            A list of strings to be added under the `'tags'` key of `parameters`.
247            You can select pipes with certain tags using `--tags`.
248
249        dtypes: Optional[Dict[str, str]], default None
250            Set the `dtypes` dictionary of `parameters`.
251            If `parameters` is also provided, this dictionary is added under the `'dtypes'` key.
252
253        mrsm_instance: Optional[Union[str, InstanceConnector]], default None
254            Connector for the Meerschaum instance where the pipe resides.
255            Defaults to the preconfigured default instance (`'sql:main'`).
256
257        instance: Optional[Union[str, InstanceConnector]], default None
258            Alias for `mrsm_instance`. If `mrsm_instance` is supplied, this value is ignored.
259
260        upsert: Optional[bool], default None
261            If `True`, set `upsert` to `True` in the parameters.
262
263        autoincrement: Optional[bool], default None
264            If `True`, set `autoincrement` in the parameters.
265
266        autotime: Optional[bool], default None
267            If `True`, set `autotime` in the parameters.
268
269        precision: Union[str, Dict[str, Union[str, int]], None], default None
270            If provided, set `precision` in the parameters.
271            This may be either a string (the precision unit) or a dictionary of in the form
272            `{'unit': <unit>, 'interval': <interval>}`.
273            Default is determined by the `datetime` column dtype
274            (e.g. `datetime64[us]` is `microsecond` precision).
275
276        static: Optional[bool], default None
277            If `True`, set `static` in the parameters.
278
279        enforce: Optional[bool], default None
280            If `False`, skip data type enforcement.
281            Default behavior is `True`.
282
283        null_indices: Optional[bool], default None
284            Set to `False` if there will be no null values in the index columns.
285            Defaults to `True`.
286
287        mixed_numerics: bool, default None
288            If `True`, integer columns will be converted to `numeric` when floats are synced.
289            Set to `False` to disable this behavior.
290            Defaults to `True`.
291
292        temporary: bool, default False
293            If `True`, prevent instance tables (pipes, users, plugins) from being created.
294
295        cache: Optional[bool], default None
296            If `True`, cache the pipe's metadata to disk (in addition to in-memory caching).
297            If `cache` is not explicitly `True`, it is set to `False` if `temporary` is `True`.
298            If left as `None`, defaults to `True`.
299
300        cache_connector_keys: Optional[str], default None
301            If provided, use the keys to a Valkey connector (e.g. `valkey:main`).
302        """
303        from meerschaum.utils.warnings import error, warn
304        if (not connector and not connector_keys) or (not metric and not metric_key):
305            error(
306                "Please provide strings for the connector and metric\n    "
307                + "(first two positional arguments)."
308            )
309
310        ### Fall back to legacy `location_key` just in case.
311        if not location:
312            location = location_key
313
314        if not connector:
315            connector = connector_keys
316
317        if not metric:
318            metric = metric_key
319
320        if location in ('[None]', 'None'):
321            location = None
322
323        from meerschaum._internal.static import STATIC_CONFIG
324        negation_prefix = STATIC_CONFIG['system']['fetch_pipes_keys']['negation_prefix']
325        for k in (connector, metric, location, *(tags or [])):
326            if str(k).startswith(negation_prefix):
327                error(f"A pipe's keys and tags cannot start with the prefix '{negation_prefix}'.")
328
329        self.connector_keys = str(connector)
330        self.connector_key = self.connector_keys ### Alias
331        self.metric_key = metric
332        self.location_key = location
333        self.temporary = temporary
334        self.cache = cache if cache is not None else (not temporary)
335        self.cache_connector_keys = (
336            str(cache_connector_keys)
337            if cache_connector_keys is not None
338            else None
339        )
340        self.debug = debug
341
342        self._attributes: Dict[str, Any] = {
343            'connector_keys': self.connector_keys,
344            'metric_key': self.metric_key,
345            'location_key': self.location_key,
346            'parameters': {},
347        }
348
349        ### only set parameters if values are provided
350        if isinstance(parameters, dict):
351            self._attributes['parameters'] = parameters
352        else:
353            if parameters is not None:
354                warn(f"The provided parameters are of invalid type '{type(parameters)}'.")
355            self._attributes['parameters'] = {}
356
357        columns = columns or self._attributes.get('parameters', {}).get('columns', None)
358        if isinstance(columns, (list, tuple)):
359            columns = {str(col): str(col) for col in columns}
360        if isinstance(columns, dict):
361            self._attributes['parameters']['columns'] = columns
362        elif isinstance(columns, str) and 'Pipe(' in columns:
363            pass
364        elif columns is not None:
365            warn(f"The provided columns are of invalid type '{type(columns)}'.")
366
367        indices = (
368            indices
369            or indexes
370            or self._attributes.get('parameters', {}).get('indices', None)
371            or self._attributes.get('parameters', {}).get('indexes', None)
372        )
373        if isinstance(indices, dict):
374            indices_key = (
375                'indexes'
376                if 'indexes' in self._attributes['parameters']
377                else 'indices'
378            )
379            self._attributes['parameters'][indices_key] = indices
380
381        if isinstance(tags, (list, tuple)):
382            self._attributes['parameters']['tags'] = tags
383        elif tags is not None:
384            warn(f"The provided tags are of invalid type '{type(tags)}'.")
385
386        if isinstance(target, str):
387            self._attributes['parameters']['target'] = target
388        elif target is not None:
389            warn(f"The provided target is of invalid type '{type(target)}'.")
390
391        if isinstance(dtypes, dict):
392            self._attributes['parameters']['dtypes'] = dtypes
393        elif dtypes is not None:
394            warn(f"The provided dtypes are of invalid type '{type(dtypes)}'.")
395
396        if isinstance(upsert, bool):
397            self._attributes['parameters']['upsert'] = upsert
398
399        if isinstance(autoincrement, bool):
400            self._attributes['parameters']['autoincrement'] = autoincrement
401
402        if isinstance(autotime, bool):
403            self._attributes['parameters']['autotime'] = autotime
404
405        if isinstance(precision, dict):
406            self._attributes['parameters']['precision'] = precision
407        elif isinstance(precision, str):
408            self._attributes['parameters']['precision'] = {'unit': precision}
409
410        if isinstance(static, bool):
411            self._attributes['parameters']['static'] = static
412            self._static = static
413
414        if isinstance(enforce, bool):
415            self._attributes['parameters']['enforce'] = enforce
416
417        if isinstance(null_indices, bool):
418            self._attributes['parameters']['null_indices'] = null_indices
419
420        if isinstance(mixed_numerics, bool):
421            self._attributes['parameters']['mixed_numerics'] = mixed_numerics
422
423        ### NOTE: The parameters dictionary is {} by default.
424        ###       A Pipe may be registered without parameters, then edited,
425        ###       or a Pipe may be registered with parameters set in-memory first.
426        _mrsm_instance = mrsm_instance if mrsm_instance is not None else (instance or instance_keys)
427        if _mrsm_instance is None:
428            _mrsm_instance = get_config('meerschaum', 'instance', patch=True)
429
430        if not isinstance(_mrsm_instance, str):
431            self._instance_connector = _mrsm_instance
432            self.instance_keys = str(_mrsm_instance)
433        else:
434            self.instance_keys = _mrsm_instance
435
436        if self.instance_keys == 'sql:memory':
437            self.cache = False
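To illustrate how the constructor layers keyword arguments into `parameters`, here is a simplified pure-Python sketch. The `build_parameters` helper is hypothetical and covers only a few of the keyword arguments, skipping the warnings, alias handling, and validation above:

```python
def build_parameters(parameters=None, columns=None, tags=None, precision=None):
    """Sketch of how constructor kwargs are layered into a parameters dictionary."""
    params = dict(parameters or {})
    if isinstance(columns, (list, tuple)):
        # A list of column names becomes an identity mapping.
        columns = {str(col): str(col) for col in columns}
    if isinstance(columns, dict):
        params['columns'] = columns
    if isinstance(tags, (list, tuple)):
        params['tags'] = list(tags)
    if isinstance(precision, str):
        # A bare unit string is normalized into a dictionary.
        precision = {'unit': precision}
    if isinstance(precision, dict):
        params['precision'] = precision
    return params

print(build_parameters(columns=['ts', 'id'], tags=['production'], precision='second'))
# {'columns': {'ts': 'ts', 'id': 'id'}, 'tags': ['production'], 'precision': {'unit': 'second'}}
```

Explicit keyword arguments take precedence by being written on top of any `parameters` dictionary passed in, which is why the docstrings above describe each one as "added under" its key.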
connector_keys
connector_key
metric_key
location_key
temporary
cache
cache_connector_keys
debug
meta
440    @property
441    def meta(self):
442        """
443        Return the four keys needed to reconstruct this pipe.
444        """
445        return {
446            'connector_keys': self.connector_keys,
447            'metric_key': self.metric_key,
448            'location_key': self.location_key,
449            'instance_keys': self.instance_keys,
450        }

def keys(self) -> Dict[str, str]:
452    def keys(self) -> Dict[str, str]:
453        """
454        Return the ordered keys for this pipe.
455        """
456        return {
457            key: val
458            for key, val in self.meta.items()
459            if key != 'instance_keys'
460        }

instance_connector: Optional[InstanceConnector]
462    @property
463    def instance_connector(self) -> Union[InstanceConnector, None]:
464        """
465        The instance connector on which this pipe resides.
466        """
467        if '_instance_connector' not in self.__dict__:
468            from meerschaum.connectors.parse import parse_instance_keys
469            conn = parse_instance_keys(self.instance_keys)
470            if conn:
471                self._instance_connector = conn
472            else:
473                return None
474        return self._instance_connector

connector: "Union['Connector', None]"
476    @property
477    def connector(self) -> Union['Connector', None]:
478        """
479        The connector to the data source.
480        """
481        if '_connector' not in self.__dict__:
482            from meerschaum.connectors.parse import parse_instance_keys
483            import warnings
484            with warnings.catch_warnings():
485                warnings.simplefilter('ignore')
486                try:
487                    conn = parse_instance_keys(self.connector_keys)
488                except Exception:
489                    conn = None
490            if conn:
491                self._connector = conn
492            else:
493                return None
494        return self._connector

def fetch( self, begin: Union[datetime.datetime, int, str, NoneType] = '', end: Union[datetime.datetime, int, NoneType] = None, check_existing: bool = True, sync_chunks: bool = False, debug: bool = False, **kw: Any) -> Union[pandas.core.frame.DataFrame, Iterator[pandas.core.frame.DataFrame], NoneType]:
21def fetch(
22    self,
23    begin: Union[datetime, int, str, None] = '',
24    end: Union[datetime, int, None] = None,
25    check_existing: bool = True,
26    sync_chunks: bool = False,
27    debug: bool = False,
28    **kw: Any
29) -> Union['pd.DataFrame', Iterator['pd.DataFrame'], None]:
30    """
31    Fetch a Pipe's latest data from its connector.
32
33    Parameters
34    ----------
35    begin: Union[datetime, int, str, None], default ''
36        If provided, only fetch data newer than or equal to `begin`.
37
38    end: Union[datetime, int, None], default None
39        If provided, only fetch data older than or equal to `end`.
40
41    check_existing: bool, default True
42        If `False`, do not apply the backtrack interval.
43
44    sync_chunks: bool, default False
45        If `True` and the pipe's connector is of type `'sql'`, begin syncing chunks
46        as `fetch()` loads them into memory.
47
48    debug: bool, default False
49        Verbosity toggle.
50
51    Returns
52    -------
53    A `pd.DataFrame` of the newest unseen data.
54
55    """
56    if 'fetch' not in dir(self.connector):
57        warn(f"No `fetch()` function defined for connector '{self.connector}'")
58        return None
59
60    from meerschaum.connectors import get_connector_plugin
61    from meerschaum.utils.misc import filter_arguments
62
63    _chunk_hook = kw.pop('chunk_hook', None)
64    kw['workers'] = self.get_num_workers(kw.get('workers', None))
65    if sync_chunks and _chunk_hook is None:
66
67        def _chunk_hook(chunk, **_kw) -> SuccessTuple:
68            """
69            Wrap `Pipe.sync()` with a custom chunk label prepended to the message.
70            """
71            from meerschaum.config._patch import apply_patch_to_config
72            kwargs = apply_patch_to_config(kw, _kw)
73            chunk_success, chunk_message = self.sync(chunk, **kwargs)
74            chunk_label = self._get_chunk_label(chunk, self.columns.get('datetime', None))
75            if chunk_label:
76                chunk_message = '\n' + chunk_label + '\n' + chunk_message
77            return chunk_success, chunk_message
78
79    begin, end = self.parse_date_bounds(begin, end)
80
81    with mrsm.Venv(get_connector_plugin(self.connector)):
82        _args, _kwargs = filter_arguments(
83            self.connector.fetch,
84            self,
85            begin=_determine_begin(
86                self,
87                begin,
88                end,
89                check_existing=check_existing,
90                debug=debug,
91            ),
92            end=end,
93            chunk_hook=_chunk_hook,
94            debug=debug,
95            **kw
96        )
97        df = self.connector.fetch(*_args, **_kwargs)
98    return df
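Since connector authors may define `fetch()` with any subset of these keyword arguments, the call above first trims its arguments to those the connector's `fetch()` accepts (via `filter_arguments` from `meerschaum.utils.misc`). A rough sketch of the idea, using a hypothetical `filter_kwargs` helper:

```python
import inspect

def filter_kwargs(func, **kwargs):
    """Keep only the keyword arguments that `func` actually accepts."""
    sig = inspect.signature(func)
    accepts_var_kw = any(
        param.kind is inspect.Parameter.VAR_KEYWORD
        for param in sig.parameters.values()
    )
    if accepts_var_kw:
        # Functions declaring **kwargs can accept everything.
        return kwargs
    return {key: val for key, val in kwargs.items() if key in sig.parameters}

def fetch(begin=None, end=None):
    """A connector's fetch() that only understands begin and end."""
    return {'begin': begin, 'end': end}

print(filter_kwargs(fetch, begin='2024-01-01', end=None, chunk_hook=None, debug=False))
# {'begin': '2024-01-01', 'end': None}
```

This is why a minimal custom connector (like the `FooConnector` example above) can define `fetch(self, begin=None, end=None)` without worrying about `chunk_hook`, `workers`, or other internal keyword arguments.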

def get_backtrack_interval( self, check_existing: bool = True, debug: bool = False) -> Union[datetime.timedelta, int]:
101def get_backtrack_interval(
102    self,
103    check_existing: bool = True,
104    debug: bool = False,
105) -> Union[timedelta, int]:
106    """
107    Get the backtrack interval to use for this pipe.
108
109    Parameters
110    ----------
111    check_existing: bool, default True
112        If `False`, return a backtrack_interval of 0 minutes.
113
114    Returns
115    -------
116    The backtrack interval (`timedelta` or `int`) to use with this pipe's `datetime` axis.
117    """
118    default_backtrack_minutes = get_config('pipes', 'parameters', 'fetch', 'backtrack_minutes')
119    configured_backtrack_minutes = self.parameters.get('fetch', {}).get('backtrack_minutes', None)
120    backtrack_minutes = (
121        configured_backtrack_minutes
122        if configured_backtrack_minutes is not None
123        else default_backtrack_minutes
124    ) if check_existing else 0
125
126    backtrack_interval = timedelta(minutes=backtrack_minutes)
127    dt_col = self.columns.get('datetime', None)
128    if dt_col is None:
129        return backtrack_interval
130
131    dt_dtype = self.dtypes.get(dt_col, 'datetime')
132    if 'int' in dt_dtype.lower():
133        return backtrack_minutes
134
135    return backtrack_interval
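The resolution order above can be condensed into a small sketch. `resolve_backtrack` is a hypothetical stand-in, and the `1440` default is illustrative; the real default comes from the `pipes:parameters:fetch:backtrack_minutes` configuration key:

```python
from datetime import timedelta

def resolve_backtrack(configured_minutes=None, default_minutes=1440,
                      check_existing=True, dt_dtype='datetime64[us]'):
    """Resolve the backtrack interval: the pipe's configured value wins over
    the global default, and check_existing=False zeroes it out entirely."""
    minutes = (
        configured_minutes if configured_minutes is not None else default_minutes
    ) if check_existing else 0
    # Integer datetime axes get a plain int; timestamp axes get a timedelta.
    if 'int' in dt_dtype.lower():
        return minutes
    return timedelta(minutes=minutes)
```

The `int` vs `timedelta` split matters downstream: an integer `datetime` axis is bounded by arithmetic on plain integers, while a timestamp axis needs a proper `timedelta`.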

def get_data( self, select_columns: Optional[List[str]] = None, omit_columns: Optional[List[str]] = None, begin: Union[datetime.datetime, int, str, NoneType] = None, end: Union[datetime.datetime, int, str, NoneType] = None, params: Optional[Dict[str, Any]] = None, as_iterator: bool = False, as_chunks: bool = False, as_dask: bool = False, add_missing_columns: bool = False, chunk_interval: Union[datetime.timedelta, int, NoneType] = None, order: Optional[str] = 'asc', limit: Optional[int] = None, fresh: bool = False, debug: bool = False, **kw: Any) -> Union[pandas.core.frame.DataFrame, Iterator[pandas.core.frame.DataFrame], NoneType]:
 23def get_data(
 24    self,
 25    select_columns: Optional[List[str]] = None,
 26    omit_columns: Optional[List[str]] = None,
 27    begin: Union[datetime, int, str, None] = None,
 28    end: Union[datetime, int, str, None] = None,
 29    params: Optional[Dict[str, Any]] = None,
 30    as_iterator: bool = False,
 31    as_chunks: bool = False,
 32    as_dask: bool = False,
 33    add_missing_columns: bool = False,
 34    chunk_interval: Union[timedelta, int, None] = None,
 35    order: Optional[str] = 'asc',
 36    limit: Optional[int] = None,
 37    fresh: bool = False,
 38    debug: bool = False,
 39    **kw: Any
 40) -> Union['pd.DataFrame', Iterator['pd.DataFrame'], None]:
 41    """
 42    Get a pipe's data from the instance connector.
 43
 44    Parameters
 45    ----------
 46    select_columns: Optional[List[str]], default None
 47        If provided, only select these given columns.
 48        Otherwise select all available columns (i.e. `SELECT *`).
 49
 50    omit_columns: Optional[List[str]], default None
 51        If provided, remove these columns from the selection.
 52
 53    begin: Union[datetime, int, str, None], default None
 54        Lower bound datetime to begin searching for data (inclusive).
 55        Translates to a `WHERE` clause like `WHERE datetime >= begin`.
 56        Defaults to `None`.
 57
 58    end: Union[datetime, int, str, None], default None
 59        Upper bound datetime to stop searching for data (inclusive).
 60        Translates to a `WHERE` clause like `WHERE datetime < end`.
 61        Defaults to `None`.
 62
 63    params: Optional[Dict[str, Any]], default None
 64        Filter the retrieved data by a dictionary of parameters.
 65        See `meerschaum.utils.sql.build_where` for more details. 
 66
 67    as_iterator: bool, default False
 68        If `True`, return a generator of chunks of pipe data.
 69
 70    as_chunks: bool, default False
 71        Alias for `as_iterator`.
 72
 73    as_dask: bool, default False
 74        If `True`, return a `dask.DataFrame`
 75        (which may be loaded into a Pandas DataFrame with `df.compute()`).
 76
 77    add_missing_columns: bool, default False
 78        If `True`, add any missing columns from `Pipe.dtypes` to the dataframe.
 79
 80    chunk_interval: Union[timedelta, int, None], default None
 81        If `as_iterator`, then return chunks with `begin` and `end` separated by this interval.
 82        This may be set under `pipe.parameters['chunk_minutes']`.
 83        By default, use a timedelta of 1440 minutes (1 day).
 84        If `chunk_interval` is an integer and the `datetime` axis a timestamp,
 85        then use a timedelta of that many minutes.
 86        If the `datetime` axis is an integer, default to the configured chunksize.
 87        If `chunk_interval` is a `timedelta` and the `datetime` axis an integer,
 88        use the number of minutes in the `timedelta`.
 89
 90    order: Optional[str], default 'asc'
 91        If `order` is not `None`, sort the resulting dataframe by indices.
 92
 93    limit: Optional[int], default None
 94        If provided, cap the dataframe to this many rows.
 95
 96    fresh: bool, default False
 97        If `True`, skip local cache and directly query the instance connector.
 98
 99    debug: bool, default False
100        Verbosity toggle.
101        Defaults to `False`.
102
103    Returns
104    -------
105    A `pd.DataFrame` for the pipe's data corresponding to the provided parameters.
106
107    """
108    from meerschaum.utils.warnings import warn
109    from meerschaum.utils.venv import Venv
110    from meerschaum.connectors import get_connector_plugin
111    from meerschaum.utils.dtypes import to_pandas_dtype
112    from meerschaum.utils.dataframe import add_missing_cols_to_df, df_is_chunk_generator
113    from meerschaum.utils.packages import attempt_import
114    from meerschaum.utils.warnings import dprint
115    dd = attempt_import('dask.dataframe') if as_dask else None
116    dask = attempt_import('dask') if as_dask else None
117    _ = attempt_import('partd', lazy=False) if as_dask else None
118
119    if select_columns == '*':
120        select_columns = None
121    elif isinstance(select_columns, str):
122        select_columns = [select_columns]
123
124    if isinstance(omit_columns, str):
125        omit_columns = [omit_columns]
126
127    begin, end = self.parse_date_bounds(begin, end)
128    as_iterator = as_iterator or as_chunks
129    dt_col = self.columns.get('datetime', None)
130
131    def _sort_df(_df):
132        if df_is_chunk_generator(_df):
133            return _df
134        indices = [] if dt_col not in _df.columns else [dt_col]
135        non_dt_cols = [
136            col
137            for col_ix, col in self.columns.items()
138            if col_ix != 'datetime' and col in _df.columns
139        ]
140        indices.extend(non_dt_cols)
141        if 'dask' not in _df.__module__:
142            _df.sort_values(
143                by=indices,
144                inplace=True,
145                ascending=(str(order).lower() == 'asc'),
146            )
147            _df.reset_index(drop=True, inplace=True)
148        else:
149            _df = _df.sort_values(
150                by=indices,
151                ascending=(str(order).lower() == 'asc'),
152            )
153            _df = _df.reset_index(drop=True)
154        if limit is not None and len(_df) > limit:
155            return _df.head(limit)
156        return _df
157
158    if as_iterator or as_chunks:
159        df = self._get_data_as_iterator(
160            select_columns=select_columns,
161            omit_columns=omit_columns,
162            begin=begin,
163            end=end,
164            params=params,
165            chunk_interval=chunk_interval,
166            limit=limit,
167            order=order,
168            fresh=fresh,
169            debug=debug,
170        )
171        return _sort_df(df)
172
173    if as_dask:
174        from multiprocessing.pool import ThreadPool
175        dask_pool = ThreadPool(self.get_num_workers())
176        dask.config.set(pool=dask_pool)
177        chunk_interval = self.get_chunk_interval(chunk_interval, debug=debug)
178        bounds = self.get_chunk_bounds(
179            begin=begin,
180            end=end,
181            bounded=False,
182            chunk_interval=chunk_interval,
183            debug=debug,
184        )
185        dask_chunks = [
186            dask.delayed(self.get_data)(
187                select_columns=select_columns,
188                omit_columns=omit_columns,
189                begin=chunk_begin,
190                end=chunk_end,
191                params=params,
192                chunk_interval=chunk_interval,
193                order=order,
194                limit=limit,
195                fresh=fresh,
196                add_missing_columns=True,
197                debug=debug,
198            )
199            for (chunk_begin, chunk_end) in bounds
200        ]
201        dask_meta = {
202            col: to_pandas_dtype(typ)
203            for col, typ in self.get_dtypes(refresh=True, infer=True, debug=debug).items()
204        }
205        if debug:
206            dprint(f"Dask meta:\n{dask_meta}")
207        return _sort_df(dd.from_delayed(dask_chunks, meta=dask_meta))
208
209    if not self.exists(debug=debug):
210        return None
211
212    with Venv(get_connector_plugin(self.instance_connector)):
213        df = self.instance_connector.get_pipe_data(
214            pipe=self,
215            select_columns=select_columns,
216            omit_columns=omit_columns,
217            begin=begin,
218            end=end,
219            params=params,
220            limit=limit,
221            order=order,
222            debug=debug,
223            **kw
224        )
225        if df is None:
226            return df
227
228        if not select_columns:
229            select_columns = [col for col in df.columns]
230
231        pipe_dtypes = self.get_dtypes(refresh=False, debug=debug)
232        cols_to_omit = [
233            col
234            for col in df.columns
235            if (
236                col in (omit_columns or [])
237                or
238                col not in (select_columns or [])
239            )
240        ]
241        cols_to_add = [
242            col
243            for col in select_columns
244            if col not in df.columns
245        ] + ([
246            col
247            for col in pipe_dtypes
248            if col not in df.columns
249        ] if add_missing_columns else [])
250        if cols_to_omit:
251            warn(
252                (
253                    f"Received {len(cols_to_omit)} omitted column"
254                    + ('s' if len(cols_to_omit) != 1 else '')
255                    + f" for {self}. "
256                    + "Consider adding `select_columns` and `omit_columns` support to "
257                    + f"'{self.instance_connector.type}' connectors to improve performance."
258                ),
259                stack=False,
260            )
261            _cols_to_select = [col for col in df.columns if col not in cols_to_omit]
262            df = df[_cols_to_select]
263
264        if cols_to_add:
265            if not add_missing_columns:
266                from meerschaum.utils.misc import items_str
267                warn(
268                    f"Will add columns {items_str(cols_to_add)} as nulls to dataframe.",
269                    stack=False,
270                )
271
272            df = add_missing_cols_to_df(
273                df,
274                {
275                    col: pipe_dtypes.get(col, 'string')
276                    for col in cols_to_add
277                },
278            )
279
280        enforced_df = self.enforce_dtypes(
281            df,
282            dtypes=pipe_dtypes,
283            debug=debug,
284        )
285
286        if order:
287            return _sort_df(enforced_df)
288        return enforced_df
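For the `as_dask` path, the data is partitioned into delayed chunks along the `datetime` axis. A minimal sketch of the kind of bounds `Pipe.get_chunk_bounds()` yields (a hypothetical simplification; the real method also supports integer axes and unbounded ranges):

```python
from datetime import datetime, timedelta

def chunk_bounds(begin, end, interval):
    """Yield (chunk_begin, chunk_end) pairs covering [begin, end) in steps of `interval`."""
    chunk_begin = begin
    while chunk_begin < end:
        chunk_end = min(chunk_begin + interval, end)
        yield chunk_begin, chunk_end
        chunk_begin = chunk_end

bounds = list(chunk_bounds(datetime(2024, 1, 1), datetime(2024, 1, 3, 12), timedelta(days=1)))
for chunk_begin, chunk_end in bounds:
    print(chunk_begin, '->', chunk_end)
```

Each pair becomes one `dask.delayed(self.get_data)(...)` call above, so chunks can be materialized lazily and in parallel before `dd.from_delayed()` stitches them into a single Dask DataFrame.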

def get_backtrack_data( self, backtrack_minutes: Optional[int] = None, begin: Union[datetime.datetime, int, NoneType] = None, params: Optional[Dict[str, Any]] = None, limit: Optional[int] = None, fresh: bool = False, debug: bool = False, **kw: Any) -> Optional[pandas.core.frame.DataFrame]:
380def get_backtrack_data(
381    self,
382    backtrack_minutes: Optional[int] = None,
383    begin: Union[datetime, int, None] = None,
384    params: Optional[Dict[str, Any]] = None,
385    limit: Optional[int] = None,
386    fresh: bool = False,
387    debug: bool = False,
388    **kw: Any
389) -> Optional['pd.DataFrame']:
390    """
391    Get the most recent data from the instance connector as a Pandas DataFrame.
392
393    Parameters
394    ----------
395    backtrack_minutes: Optional[int], default None
396        How many minutes from `begin` to select from.
397        If `None`, use `pipe.parameters['fetch']['backtrack_minutes']`.
398
399    begin: Optional[datetime], default None
400        The starting point to search for data.
401        If begin is `None` (default), use the most recent observed datetime
402        (AKA sync_time).
403
404        ```
405        E.g. begin = 02:00
406
407        Search this region.           Ignore this, even if there's data.
408        /  /  /  /  /  /  /  /  /  |
409        -----|----------|----------|----------|----------|----------|
410        00:00      01:00      02:00      03:00      04:00      05:00
411
412        ```
413
414    params: Optional[Dict[str, Any]], default None
415        The standard Meerschaum `params` query dictionary.
416
417    limit: Optional[int], default None
418        If provided, cap the number of rows to be returned.
419
420    fresh: bool, default False
421        If `True`, ignore the local cache and pull directly from the instance connector.
422        Only comes into effect if a pipe was created with `cache=True`.
423
424    debug: bool, default False
425        Verbosity toggle.
426
427    Returns
428    -------
429    A `pd.DataFrame` for the pipe's data corresponding to the provided parameters. Backtrack data
430    is a convenient way to get a pipe's data "backtracked" from the most recent datetime.
431    """
432    from meerschaum.utils.warnings import warn
433    from meerschaum.utils.venv import Venv
434    from meerschaum.connectors import get_connector_plugin
435
436    if not self.exists(debug=debug):
437        return None
438
439    begin = self.parse_date_bounds(begin)
440
441    backtrack_interval = self.get_backtrack_interval(debug=debug)
442    if backtrack_minutes is None:
443        backtrack_minutes = (
444            (backtrack_interval.total_seconds() / 60)
445            if isinstance(backtrack_interval, timedelta)
446            else backtrack_interval
447        )
448
449    if hasattr(self.instance_connector, 'get_backtrack_data'):
450        with Venv(get_connector_plugin(self.instance_connector)):
451            return self.enforce_dtypes(
452                self.instance_connector.get_backtrack_data(
453                    pipe=self,
454                    begin=begin,
455                    backtrack_minutes=backtrack_minutes,
456                    params=params,
457                    limit=limit,
458                    debug=debug,
459                    **kw
460                ),
461                debug=debug,
462            )
463
464    if begin is None:
465        begin = self.get_sync_time(params=params, debug=debug)
466
467    backtrack_interval = (
468        timedelta(minutes=backtrack_minutes)
469        if isinstance(begin, datetime)
470        else backtrack_minutes
471    )
472    if begin is not None:
473        begin = begin - backtrack_interval
474
475    return self.get_data(
476        begin=begin,
477        params=params,
478        debug=debug,
479        limit=limit,
480        order=kw.get('order', 'desc'),
481        **kw
482    )

Get the most recent data from the instance connector as a Pandas DataFrame.

Parameters
  • backtrack_minutes (Optional[int], default None): How many minutes from begin to select from. If None, use pipe.parameters['fetch']['backtrack_minutes'].
  • begin (Optional[datetime], default None): The starting point to search for data. If begin is None (default), use the most recent observed datetime (AKA sync_time).

    E.g. begin = 02:00
    
    Search this region.           Ignore this, even if there's data.
    /  /  /  /  /  /  /  /  /  |
    -----|----------|----------|----------|----------|----------|
    00:00      01:00      02:00      03:00      04:00      05:00
    
    
  • params (Optional[Dict[str, Any]], default None): The standard Meerschaum params query dictionary.

  • limit (Optional[int], default None): If provided, cap the number of rows to be returned.
  • fresh (bool, default False): If True, ignore local cache and pull directly from the instance connector. Only comes into effect if a pipe was created with cache=True.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A pd.DataFrame for the pipe's data corresponding to the provided parameters. Backtrack data is a convenient way to get a pipe's data "backtracked" from the most recent datetime.
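The backtrack window arithmetic described above can be sketched in isolation. This is a minimal illustration, not the Meerschaum implementation; the helper name is hypothetical:

```python
from datetime import datetime, timedelta

def backtrack_begin(sync_time, backtrack_minutes):
    """Subtract the backtrack window from the most recent sync time."""
    if sync_time is None:
        return None
    return sync_time - timedelta(minutes=backtrack_minutes)

# With a sync time of 02:00 and a 60-minute window, the search begins at 01:00.
print(backtrack_begin(datetime(2024, 1, 1, 2, 0), 60))
# 2024-01-01 01:00:00
```

The resulting `begin` is then passed to `Pipe.get_data()`, which is why unbounded instance connectors can fall back to a plain `get_data()` call.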
def get_rowcount( self, begin: Union[datetime.datetime, int, NoneType] = None, end: Union[datetime.datetime, int, NoneType] = None, params: Optional[Dict[str, Any]] = None, remote: bool = False, debug: bool = False) -> int:
485def get_rowcount(
486    self,
487    begin: Union[datetime, int, None] = None,
488    end: Union[datetime, int, None] = None,
489    params: Optional[Dict[str, Any]] = None,
490    remote: bool = False,
491    debug: bool = False
492) -> int:
493    """
494    Get a Pipe's instance or remote rowcount.
495
496    Parameters
497    ----------
498    begin: Optional[datetime], default None
499        Count rows where datetime > begin.
500
501    end: Optional[datetime], default None
502        Count rows where datetime < end.
503
504    remote: bool, default False
505        Count rows from a pipe's remote source.
506        **NOTE**: This is experimental!
507
508    debug: bool, default False
509        Verbosity toggle.
510
511    Returns
512    -------
513    An `int` of the number of rows in the pipe corresponding to the provided parameters.
514    Returns 0 if the pipe does not exist.
515    """
516    from meerschaum.utils.warnings import warn
517    from meerschaum.utils.venv import Venv
518    from meerschaum.connectors import get_connector_plugin
519    from meerschaum.utils.misc import filter_keywords
520
521    begin, end = self.parse_date_bounds(begin, end)
522    connector = self.instance_connector if not remote else self.connector
523    try:
524        with Venv(get_connector_plugin(connector)):
525            if not hasattr(connector, 'get_pipe_rowcount'):
526                warn(
527                    f"Connectors of type '{connector.type}' "
528                    "do not implement `get_pipe_rowcount()`.",
529                    stack=False,
530                )
531                return 0
532            kwargs = filter_keywords(
533                connector.get_pipe_rowcount,
534                begin=begin,
535                end=end,
536                params=params,
537                remote=remote,
538                debug=debug,
539            )
540            if remote and 'remote' not in kwargs:
541                warn(
542                    f"Connectors of type '{connector.type}' do not support remote rowcounts.",
543                    stack=False,
544                )
545                return 0
546            rowcount = connector.get_pipe_rowcount(
547                self,
548                begin=begin,
549                end=end,
550                params=params,
551                remote=remote,
552                debug=debug,
553            )
554            if rowcount is None:
555                return 0
556            return rowcount
557    except AttributeError as e:
558        warn(e)
559        if remote:
560            return 0
561    warn(f"Failed to get a rowcount for {self}.")
562    return 0

Get a Pipe's instance or remote rowcount.

Parameters
  • begin (Optional[datetime], default None): Count rows where datetime > begin.
  • end (Optional[datetime], default None): Count rows where datetime < end.
  • remote (bool, default False): Count rows from a pipe's remote source. NOTE: This is experimental!
  • debug (bool, default False): Verbosity toggle.
Returns
  • An int of the number of rows in the pipe corresponding to the provided parameters. Returns 0 if the pipe does not exist.
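The `begin` / `end` filtering semantics documented above ("datetime > begin", "datetime < end") can be illustrated over a plain list of rows. This is a hedged sketch of the bounds logic only, not how a connector actually counts rows:

```python
from datetime import datetime

rows = [{'ts': datetime(2024, 1, 1, hour)} for hour in range(6)]

def rowcount(rows, begin=None, end=None):
    """Count rows whose datetime falls after `begin` and before `end`."""
    return sum(
        1 for row in rows
        if (begin is None or row['ts'] > begin)
        and (end is None or row['ts'] < end)
    )

print(rowcount(rows))                                 # all 6 rows
print(rowcount(rows, begin=datetime(2024, 1, 1, 2)))  # rows at 03:00, 04:00, 05:00
```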
def get_doc(self, **kwargs) -> Optional[Dict[str, Any]]:
826def get_doc(self, **kwargs) -> Union[Dict[str, Any], None]:
827    """
828    Convenience function to return a single row as a dictionary (or `None`) from `Pipe.get_data()`.
829    Keyword arguments are passed to `Pipe.get_data()`.
830    """
831    from meerschaum.utils.warnings import warn
832    kwargs['limit'] = 1
833    try:
834        result_df = self.get_data(**kwargs)
835        if result_df is None or len(result_df) == 0:
836            return None
837        return result_df.reset_index(drop=True).iloc[0].to_dict()
838    except Exception as e:
839        warn(f"Failed to read value from {self}:\n{e}", stack=False)
840        return None

Convenience function to return a single row as a dictionary (or None) from Pipe.get_data(). Keyword arguments are passed to Pipe.get_data().

def get_value( self, column: str, params: Optional[Dict[str, Any]] = None, **kwargs: Any) -> Any:
842def get_value(
843    self,
844    column: str,
845    params: Optional[Dict[str, Any]] = None,
846    **kwargs: Any
847) -> Any:
848    """
849    Convenience function to return a single value (or `None`) from `Pipe.get_data()`.
850    Keyword arguments are passed to `Pipe.get_data()`.
851    """
852    from meerschaum.utils.warnings import warn
853    kwargs['select_columns'] = [column]
854    kwargs['limit'] = 1
855    try:
856        result_df = self.get_data(params=params, **kwargs)
857        if result_df is None or len(result_df) == 0:
858            return None
859        if column not in result_df.columns:
860            raise ValueError(f"Column '{column}' was not included in the result set.")
861        return result_df[column][0]
862    except Exception as e:
863        warn(f"Failed to read value from {self}:\n{e}", stack=False)
864        return None

Convenience function to return a single value (or None) from Pipe.get_data(). Keyword arguments are passed to Pipe.get_data().
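Both convenience methods reduce to a limit-1 query. A minimal sketch of that pattern over a list of row dictionaries (the helper names here are illustrative, not part of the Meerschaum API):

```python
def first_doc(rows):
    """Return the first row as a dictionary, or None when there are no rows."""
    return dict(rows[0]) if rows else None

def first_value(rows, column):
    """Return the first row's value for `column`, or None when it is missing."""
    doc = first_doc(rows)
    if doc is None or column not in doc:
        return None
    return doc[column]

rows = [{'id': 1, 'vl': 42}, {'id': 2, 'vl': 7}]
print(first_value(rows, 'vl'))
# 42
```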

def get_chunk_interval( self, chunk_interval: Union[datetime.timedelta, int, NoneType] = None, debug: bool = False) -> Union[datetime.timedelta, int]:
565def get_chunk_interval(
566    self,
567    chunk_interval: Union[timedelta, int, None] = None,
568    debug: bool = False,
569) -> Union[timedelta, int]:
570    """
571    Get the chunk interval to use for this pipe.
572
573    Parameters
574    ----------
575    chunk_interval: Union[timedelta, int, None], default None
576        If provided, coerce this value into the correct type.
577        For example, if the datetime axis is an integer, then
578        return the number of minutes.
579
580    Returns
581    -------
582    The chunk interval (`timedelta` or `int`) to use with this pipe's `datetime` axis.
583    """
584    default_chunk_minutes = get_config('pipes', 'parameters', 'verify', 'chunk_minutes')
585    configured_chunk_minutes = self.parameters.get('verify', {}).get('chunk_minutes', None)
586    chunk_minutes = (
587        (configured_chunk_minutes or default_chunk_minutes)
588        if chunk_interval is None
589        else (
590            chunk_interval
591            if isinstance(chunk_interval, int)
592            else int(chunk_interval.total_seconds() / 60)
593        )
594    )
595
596    dt_col = self.columns.get('datetime', None)
597    if dt_col is None:
598        return timedelta(minutes=chunk_minutes)
599
600    dt_dtype = self.dtypes.get(dt_col, 'datetime')
601    if 'int' in dt_dtype.lower():
602        return chunk_minutes
603    return timedelta(minutes=chunk_minutes)

Get the chunk interval to use for this pipe.

Parameters
  • chunk_interval (Union[timedelta, int, None], default None): If provided, coerce this value into the correct type. For example, if the datetime axis is an integer, then return the number of minutes.
Returns
  • The chunk interval (timedelta or int) to use with this pipe's datetime axis.
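The coercion rule above (integer datetime axes get minutes as an `int`; everything else gets a `timedelta`) can be sketched on its own. The helper name and the boolean flag are illustrative assumptions:

```python
from datetime import timedelta

def coerce_chunk_interval(chunk_interval, dt_is_integer):
    """Express the interval as minutes for integer axes, else as a timedelta."""
    minutes = (
        chunk_interval
        if isinstance(chunk_interval, int)
        else int(chunk_interval.total_seconds() / 60)
    )
    return minutes if dt_is_integer else timedelta(minutes=minutes)

print(coerce_chunk_interval(timedelta(hours=2), dt_is_integer=True))
# 120
print(coerce_chunk_interval(90, dt_is_integer=False))
# 1:30:00
```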
def get_chunk_bounds( self, begin: Union[datetime.datetime, int, NoneType] = None, end: Union[datetime.datetime, int, NoneType] = None, bounded: bool = False, chunk_interval: Union[datetime.timedelta, int, NoneType] = None, debug: bool = False) -> List[Tuple[Union[datetime.datetime, int, NoneType], Union[datetime.datetime, int, NoneType]]]:
606def get_chunk_bounds(
607    self,
608    begin: Union[datetime, int, None] = None,
609    end: Union[datetime, int, None] = None,
610    bounded: bool = False,
611    chunk_interval: Union[timedelta, int, None] = None,
612    debug: bool = False,
613) -> List[
614    Tuple[
615        Union[datetime, int, None],
616        Union[datetime, int, None],
617    ]
618]:
619    """
620    Return a list of datetime bounds for iterating over the pipe's `datetime` axis.
621
622    Parameters
623    ----------
624    begin: Union[datetime, int, None], default None
625        If provided, do not select less than this value.
626        Otherwise the first chunk will be unbounded.
627
628    end: Union[datetime, int, None], default None
629        If provided, do not select greater than or equal to this value.
630        Otherwise the last chunk will be unbounded.
631
632    bounded: bool, default False
633        If `True`, do not include `None` in the first chunk.
634
635    chunk_interval: Union[timedelta, int, None], default None
636        If provided, use this interval for the size of chunk boundaries.
637        The default value for this pipe may be set
638        under `pipe.parameters['verify']['chunk_minutes']`.
639
640    debug: bool, default False
641        Verbosity toggle.
642
643    Returns
644    -------
645    A list of chunk bounds (datetimes or integers).
646    If unbounded, the first and last chunks will include `None`.
647    """
648    from datetime import timedelta
649    from meerschaum.utils.dtypes import are_dtypes_equal
650    from meerschaum.utils.misc import interval_str
651    include_less_than_begin = not bounded and begin is None
652    include_greater_than_end = not bounded and end is None
653    if begin is None:
654        begin = self.get_sync_time(newest=False, debug=debug)
655    consolidate_end_chunk = False
656    if end is None:
657        end = self.get_sync_time(newest=True, debug=debug)
658        if end is not None and hasattr(end, 'tzinfo'):
659            end += timedelta(minutes=1)
660            consolidate_end_chunk = True
661        elif are_dtypes_equal(str(type(end)), 'int'):
662            end += 1
663            consolidate_end_chunk = True
664
665    if begin is None and end is None:
666        return [(None, None)]
667
668    begin, end = self.parse_date_bounds(begin, end)
669
670    if begin and end:
671        if begin >= end:
672            return (
673                [(begin, begin)]
674                if bounded
675                else [(begin, None)]
676            )
677        if end <= begin:
678            return (
679                [(end, end)]
680                if bounded
681                else [(None, begin)]
682            )
683
684    ### Set the chunk interval under `pipe.parameters['verify']['chunk_minutes']`.
685    chunk_interval = self.get_chunk_interval(chunk_interval, debug=debug)
686    
687    ### Build a list of tuples containing the chunk boundaries
688    ### so that we can sync multiple chunks in parallel.
689    ### Run `verify pipes --workers 1` to sync chunks in series.
690    chunk_bounds = []
691    begin_cursor = begin
692    num_chunks = 0
693    max_chunks = 1_000_000
694    while begin_cursor < end:
695        end_cursor = begin_cursor + chunk_interval
696        chunk_bounds.append((begin_cursor, end_cursor))
697        begin_cursor = end_cursor
698        num_chunks += 1
699        if num_chunks >= max_chunks:
700            raise ValueError(
701                f"Too many chunks of size '{interval_str(chunk_interval)}' "
702                f"between '{begin}' and '{end}'."
703            )
704
705    if num_chunks > 1 and consolidate_end_chunk:
706        last_bounds, second_last_bounds = chunk_bounds[-1], chunk_bounds[-2]
707        chunk_bounds = chunk_bounds[:-2]
708        chunk_bounds.append((second_last_bounds[0], last_bounds[1]))
709
710    ### The chunk interval might be too large.
711    if not chunk_bounds and end >= begin:
712        chunk_bounds = [(begin, end)]
713
714    ### Truncate the last chunk to the end timestamp.
715    if chunk_bounds[-1][1] > end:
716        chunk_bounds[-1] = (chunk_bounds[-1][0], end)
717
718    ### Pop the last chunk if its bounds are equal.
719    if chunk_bounds[-1][0] == chunk_bounds[-1][1]:
720        chunk_bounds = chunk_bounds[:-1]
721
722    if include_less_than_begin:
723        chunk_bounds = [(None, begin)] + chunk_bounds
724    if include_greater_than_end:
725        chunk_bounds = chunk_bounds + [(end, None)]
726
727    return chunk_bounds

Return a list of datetime bounds for iterating over the pipe's datetime axis.

Parameters
  • begin (Union[datetime, int, None], default None): If provided, do not select less than this value. Otherwise the first chunk will be unbounded.
  • end (Union[datetime, int, None], default None): If provided, do not select greater than or equal to this value. Otherwise the last chunk will be unbounded.
  • bounded (bool, default False): If True, do not include None in the first chunk.
  • chunk_interval (Union[timedelta, int, None], default None): If provided, use this interval for the size of chunk boundaries. The default value for this pipe may be set under pipe.parameters['verify']['chunk_minutes'].
  • debug (bool, default False): Verbosity toggle.
Returns
  • A list of chunk bounds (datetimes or integers). If unbounded, the first and last chunks will include None.
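The core chunking loop can be sketched without a pipe. This simplified version (illustrative names; it omits the end-chunk consolidation and sync-time lookups the real method performs) shows how the interval divides the axis and how unbounded edges add `None` sentinels:

```python
from datetime import datetime, timedelta

def chunk_bounds(begin, end, interval, bounded=True):
    """Split [begin, end) into consecutive (start, stop) windows of size `interval`."""
    bounds = []
    cursor = begin
    while cursor < end:
        stop = min(cursor + interval, end)
        bounds.append((cursor, stop))
        cursor = stop
    if not bounded:
        # Unbounded edge chunks capture any rows outside [begin, end).
        bounds = [(None, begin)] + bounds + [(end, None)]
    return bounds

bounds = chunk_bounds(
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 1, 1, 3, 30),
    timedelta(hours=1),
)
print(len(bounds))
# 4  (the last chunk is truncated to 03:30)
```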
def get_chunk_bounds_batches( self, chunk_bounds: List[Tuple[Union[datetime.datetime, int, NoneType], Union[datetime.datetime, int, NoneType]]], batchsize: Optional[int] = None, workers: Optional[int] = None, debug: bool = False) -> List[Tuple[Tuple[Union[datetime.datetime, int, NoneType], Union[datetime.datetime, int, NoneType]], ...]]:
730def get_chunk_bounds_batches(
731    self,
732    chunk_bounds: List[Tuple[Union[datetime, int, None], Union[datetime, int, None]]],
733    batchsize: Optional[int] = None,
734    workers: Optional[int] = None,
735    debug: bool = False,
736) -> List[
737    Tuple[
738        Tuple[
739            Union[datetime, int, None],
740            Union[datetime, int, None],
741        ], ...
742    ]
743]:
744    """
745    Return a list of tuples of chunk bounds of size `batchsize`.
746
747    Parameters
748    ----------
749    chunk_bounds: List[Tuple[Union[datetime, int, None], Union[datetime, int, None]]]
750        A list of chunk_bounds (see `Pipe.get_chunk_bounds()`).
751
752    batchsize: Optional[int], default None
753        How many chunks to include in a batch. Defaults to `Pipe.get_num_workers()`.
754
755    workers: Optional[int], default None
756        If `batchsize` is `None`, use this as the desired number of workers.
757        Passed to `Pipe.get_num_workers()`.
758
759    Returns
760    -------
761    A list of tuples of chunk bound tuples.
762    """
763    from meerschaum.utils.misc import iterate_chunks
764    
765    if batchsize is None:
766        batchsize = self.get_num_workers(workers=workers)
767
768    return [
769        tuple(
770            _batch_chunk_bounds
771            for _batch_chunk_bounds in batch
772            if _batch_chunk_bounds is not None
773        )
774        for batch in iterate_chunks(chunk_bounds, batchsize)
775        if batch
776    ]

Return a list of tuples of chunk bounds of size batchsize.

Parameters
  • chunk_bounds (List[Tuple[Union[datetime, int, None], Union[datetime, int, None]]]): A list of chunk_bounds (see Pipe.get_chunk_bounds()).
  • batchsize (Optional[int], default None): How many chunks to include in a batch. Defaults to Pipe.get_num_workers().
  • workers (Optional[int], default None): If batchsize is None, use this as the desired number of workers. Passed to Pipe.get_num_workers().
Returns
  • A list of tuples of chunk bound tuples.
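The batching itself is a simple grouping of the bounds list. A self-contained sketch (the function name is illustrative; the real method sizes batches via `Pipe.get_num_workers()`):

```python
def batch_bounds(chunk_bounds, batchsize):
    """Group chunk bounds into tuples of at most `batchsize` chunks each."""
    return [
        tuple(chunk_bounds[i:i + batchsize])
        for i in range(0, len(chunk_bounds), batchsize)
    ]

bounds = [(0, 10), (10, 20), (20, 30)]
print(batch_bounds(bounds, 2))
# [((0, 10), (10, 20)), ((20, 30),)]
```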
def parse_date_bounds( self, *dt_vals: Union[datetime.datetime, int, NoneType]) -> Union[datetime.datetime, int, str, NoneType, Tuple[Union[datetime.datetime, int, str, NoneType]]]:
779def parse_date_bounds(self, *dt_vals: Union[datetime, int, None]) -> Union[
780    datetime,
781    int,
782    str,
783    None,
784    Tuple[Union[datetime, int, str, None]]
785]:
786    """
787    Given a date bound (begin, end), coerce a timezone if necessary.
788    """
789    from meerschaum.utils.misc import is_int
790    from meerschaum.utils.dtypes import coerce_timezone, MRSM_PD_DTYPES
791    from meerschaum.utils.warnings import warn
792    dateutil_parser = mrsm.attempt_import('dateutil.parser')
793
794    def _parse_date_bound(dt_val):
795        if dt_val is None:
796            return None
797
798        if isinstance(dt_val, int):
799            return dt_val
800
801        if dt_val == '':
802            return ''
803
804        if is_int(dt_val):
805            return int(dt_val)
806
807        if isinstance(dt_val, str):
808            try:
809                dt_val = dateutil_parser.parse(dt_val)
810            except Exception as e:
811                warn(f"Could not parse '{dt_val}' as datetime:\n{e}")
812                return None
813
814        dt_col = self.columns.get('datetime', None)
815        dt_typ = str(self.dtypes.get(dt_col, 'datetime'))
816        if dt_typ == 'datetime':
817            dt_typ = MRSM_PD_DTYPES['datetime']
818        return coerce_timezone(dt_val, strip_utc=('utc' not in dt_typ.lower()))
819
820    bounds = tuple(_parse_date_bound(dt_val) for dt_val in dt_vals)
821    if len(bounds) == 1:
822        return bounds[0]
823    return bounds

Given a date bound (begin, end), coerce a timezone if necessary.
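The per-value coercion can be sketched with the standard library. Note the assumptions: the real implementation parses with `dateutil` and derives the timezone from the pipe's datetime dtype, whereas this sketch uses `datetime.fromisoformat()` and assumes UTC:

```python
from datetime import datetime, timezone

def parse_date_bound(dt_val):
    """Coerce a bound into an int, a UTC-aware datetime, or None."""
    if dt_val is None or isinstance(dt_val, int):
        return dt_val
    if isinstance(dt_val, str):
        try:
            # Numeric strings become integer bounds (for integer datetime axes).
            return int(dt_val)
        except ValueError:
            dt_val = datetime.fromisoformat(dt_val)
    if dt_val.tzinfo is None:
        dt_val = dt_val.replace(tzinfo=timezone.utc)
    return dt_val

print(parse_date_bound('2024-01-01'))
# 2024-01-01 00:00:00+00:00
```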

def register(self, debug: bool = False, **kw: Any) -> Tuple[bool, str]:
12def register(
13        self,
14        debug: bool = False,
15        **kw: Any
16    ) -> SuccessTuple:
17    """
18    Register a new Pipe along with its attributes.
19
20    Parameters
21    ----------
22    debug: bool, default False
23        Verbosity toggle.
24
25    kw: Any
26        Keyword arguments to pass to `instance_connector.register_pipe()`.
27
28    Returns
29    -------
30    A `SuccessTuple` of success, message.
31    """
32    if self.temporary:
33        return False, "Cannot register pipes created with `temporary=True` (read-only)."
34
35    from meerschaum.utils.formatting import get_console
36    from meerschaum.utils.venv import Venv
37    from meerschaum.connectors import get_connector_plugin, custom_types
38    from meerschaum.config._patch import apply_patch_to_config
39
40    import warnings
41    with warnings.catch_warnings():
42        warnings.simplefilter('ignore')
43        try:
44            _conn = self.connector
45        except Exception as e:
46            _conn = None
47
48    if (
49        _conn is not None
50        and
51        (_conn.type == 'plugin' or _conn.type in custom_types)
52        and
53        getattr(_conn, 'register', None) is not None
54    ):
55        try:
56            with Venv(get_connector_plugin(_conn), debug=debug):
57                params = self.connector.register(self)
58        except Exception as e:
59            get_console().print_exception()
60            params = None
61        params = {} if params is None else params
62        if not isinstance(params, dict):
63            from meerschaum.utils.warnings import warn
64            warn(
65                f"Invalid parameters returned from `register()` in connector {self.connector}:\n"
66                + f"{params}"
67            )
68        else:
69            self.parameters = apply_patch_to_config(params, self.parameters)
70
71    if not self.parameters:
72        cols = self.columns if self.columns else {'datetime': None, 'id': None}
73        self.parameters = {
74            'columns': cols,
75        }
76
77    with Venv(get_connector_plugin(self.instance_connector)):
78        return self.instance_connector.register_pipe(self, debug=debug, **kw)

Register a new Pipe along with its attributes.

Parameters
  • debug (bool, default False): Verbosity toggle.
  • kw (Any): Keyword arguments to pass to instance_connector.register_pipe().
Returns
  • A SuccessTuple of success, message.
attributes: Dict[str, Any]
20@property
21def attributes(self) -> Dict[str, Any]:
22    """
23    Return a dictionary of a pipe's keys and parameters.
24    These values are reflected directly from the pipes table of the instance.
25    """
26    from meerschaum.config import get_config
27    from meerschaum.config._patch import apply_patch_to_config
28    from meerschaum.utils.venv import Venv
29    from meerschaum.connectors import get_connector_plugin
30    from meerschaum.utils.dtypes import get_current_timestamp
31
32    timeout_seconds = get_config('pipes', 'attributes', 'local_cache_timeout_seconds')
33
34    now = get_current_timestamp('ms', as_int=True) / 1000
35    _attributes_sync_time = self._get_cached_value('_attributes_sync_time', debug=self.debug)
36    timed_out = (
37        _attributes_sync_time is None
38        or
39        (timeout_seconds is not None and (now - _attributes_sync_time) >= timeout_seconds)
40    )
41    if not self.temporary and timed_out:
42        self._cache_value('_attributes_sync_time', now, memory_only=True, debug=self.debug)
43        local_attributes = self._get_cached_value('attributes', debug=self.debug) or {}
44        with Venv(get_connector_plugin(self.instance_connector)):
45            instance_attributes = self.instance_connector.get_pipe_attributes(self)
46
47        self._cache_value(
48            'attributes',
49            apply_patch_to_config(instance_attributes, local_attributes),
50            memory_only=True,
51            debug=self.debug,
52        )
53
54    return self._attributes

Return a dictionary of a pipe's keys and parameters. These values are reflected directly from the pipes table of the instance.
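The staleness check that gates the instance fetch can be sketched on its own. The function name is illustrative; only the timeout logic mirrors the property above:

```python
import time

def attributes_timed_out(last_sync_time, timeout_seconds):
    """Return True when cached attributes are stale and should be re-fetched."""
    if last_sync_time is None:
        return True
    return timeout_seconds is not None and (time.time() - last_sync_time) >= timeout_seconds
```

A `None` timeout means a populated cache never expires, while an empty cache always triggers a fetch.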

parameters: Optional[Dict[str, Any]]
134@property
135def parameters(self) -> Optional[Dict[str, Any]]:
136    """
137    Return the parameters dictionary of the pipe.
138    """
139    return self.get_parameters(debug=self.debug)

Return the parameters dictionary of the pipe.

columns: Optional[Dict[str, str]]
151@property
152def columns(self) -> Union[Dict[str, str], None]:
153    """
154    Return the `columns` dictionary defined in `meerschaum.Pipe.parameters`.
155    """
156    cols = self.parameters.get('columns', {})
157    if not isinstance(cols, dict):
158        return {}
159    return {col_ix: col for col_ix, col in cols.items() if col and col_ix}

Return the columns dictionary defined in meerschaum.Pipe.parameters.

indices: Optional[Dict[str, Union[str, List[str]]]]
176@property
177def indices(self) -> Union[Dict[str, Union[str, List[str]]], None]:
178    """
179    Return the `indices` dictionary defined in `meerschaum.Pipe.parameters`.
180    """
181    _parameters = self.get_parameters(debug=self.debug)
182    indices_key = (
183        'indexes'
184        if 'indexes' in _parameters
185        else 'indices'
186    )
187
188    _indices = _parameters.get(indices_key, {})
189    _columns = self.columns
190    dt_col = _columns.get('datetime', None)
191    if not isinstance(_indices, dict):
192        _indices = {}
193    unique_cols = list(set((
194        [dt_col]
195        if dt_col
196        else []
197    ) + [
198        col
199        for col_ix, col in _columns.items()
200        if col and col_ix != 'datetime'
201    ]))
202    return {
203        **({'unique': unique_cols} if len(unique_cols) > 1 else {}),
204        **{col_ix: col for col_ix, col in _columns.items() if col},
205        **_indices
206    }

Return the indices dictionary defined in meerschaum.Pipe.parameters.

indexes: Optional[Dict[str, Union[str, List[str]]]]
209@property
210def indexes(self) -> Union[Dict[str, Union[str, List[str]]], None]:
211    """
212    Alias for `meerschaum.Pipe.indices`.
213    """
214    return self.indices

Alias for meerschaum.Pipe.indices.
dtypes: Dict[str, Any]
265@property
266def dtypes(self) -> Dict[str, Any]:
267    """
268    If defined, return the `dtypes` dictionary defined in `meerschaum.Pipe.parameters`.
269    """
270    return self.get_dtypes(refresh=False, debug=self.debug)

If defined, return the dtypes dictionary defined in meerschaum.Pipe.parameters.

autoincrement: bool
369@property
370def autoincrement(self) -> bool:
371    """
372    Return the `autoincrement` parameter for the pipe.
373    """
374    return self.parameters.get('autoincrement', False)

Return the autoincrement parameter for the pipe.

autotime: bool
385@property
386def autotime(self) -> bool:
387    """
388    Return the `autotime` parameter for the pipe.
389    """
390    return self.parameters.get('autotime', False)

Return the autotime parameter for the pipe.

upsert: bool
336@property
337def upsert(self) -> bool:
338    """
339    Return whether `upsert` is set for the pipe.
340    """
341    return self.parameters.get('upsert', False)

Return whether upsert is set for the pipe.

static: bool
352@property
353def static(self) -> bool:
354    """
355    Return whether `static` is set for the pipe.
356    """
357    return self.parameters.get('static', False)

Return whether static is set for the pipe.

tzinfo: Optional[datetime.timezone]
401@property
402def tzinfo(self) -> Union[None, timezone]:
403    """
404    Return `timezone.utc` if the pipe is timezone-aware.
405    """
406    _tzinfo = self._get_cached_value('tzinfo', debug=self.debug)
407    if _tzinfo is not None:
408        return _tzinfo if _tzinfo != 'None' else None
409
410    _tzinfo = None
411    dt_col = self.columns.get('datetime', None)
412    dt_typ = str(self.dtypes.get(dt_col, 'datetime')) if dt_col else None
413    if self.autotime:
414        ts_col = mrsm.get_config('pipes', 'autotime', 'column_name_if_datetime_missing')
415        ts_typ = self.dtypes.get(ts_col, 'datetime')
416        dt_typ = ts_typ
417
418    if dt_typ and 'utc' in dt_typ.lower() or dt_typ == 'datetime':
419        _tzinfo = timezone.utc
420
421    self._cache_value('tzinfo', (_tzinfo if _tzinfo is not None else 'None'), debug=self.debug)
422    return _tzinfo

Return timezone.utc if the pipe is timezone-aware.

enforce: bool
425@property
426def enforce(self) -> bool:
427    """
428    Return the `enforce` parameter for the pipe.
429    """
430    return self.parameters.get('enforce', True)

Return the enforce parameter for the pipe.

null_indices: bool
441@property
442def null_indices(self) -> bool:
443    """
444    Return the `null_indices` parameter for the pipe.
445    """
446    return self.parameters.get('null_indices', True)

Return the null_indices parameter for the pipe.

mixed_numerics: bool
457@property
458def mixed_numerics(self) -> bool:
459    """
460    Return the `mixed_numerics` parameter for the pipe.
461    """
462    return self.parameters.get('mixed_numerics', True)

Return the mixed_numerics parameter for the pipe.

def get_columns(self, *args: str, error: bool = False) -> Union[str, Tuple[str]]:
473def get_columns(self, *args: str, error: bool = False) -> Union[str, Tuple[str]]:
474    """
475    Check if the requested columns are defined.
476
477    Parameters
478    ----------
479    *args: str
480        The column names to be retrieved.
481
482    error: bool, default False
483        If `True`, raise an `Exception` if the specified column is not defined.
484
485    Returns
486    -------
487    A tuple of the same size as `args` or a `str` if `args` is a single argument.
488
489    Examples
490    --------
491    >>> pipe = mrsm.Pipe('test', 'test')
492    >>> pipe.columns = {'datetime': 'dt', 'id': 'id'}
493    >>> pipe.get_columns('datetime', 'id')
494    ('dt', 'id')
495    >>> pipe.get_columns('value', error=True)
496    Exception:  🛑 Missing 'value' column for Pipe('test', 'test').
497    """
498    from meerschaum.utils.warnings import error as _error
499    if not args:
500        args = tuple(self.columns.keys())
501    col_names = []
502    for col in args:
503        col_name = None
504        try:
505            col_name = self.columns[col]
506            if col_name is None and error:
507                _error(f"Please define the name of the '{col}' column for {self}.")
508        except Exception:
509            col_name = None
510        if col_name is None and error:
511            _error(f"Missing '{col}'" + f" column for {self}.")
512        col_names.append(col_name)
513    if len(col_names) == 1:
514        return col_names[0]
515    return tuple(col_names)

Check if the requested columns are defined.

Parameters
  • *args (str): The column names to be retrieved.
  • error (bool, default False): If True, raise an Exception if the specified column is not defined.
Returns
  • A tuple of the same size as args or a str if args is a single argument.
Examples
>>> pipe = mrsm.Pipe('test', 'test')
>>> pipe.columns = {'datetime': 'dt', 'id': 'id'}
>>> pipe.get_columns('datetime', 'id')
('dt', 'id')
>>> pipe.get_columns('value', error=True)
Exception:  🛑 Missing 'value' column for Pipe('test', 'test').
def get_columns_types( self, refresh: bool = False, debug: bool = False) -> Optional[Dict[str, str]]:
def get_columns_types(
    self,
    refresh: bool = False,
    debug: bool = False,
) -> Union[Dict[str, str], None]:
    """
    Get a dictionary of a pipe's column names and their types.

    Parameters
    ----------
    refresh: bool, default False
        If `True`, invalidate the cache and fetch directly from the instance connector.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A dictionary of column names (`str`) to column types (`str`).

    Examples
    --------
    >>> pipe.get_columns_types()
    {
      'dt': 'TIMESTAMP WITH TIMEZONE',
      'id': 'BIGINT',
      'val': 'DOUBLE PRECISION',
    }
    >>>
    """
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.dtypes import get_current_timestamp

    now = get_current_timestamp('ms', as_int=True) / 1000
    cache_seconds = (
        mrsm.get_config('pipes', 'static', 'static_schema_cache_seconds')
        if self.static
        else mrsm.get_config('pipes', 'dtypes', 'columns_types_cache_seconds')
    )
    if refresh:
        self._clear_cache_key('_columns_types_timestamp', debug=debug)
        self._clear_cache_key('_columns_types', debug=debug)

    _columns_types = self._get_cached_value('_columns_types', debug=debug)
    if _columns_types:
        columns_types_timestamp = self._get_cached_value('_columns_types_timestamp', debug=debug)
        if columns_types_timestamp is not None:
            delta = now - columns_types_timestamp
            if delta < cache_seconds:
                if debug:
                    dprint(
                        f"Returning cached `columns_types` for {self} "
                        f"({round(delta, 2)} seconds old)."
                    )
                return _columns_types

    with mrsm.Venv(get_connector_plugin(self.instance_connector)):
        _columns_types = (
            self.instance_connector.get_pipe_columns_types(self, debug=debug)
            if hasattr(self.instance_connector, 'get_pipe_columns_types')
            else None
        )

    self._cache_value('_columns_types', _columns_types, debug=debug)
    self._cache_value('_columns_types_timestamp', now, debug=debug)
    return _columns_types or {}

Get a dictionary of a pipe's column names and their types.

Parameters
  • refresh (bool, default False): If True, invalidate the cache and fetch directly from the instance connector.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A dictionary of column names (str) to column types (str).
Examples
>>> pipe.get_columns_types()
{
  'dt': 'TIMESTAMP WITH TIMEZONE',
  'id': 'BIGINT',
  'val': 'DOUBLE PRECISION',
}
>>>
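The caching behavior above amounts to a TTL check: a cached value is reused only while its timestamp is younger than the configured number of seconds, and `refresh=True` clears the entry first. This is a minimal standalone sketch of that logic, not Meerschaum's actual cache implementation; all names here are illustrative.

```python
import time

def get_with_ttl_cache(cache: dict, key: str, fetch, cache_seconds: float, refresh: bool = False):
    """Return a cached value if it is younger than `cache_seconds`, else re-fetch."""
    now = time.time()
    if refresh:
        ### Invalidate both the value and its timestamp, like `_clear_cache_key()`.
        cache.pop(key, None)
        cache.pop(key + '_timestamp', None)
    cached = cache.get(key)
    timestamp = cache.get(key + '_timestamp')
    if cached is not None and timestamp is not None and (now - timestamp) < cache_seconds:
        return cached
    value = fetch()
    cache[key] = value
    cache[key + '_timestamp'] = now
    return value

cache, calls = {}, []
fetch = lambda: calls.append(1) or {'dt': 'TIMESTAMP', 'id': 'BIGINT'}
first = get_with_ttl_cache(cache, '_columns_types', fetch, cache_seconds=60)
second = get_with_ttl_cache(cache, '_columns_types', fetch, cache_seconds=60)
```

The second call returns the cached dictionary without touching `fetch` again.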
def get_columns_indices( self, debug: bool = False, refresh: bool = False) -> Dict[str, List[Dict[str, str]]]:
def get_columns_indices(
    self,
    debug: bool = False,
    refresh: bool = False,
) -> Dict[str, List[Dict[str, str]]]:
    """
    Return a dictionary mapping columns to index information.
    """
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.dtypes import get_current_timestamp

    now = get_current_timestamp('ms', as_int=True) / 1000
    cache_seconds = (
        mrsm.get_config('pipes', 'static', 'static_schema_cache_seconds')
        if self.static
        else mrsm.get_config('pipes', 'dtypes', 'columns_types_cache_seconds')
    )
    if refresh:
        self._clear_cache_key('_columns_indices_timestamp', debug=debug)
        self._clear_cache_key('_columns_indices', debug=debug)

    _columns_indices = self._get_cached_value('_columns_indices', debug=debug)

    if _columns_indices:
        columns_indices_timestamp = self._get_cached_value('_columns_indices_timestamp', debug=debug)
        if columns_indices_timestamp is not None:
            delta = now - columns_indices_timestamp
            if delta < cache_seconds:
                if debug:
                    dprint(
                        f"Returning cached `columns_indices` for {self} "
                        f"({round(delta, 2)} seconds old)."
                    )
                return _columns_indices

    with mrsm.Venv(get_connector_plugin(self.instance_connector)):
        _columns_indices = (
            self.instance_connector.get_pipe_columns_indices(self, debug=debug)
            if hasattr(self.instance_connector, 'get_pipe_columns_indices')
            else None
        )

    self._cache_value('_columns_indices', _columns_indices, debug=debug)
    self._cache_value('_columns_indices_timestamp', now, debug=debug)
    ### Guard against a `None` result from the connector.
    return {k: v for k, v in (_columns_indices or {}).items() if k and v}

Return a dictionary mapping columns to index information.

def get_indices(self) -> Dict[str, str]:
def get_indices(self) -> Dict[str, str]:
    """
    Return a dictionary mapping index keys to their names in the database.

    Returns
    -------
    A dictionary of index keys to index names.
    """
    from meerschaum.connectors import get_connector_plugin
    with mrsm.Venv(get_connector_plugin(self.instance_connector)):
        if hasattr(self.instance_connector, 'get_pipe_index_names'):
            result = self.instance_connector.get_pipe_index_names(self)
        else:
            result = {}

    return result

Return a dictionary mapping index keys to their names in the database.

Returns
  • A dictionary of index keys to index names.
def get_parameters( self, apply_symlinks: bool = True, refresh: bool = False, debug: bool = False, _visited: Optional[set[Pipe]] = None) -> Dict[str, Any]:
def get_parameters(
    self,
    apply_symlinks: bool = True,
    refresh: bool = False,
    debug: bool = False,
    _visited: 'Optional[set[mrsm.Pipe]]' = None,
) -> Dict[str, Any]:
    """
    Return the `parameters` dictionary of the pipe.

    Parameters
    ----------
    apply_symlinks: bool, default True
        If `True`, resolve references to parameters from other pipes.

    refresh: bool, default False
        If `True`, pull the latest attributes for the pipe.

    Returns
    -------
    The pipe's parameters dictionary.
    """
    from meerschaum.config._patch import apply_patch_to_config
    from meerschaum.config._read_config import search_and_substitute_config

    if _visited is None:
        _visited = {self}

    if refresh:
        _ = self._invalidate_cache(hard=True)

    raw_parameters = self.attributes.get('parameters', {})
    ref_keys = raw_parameters.get('reference')
    if not apply_symlinks:
        return raw_parameters

    if ref_keys:
        try:
            if debug:
                dprint(f"Building reference pipe from keys: {ref_keys}")
            ref_pipe = mrsm.Pipe(**ref_keys)
            if ref_pipe in _visited:
                warn(f"Circular reference detected in {self}: chain involves {ref_pipe}.")
                return search_and_substitute_config(raw_parameters)

            _visited.add(ref_pipe)
            base_params = ref_pipe.get_parameters(_visited=_visited, debug=debug)
        except Exception as e:
            warn(f"Failed to resolve reference pipe for {self}: {e}")
            base_params = {}

        params_to_apply = {k: v for k, v in raw_parameters.items() if k != 'reference'}
        parameters = apply_patch_to_config(base_params, params_to_apply)
    else:
        parameters = raw_parameters

    from meerschaum.utils.pipes import replace_pipes_syntax
    self._symlinks = {}

    def recursive_replace(obj: Any, path: tuple) -> Any:
        if isinstance(obj, dict):
            return {k: recursive_replace(v, path + (k,)) for k, v in obj.items()}
        if isinstance(obj, list):
            return [recursive_replace(elem, path + (i,)) for i, elem in enumerate(obj)]
        if isinstance(obj, str):
            substituted_val = replace_pipes_syntax(obj)
            if substituted_val != obj:
                self._symlinks[path] = {
                    'original': obj,
                    'substituted': substituted_val,
                }
            return substituted_val
        return obj

    return search_and_substitute_config(recursive_replace(parameters, tuple()))

Return the parameters dictionary of the pipe.

Parameters
  • apply_symlinks (bool, default True): If True, resolve references to parameters from other pipes.
  • refresh (bool, default False): If True, pull the latest attributes for the pipe.
Returns
  • The pipe's parameters dictionary.
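The `recursive_replace` helper in the source above walks the parameters dictionary, substitutes pipe-syntax strings wherever they occur, and records each substitution by its path so `Pipe.edit()` can later restore the originals. Here is a standalone sketch of the same traversal, with a toy substitution function standing in for `replace_pipes_syntax`:

```python
from typing import Any

def recursive_replace(obj: Any, substitute, symlinks: dict, path: tuple = ()) -> Any:
    """Walk dicts and lists, substitute strings, and record changed paths."""
    if isinstance(obj, dict):
        return {k: recursive_replace(v, substitute, symlinks, path + (k,)) for k, v in obj.items()}
    if isinstance(obj, list):
        return [recursive_replace(e, substitute, symlinks, path + (i,)) for i, e in enumerate(obj)]
    if isinstance(obj, str):
        new_val = substitute(obj)
        if new_val != obj:
            ### Remember where the substitution happened and what it replaced.
            symlinks[path] = {'original': obj, 'substituted': new_val}
        return new_val
    return obj

### Toy substitution: expand a '{{target}}' placeholder.
substitute = lambda s: s.replace('{{target}}', 'sales_table')
symlinks = {}
params = {'fetch': {'definition': 'SELECT * FROM {{target}}'}, 'tags': ['production']}
resolved = recursive_replace(params, substitute, symlinks)
```

Only strings that actually change are recorded, so untouched values (like the `tags` list) carry no symlink entry.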
def get_dtypes( self, infer: bool = True, refresh: bool = False, debug: bool = False) -> Dict[str, Any]:
def get_dtypes(
    self,
    infer: bool = True,
    refresh: bool = False,
    debug: bool = False,
) -> Dict[str, Any]:
    """
    If defined, return the `dtypes` dictionary defined in `meerschaum.Pipe.parameters`.

    Parameters
    ----------
    infer: bool, default True
        If `True`, include the implicit existing dtypes.
        Else only return the explicitly configured dtypes (e.g. `Pipe.parameters['dtypes']`).

    refresh: bool, default False
        If `True`, invalidate any cache and return the latest known dtypes.

    Returns
    -------
    A dictionary mapping column names to dtypes.
    """
    from meerschaum.config._patch import apply_patch_to_config
    from meerschaum.utils.dtypes import MRSM_ALIAS_DTYPES
    parameters = self.get_parameters(refresh=refresh, debug=debug)
    configured_dtypes = parameters.get('dtypes', {})
    if debug:
        dprint(f"Configured dtypes for {self}:")
        mrsm.pprint(configured_dtypes)

    remote_dtypes = (
        self.infer_dtypes(persist=False, refresh=refresh, debug=debug)
        if infer
        else {}
    )
    patched_dtypes = apply_patch_to_config((remote_dtypes or {}), (configured_dtypes or {}))

    dt_col = parameters.get('columns', {}).get('datetime', None)
    primary_col = parameters.get('columns', {}).get('primary', None)
    _dtypes = {
        col: MRSM_ALIAS_DTYPES.get(typ, typ)
        for col, typ in patched_dtypes.items()
        if col and typ
    }
    if dt_col and dt_col not in configured_dtypes:
        _dtypes[dt_col] = 'datetime'
    if primary_col and parameters.get('autoincrement', False) and primary_col not in _dtypes:
        _dtypes[primary_col] = 'int'

    return _dtypes

If defined, return the dtypes dictionary defined in meerschaum.Pipe.parameters.

Parameters
  • infer (bool, default True): If True, include the implicit existing dtypes. Else only return the explicitly configured dtypes (e.g. Pipe.parameters['dtypes']).
  • refresh (bool, default False): If True, invalidate any cache and return the latest known dtypes.
Returns
  • A dictionary mapping column names to dtypes.
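`get_dtypes()` merges the inferred dtypes with the configured ones, letting explicit configuration win, then resolves type aliases. A minimal sketch of that precedence (the alias table here is an illustrative subset, not the full `MRSM_ALIAS_DTYPES`):

```python
### Illustrative subset of dtype aliases; not the real MRSM_ALIAS_DTYPES table.
ALIAS_DTYPES = {'str': 'string', 'double': 'float64'}

def merge_dtypes(inferred: dict, configured: dict) -> dict:
    """Configured dtypes override inferred ones; aliases are normalized."""
    patched = {**inferred, **configured}
    return {
        col: ALIAS_DTYPES.get(typ, typ)
        for col, typ in patched.items()
        if col and typ
    }

dtypes = merge_dtypes(
    {'vl': 'int64', 'note': 'object'},   # inferred from existing data
    {'note': 'str'},                     # explicitly configured
)
```

The configured `'str'` for `note` overrides the inferred `'object'` and is normalized to `'string'`, while the untouched `vl` keeps its inferred type.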
def update_parameters( self, parameters_patch: Dict[str, Any], persist: bool = True, debug: bool = False) -> Tuple[bool, str]:
def update_parameters(
    self,
    parameters_patch: Dict[str, Any],
    persist: bool = True,
    debug: bool = False,
) -> mrsm.SuccessTuple:
    """
    Apply a patch to a pipe's `parameters` dictionary.

    Parameters
    ----------
    parameters_patch: Dict[str, Any]
        The patch to be applied to `Pipe.parameters`.

    persist: bool, default True
        If `True`, call `Pipe.edit()` to persist the new parameters.

    Returns
    -------
    A `SuccessTuple` of success, message.
    """
    from meerschaum.config import apply_patch_to_config
    if 'parameters' not in self._attributes:
        self._attributes['parameters'] = {}

    self._attributes['parameters'] = apply_patch_to_config(
        self._attributes['parameters'],
        parameters_patch,
    )

    if self.temporary:
        persist = False

    if not persist:
        return True, "Success"

    return self.edit(debug=debug)

Apply a patch to a pipe's parameters dictionary.

Parameters
  • parameters_patch (Dict[str, Any]): The patch to be applied to Pipe.parameters.
  • persist (bool, default True): If True, call Pipe.edit() to persist the new parameters.
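`update_parameters()` applies a nested (deep) patch rather than a shallow `dict.update()`, so sibling keys survive the merge. This is a sketch of the merge semantics assumed of `apply_patch_to_config`, not its actual implementation:

```python
def apply_patch(config: dict, patch: dict) -> dict:
    """Recursively merge `patch` into `config`, preferring patch values."""
    merged = dict(config)
    for key, val in patch.items():
        if isinstance(val, dict) and isinstance(merged.get(key), dict):
            ### Recurse into nested dictionaries instead of replacing them.
            merged[key] = apply_patch(merged[key], val)
        else:
            merged[key] = val
    return merged

parameters = {'columns': {'datetime': 'ts', 'id': 'id'}, 'tags': ['production']}
patched = apply_patch(parameters, {'columns': {'primary': 'id'}})
```

Note that the existing `datetime` and `id` entries under `columns` survive; a shallow `update()` would have replaced the whole `columns` dictionary.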
tags: Optional[List[str]]
@property
def tags(self) -> Union[List[str], None]:
    """
    If defined, return the `tags` list defined in `meerschaum.Pipe.parameters`.
    """
    return self.parameters.get('tags', [])

If defined, return the tags list defined in meerschaum.Pipe.parameters.

def get_id(self, **kw: Any) -> Union[int, str, NoneType]:
def get_id(self, **kw: Any) -> Union[int, str, None]:
    """
    Fetch a pipe's ID from its instance connector.
    If the pipe is not registered, return `None`.
    """
    if self.temporary:
        return None

    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    with Venv(get_connector_plugin(self.instance_connector)):
        if hasattr(self.instance_connector, 'get_pipe_id'):
            return self.instance_connector.get_pipe_id(self, **kw)

    return None

Fetch a pipe's ID from its instance connector. If the pipe is not registered, return None.

id: Union[int, str, uuid.UUID, NoneType]
@property
def id(self) -> Union[int, str, uuid.UUID, None]:
    """
    Fetch and cache a pipe's ID.
    """
    _id = self._get_cached_value('_id', debug=self.debug)
    if not _id:
        _id = self.get_id(debug=self.debug)
        if _id is not None:
            self._cache_value('_id', _id, debug=self.debug)
    return _id

Fetch and cache a pipe's ID.

def get_val_column(self, debug: bool = False) -> Optional[str]:
def get_val_column(self, debug: bool = False) -> Union[str, None]:
    """
    Return the name of the value column if it's defined, otherwise make an educated guess.
    If not set in the `columns` dictionary, return the first numeric column that is not
    an ID or datetime column.
    If none may be found, return `None`.

    Parameters
    ----------
    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    Either a string or `None`.
    """
    if debug:
        dprint('Attempting to determine the value column...')
    try:
        val_name = self.get_columns('value')
    except Exception:
        val_name = None
    if val_name is not None:
        if debug:
            dprint(f"Value column: {val_name}")
        return val_name

    cols = self.columns
    if cols is None:
        if debug:
            dprint('No columns could be determined. Returning...')
        return None
    try:
        dt_name = self.get_columns('datetime', error=False)
    except Exception:
        dt_name = None
    try:
        id_name = self.get_columns('id', error=False)
    except Exception:
        id_name = None

    if debug:
        dprint(f"dt_name: {dt_name}")
        dprint(f"id_name: {id_name}")

    cols_types = self.get_columns_types(debug=debug)
    if cols_types is None:
        return None
    if debug:
        dprint(f"cols_types: {cols_types}")
    if dt_name is not None:
        cols_types.pop(dt_name, None)
    if id_name is not None:
        cols_types.pop(id_name, None)

    candidates = []
    ### Use a tuple (not a set) so the search order is deterministic.
    candidate_keywords = ('float', 'double', 'precision', 'int', 'numeric')
    for search_term in candidate_keywords:
        for col, typ in cols_types.items():
            if search_term in typ.lower():
                candidates.append(col)
                break
    if not candidates:
        if debug:
            dprint("No value column could be determined.")
        return None

    return candidates[0]

Return the name of the value column if it's defined, otherwise make an educated guess. If not set in the columns dictionary, return the first numeric column that is not an ID or datetime column. If none may be found, return None.

Parameters
  • debug (bool, default False): Verbosity toggle.
Returns
  • Either a string or None.
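The guess above boils down to: drop the datetime and ID columns, then pick the first remaining column whose database type looks numeric. A standalone sketch of that heuristic, with the keyword list ordered deterministically:

```python
def guess_val_column(cols_types: dict, dt_name=None, id_name=None):
    """Return the first non-index column with a numeric-looking type, else None."""
    remaining = {
        col: typ for col, typ in cols_types.items()
        if col not in (dt_name, id_name)
    }
    ### Scan numeric type keywords in a fixed priority order.
    for keyword in ('float', 'double', 'precision', 'int', 'numeric'):
        for col, typ in remaining.items():
            if keyword in typ.lower():
                return col
    return None

col = guess_val_column(
    {'dt': 'TIMESTAMP', 'id': 'BIGINT', 'temp': 'DOUBLE PRECISION'},
    dt_name='dt',
    id_name='id',
)
```

With `id` excluded from consideration, the `BIGINT` column cannot be chosen, and the `DOUBLE PRECISION` column wins.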
parents: List[Pipe]
@property
def parents(self) -> List[mrsm.Pipe]:
    """
    Return a list of `meerschaum.Pipe` objects to be designated as parents.
    """
    if 'parents' not in self.parameters:
        return []

    from meerschaum.utils.warnings import warn
    _parents_keys = self.parameters['parents']
    if not isinstance(_parents_keys, list):
        warn(
            f"Please ensure the parents for {self} are defined as a list of keys.",
            stacklevel=4,
        )
        return []
    from meerschaum import Pipe
    _parents = []
    for keys in _parents_keys:
        try:
            p = Pipe(**keys)
        except Exception as e:
            warn(f"Unable to build parent with keys '{keys}' for {self}:\n{e}")
            continue
        _parents.append(p)
    return _parents

Return a list of meerschaum.Pipe objects to be designated as parents.

parent: Optional[Pipe]
@property
def parent(self) -> Union[mrsm.Pipe, None]:
    """
    Return the first pipe in `self.parents` or `None`.
    """
    parents = self.parents
    if not parents:
        return None
    return parents[0]

Return the first pipe in self.parents or None.

children: List[Pipe]
@property
def children(self) -> List[mrsm.Pipe]:
    """
    Return a list of `meerschaum.Pipe` objects to be designated as children.
    """
    if 'children' not in self.parameters:
        return []

    from meerschaum.utils.warnings import warn
    _children_keys = self.parameters['children']
    if not isinstance(_children_keys, list):
        warn(
            f"Please ensure the children for {self} are defined as a list of keys.",
            stacklevel=4,
        )
        return []
    from meerschaum import Pipe
    _children = []
    for keys in _children_keys:
        try:
            p = Pipe(**keys)
        except Exception as e:
            warn(f"Unable to build child with keys '{keys}' for {self}:\n{e}")
            continue
        _children.append(p)
    return _children

Return a list of meerschaum.Pipe objects to be designated as children.

target: str
@property
def target(self) -> str:
    """
    The target table name.
    You can set the target name under one of the following keys
    (checked in this order):
      - `target`
      - `target_name`
      - `target_table`
      - `target_table_name`
    """
    if 'target' not in self.parameters:
        default_target = self._target_legacy()
        default_targets = {default_target}
        potential_keys = ('target_name', 'target_table', 'target_table_name')
        _target = None
        for k in potential_keys:
            if k in self.parameters:
                _target = self.parameters[k]
                break

        _target = _target or default_target

        if self.instance_connector.type == 'sql':
            from meerschaum.utils.sql import truncate_item_name
            truncated_target = truncate_item_name(_target, self.instance_connector.flavor)
            default_targets.add(truncated_target)
            if truncated_target != _target:
                ### Always truncate, but only warn the first time.
                if not self.__dict__.get('_warned_target', False):
                    warn(
                        f"The target '{_target}' is too long for '{self.instance_connector.flavor}', "
                        + f"will use {truncated_target} instead."
                    )
                    self.__dict__['_warned_target'] = True
                _target = truncated_target

        if _target in default_targets:
            return _target
        self.target = _target
    return self.parameters['target']

The target table name. You can set the target name under one of the following keys (checked in this order):

  • target
  • target_name
  • target_table
  • target_table_name
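The key-fallback order can be sketched as a simple first-match scan over the parameters dictionary (default-target generation and SQL name truncation omitted):

```python
def resolve_target(parameters: dict, default_target: str) -> str:
    """Return the first configured target key, else the default name."""
    for key in ('target', 'target_name', 'target_table', 'target_table_name'):
        if key in parameters:
            return parameters[key]
    return default_target

### 'target_table' is the first configured key, so it wins.
target = resolve_target({'target_table': 'sales'}, default_target='plugin_sales')
```

When several keys are configured, the earliest key in the list takes precedence.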
def guess_datetime(self) -> Optional[str]:
def guess_datetime(self) -> Union[str, None]:
    """
    Try to determine a pipe's datetime column.
    """
    _dtypes = self.dtypes

    ### Abort if the user explicitly disallows a datetime index.
    if 'datetime' in _dtypes:
        if _dtypes['datetime'] is None:
            return None

    from meerschaum.utils.dtypes import are_dtypes_equal
    dt_cols = [
        col
        for col, typ in _dtypes.items()
        if are_dtypes_equal(typ, 'datetime')
    ]
    if not dt_cols:
        return None
    return dt_cols[0]

Try to determine a pipe's datetime column.
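A standalone sketch of the guess: an explicit `None` under the `datetime` key disables the index, otherwise the first datetime-typed column wins. The `is_datetime` lambda here is a toy stand-in for `are_dtypes_equal(typ, 'datetime')`:

```python
def guess_datetime(dtypes: dict):
    """Return the first datetime-typed column, honoring an explicit opt-out."""
    ### An explicit None under 'datetime' disallows a datetime index.
    if 'datetime' in dtypes and dtypes['datetime'] is None:
        return None
    is_datetime = lambda typ: isinstance(typ, str) and typ.startswith('datetime')
    for col, typ in dtypes.items():
        if is_datetime(typ):
            return col
    return None

col = guess_datetime({'id': 'int64', 'ts': 'datetime64[us, UTC]', 'vl': 'float64'})
```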

precision: Dict[str, Union[str, int]]
@property
def precision(self) -> Dict[str, Union[str, int]]:
    """
    Return the configured or detected precision.
    """
    return self.get_precision(debug=self.debug)

Return the configured or detected precision.

def get_precision(self, debug: bool = False) -> Dict[str, Union[str, int]]:
def get_precision(self, debug: bool = False) -> Dict[str, Union[str, int]]:
    """
    Return the timestamp precision unit and interval for the `datetime` axis.
    """
    from meerschaum.utils.dtypes import (
        MRSM_PRECISION_UNITS_SCALARS,
        MRSM_PRECISION_UNITS_ALIASES,
        MRSM_PD_DTYPES,
        are_dtypes_equal,
    )
    from meerschaum._internal.static import STATIC_CONFIG

    _precision = self._get_cached_value('precision', debug=debug)
    if _precision:
        if debug:
            dprint(f"Returning cached precision: {_precision}")
        return _precision

    parameters = self.parameters
    _precision = parameters.get('precision', {})
    if isinstance(_precision, str):
        _precision = {'unit': _precision}
    default_precision_unit = STATIC_CONFIG['dtypes']['datetime']['default_precision_unit']

    if not _precision:

        dt_col = parameters.get('columns', {}).get('datetime', None)
        if not dt_col and self.autotime:
            dt_col = mrsm.get_config('pipes', 'autotime', 'column_name_if_datetime_missing')
        if not dt_col:
            if debug:
                dprint(f"No datetime axis, returning default precision '{default_precision_unit}'.")
            return {'unit': default_precision_unit}

        dt_typ = self.dtypes.get(dt_col, 'datetime')
        if are_dtypes_equal(dt_typ, 'datetime'):
            if dt_typ == 'datetime':
                dt_typ = MRSM_PD_DTYPES['datetime']
                if debug:
                    dprint(f"Datetime type is `datetime`, assuming {dt_typ} precision.")

            _precision = {
                'unit': (
                    dt_typ
                    .split('[', maxsplit=1)[-1]
                    .split(',', maxsplit=1)[0]
                    .split(' ', maxsplit=1)[0]
                ).rstrip(']')
            }

            if debug:
                dprint(f"Extracted precision '{_precision['unit']}' from type '{dt_typ}'.")

        elif are_dtypes_equal(dt_typ, 'int'):
            _precision = {
                'unit': (
                    'second'
                    if '32' in dt_typ
                    else default_precision_unit
                )
            }
        elif are_dtypes_equal(dt_typ, 'date'):
            if debug:
                dprint("Datetime axis is 'date', falling back to 'day' precision.")
            _precision = {'unit': 'day'}

    precision_unit = _precision.get('unit', default_precision_unit)
    precision_interval = _precision.get('interval', None)
    true_precision_unit = MRSM_PRECISION_UNITS_ALIASES.get(precision_unit, precision_unit)
    if true_precision_unit is None:
        if debug:
            dprint(f"No precision could be determined, falling back to '{default_precision_unit}'.")
        true_precision_unit = default_precision_unit

    if true_precision_unit not in MRSM_PRECISION_UNITS_SCALARS:
        from meerschaum.utils.misc import items_str
        raise ValueError(
            f"Invalid precision unit '{true_precision_unit}'.\n"
            "Accepted values are "
            f"{items_str(list(MRSM_PRECISION_UNITS_SCALARS) + list(MRSM_PRECISION_UNITS_ALIASES))}."
        )

    _precision = {'unit': true_precision_unit}
    if precision_interval:
        _precision['interval'] = precision_interval
    self._cache_value('precision', _precision, debug=debug)
    ### Return the local value; `self._precision` is not defined here.
    return _precision

Return the timestamp precision unit and interval for the datetime axis.
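The unit-extraction step parses the precision out of a Pandas-style dtype string such as `datetime64[ms, UTC]`. A standalone sketch of that parsing, with a fallback for non-parametrized types:

```python
def extract_precision_unit(dt_typ: str, default: str = 'us') -> str:
    """Pull the unit out of a dtype string like 'datetime64[ms, UTC]'."""
    if '[' not in dt_typ:
        ### No bracketed parameters (e.g. 'int64'): fall back to the default unit.
        return default
    return (
        dt_typ
        .split('[', maxsplit=1)[-1]   # 'ms, UTC]'
        .split(',', maxsplit=1)[0]    # 'ms'
        .split(' ', maxsplit=1)[0]    # strip any trailing tokens
    ).rstrip(']')                     # handle 'datetime64[ns]' (no timezone)

unit = extract_precision_unit('datetime64[ms, UTC]')
```

The `default` of `'us'` here is illustrative; the real default comes from `STATIC_CONFIG['dtypes']['datetime']['default_precision_unit']`.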

def show( self, nopretty: bool = False, debug: bool = False, **kw) -> Tuple[bool, str]:
def show(
    self,
    nopretty: bool = False,
    debug: bool = False,
    **kw
) -> SuccessTuple:
    """
    Show attributes of a Pipe.

    Parameters
    ----------
    nopretty: bool, default False
        If `True`, simply print the JSON of the pipe's attributes.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.
    """
    import json
    from meerschaum.utils.formatting import (
        pprint, make_header, ANSI, highlight_pipes, fill_ansi, get_console,
    )
    from meerschaum.utils.packages import import_rich, attempt_import
    from meerschaum.utils.warnings import info
    attributes_json = json.dumps(self.attributes)
    if not nopretty:
        _to_print = f"Attributes for {self}:"
        if ANSI:
            _to_print = fill_ansi(highlight_pipes(make_header(_to_print)), 'magenta')
            print(_to_print)
            rich = import_rich()
            rich_json = attempt_import('rich.json')
            get_console().print(rich_json.JSON(attributes_json))
        else:
            print(_to_print)
    else:
        print(attributes_json)

    return True, "Success"

Show attributes of a Pipe.

Parameters
  • nopretty (bool, default False): If True, simply print the JSON of the pipe's attributes.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A SuccessTuple of success, message.
def edit( self, patch: bool = False, interactive: bool = False, debug: bool = False, **kw: Any) -> Tuple[bool, str]:
def edit(
    self,
    patch: bool = False,
    interactive: bool = False,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Edit a Pipe's configuration.

    Parameters
    ----------
    patch: bool, default False
        If `patch` is True, update parameters by cascading rather than overwriting.
    interactive: bool, default False
        If `True`, open an editor for the user to make changes to the pipe's YAML file.
    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.
    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    if self.temporary:
        return False, "Cannot edit pipes created with `temporary=True` (read-only)."

    self._invalidate_cache(hard=True, debug=debug)

    if hasattr(self, '_symlinks'):
        from meerschaum.utils.misc import get_val_from_dict_path, set_val_in_dict_path
        for path, vals in self._symlinks.items():
            current_val = get_val_from_dict_path(self.parameters, path)
            if current_val == vals['substituted']:
                set_val_in_dict_path(self.parameters, path, vals['original'])

    if not interactive:
        with Venv(get_connector_plugin(self.instance_connector)):
            return self.instance_connector.edit_pipe(self, patch=patch, debug=debug, **kw)

    from meerschaum.config._paths import PIPES_CACHE_RESOURCES_PATH
    from meerschaum.utils.misc import edit_file
    parameters_filename = str(self) + '.yaml'
    parameters_path = PIPES_CACHE_RESOURCES_PATH / parameters_filename

    from meerschaum.utils.yaml import yaml

    edit_text = f"Edit the parameters for {self}"
    edit_top = '#' * (len(edit_text) + 4)
    edit_header = edit_top + f'\n# {edit_text} #\n' + edit_top + '\n\n'

    from meerschaum.config import get_config
    parameters = dict(get_config('pipes', 'parameters', patch=True))
    from meerschaum.config._patch import apply_patch_to_config
    raw_parameters = self.attributes.get('parameters', {})
    parameters = apply_patch_to_config(parameters, raw_parameters)

    ### write parameters to yaml file
    with open(parameters_path, 'w+') as f:
        f.write(edit_header)
        yaml.dump(parameters, stream=f, sort_keys=False)

    ### only quit editing if yaml is valid
    editing = True
    while editing:
        edit_file(parameters_path)
        try:
            with open(parameters_path, 'r') as f:
                file_parameters = yaml.load(f.read())
        except Exception as e:
            from meerschaum.utils.warnings import warn
            warn(f"Invalid format defined for '{self}':\n\n{e}")
            input(f"Press [Enter] to correct the configuration for '{self}': ")
        else:
            editing = False

    self.parameters = file_parameters

    if debug:
        from meerschaum.utils.formatting import pprint
        pprint(self.parameters)

    with Venv(get_connector_plugin(self.instance_connector)):
        return self.instance_connector.edit_pipe(self, patch=patch, debug=debug, **kw)

Edit a Pipe's configuration.

Parameters
  • patch (bool, default False): If patch is True, update parameters by cascading rather than overwriting.
  • interactive (bool, default False): If True, open an editor for the user to make changes to the pipe's YAML file.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A SuccessTuple of success, message.
def edit_definition( self, yes: bool = False, noask: bool = False, force: bool = False, debug: bool = False, **kw: Any) -> Tuple[bool, str]:
def edit_definition(
    self,
    yes: bool = False,
    noask: bool = False,
    force: bool = False,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Edit a pipe's definition file and update its configuration.
    **NOTE:** This function is interactive and should not be used in automated scripts!

    Returns
    -------
    A `SuccessTuple` of success, message.
    """
    if self.temporary:
        return False, "Cannot edit pipes created with `temporary=True` (read-only)."

    from meerschaum.connectors import instance_types
    if (self.connector is None) or self.connector.type not in instance_types:
        return self.edit(interactive=True, debug=debug, **kw)

    import json
    from meerschaum.utils.warnings import info, warn
    from meerschaum.utils.debug import dprint
    from meerschaum.config._patch import apply_patch_to_config
    from meerschaum.utils.misc import edit_file

    _parameters = self.parameters
    if 'fetch' not in _parameters:
        _parameters['fetch'] = {}

    def _edit_api():
        from meerschaum.utils.prompt import prompt, yes_no
        info(
            f"Please enter the keys of the source pipe from '{self.connector}'.\n" +
            "Type 'None' for None, or empty when there is no default. Press [CTRL+C] to skip."
        )

        _keys = {'connector_keys': None, 'metric_key': None, 'location_key': None}
        for k in _keys:
            _keys[k] = _parameters['fetch'].get(k, None)

        for k, v in _keys.items():
            try:
                _keys[k] = prompt(k.capitalize().replace('_', ' ') + ':', icon=True, default=v)
            except KeyboardInterrupt:
                continue
            if _keys[k] in ('', 'None', '\'None\'', '[None]'):
                _keys[k] = None

        _parameters['fetch'] = apply_patch_to_config(_parameters['fetch'], _keys)

        info("You may optionally specify additional filter parameters as JSON.")
        print("  Parameters are translated into a 'WHERE x AND y' clause, and lists are IN clauses.")
        print("  For example, the following JSON would correspond to 'WHERE x = 1 AND y IN (2, 3)':")
        print(json.dumps({'x': 1, 'y': [2, 3]}, indent=2, separators=(',', ': ')))
        if force or yes_no(
            "Would you like to add additional filter parameters?",
            yes=yes, noask=noask
        ):
            from meerschaum.config._paths import PIPES_CACHE_RESOURCES_PATH
            definition_filename = str(self) + '.json'
175            definition_path = PIPES_CACHE_RESOURCES_PATH / definition_filename
176            try:
177                definition_path.touch()
178                with open(definition_path, 'w+') as f:
179                    json.dump(_parameters.get('fetch', {}).get('params', {}), f, indent=2)
180            except Exception as e:
181                return False, f"Failed writing file '{definition_path}':\n" + str(e)
182
183            _params = None
184            while True:
185                edit_file(definition_path)
186                try:
187                    with open(definition_path, 'r') as f:
188                        _params = json.load(f)
189                except Exception as e:
190                    warn(f'Failed to read parameters JSON:\n{e}', stack=False)
191                    if force or yes_no(
192                        "Would you like to try again?\n  "
193                        + "If not, the parameters JSON file will be ignored.",
194                        noask=noask, yes=yes
195                    ):
196                        continue
197                    _params = None
198                break
199            if _params is not None:
200                if 'fetch' not in _parameters:
201                    _parameters['fetch'] = {}
202                _parameters['fetch']['params'] = _params
203
204        self.parameters = _parameters
205        return True, "Success"
206
207    def _edit_sql():
208        import textwrap
209        from meerschaum.config._paths import PIPES_CACHE_RESOURCES_PATH
210        from meerschaum.utils.misc import edit_file
211        definition_filename = str(self) + '.sql'
212        definition_path = PIPES_CACHE_RESOURCES_PATH / definition_filename
213
214        sql_definition = _parameters['fetch'].get('definition', None)
215        if sql_definition is None:
216            sql_definition = ''
217        sql_definition = textwrap.dedent(sql_definition).lstrip()
218
219        try:
220            definition_path.touch()
221            with open(definition_path, 'w+') as f:
222                f.write(sql_definition)
223        except Exception as e:
224            return False, f"Failed writing file '{definition_path}':\n" + str(e)
225
226        edit_file(definition_path)
227        try:
228            with open(definition_path, 'r', encoding='utf-8') as f:
229                file_definition = f.read()
230        except Exception as e:
231            return False, f"Failed reading file '{definition_path}':\n" + str(e)
232
233        if sql_definition == file_definition:
234            return False, f"No changes made to definition for {self}."
235
236        if ' ' not in file_definition:
237            return False, f"Invalid SQL definition for {self}."
238
239        if debug:
240            dprint("Read SQL definition:\n\n" + file_definition)
241        _parameters['fetch']['definition'] = file_definition
242        self.parameters = _parameters
243        return True, "Success"
244
245    locals()['_edit_' + str(self.connector.type)]()
246    return self.edit(interactive=False, debug=debug, **kw)

Edit a pipe's definition file and update its configuration. NOTE: This function is interactive and should not be used in automated scripts!

Returns
  • A SuccessTuple of success, message.
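The filter parameters prompted for in `_edit_api()` are translated into a `WHERE x AND y` clause, with lists becoming `IN` clauses. A rough stdlib illustration of that translation (`params_to_where` is a hypothetical helper, not the actual implementation in `meerschaum.utils.sql.build_where`):

```python
import json

def params_to_where(params: dict) -> str:
    """Hypothetical sketch: scalars become equality checks, lists become IN clauses."""
    clauses = []
    for col, val in params.items():
        if isinstance(val, list):
            clauses.append(f"{col} IN ({', '.join(map(str, val))})")
        else:
            clauses.append(f"{col} = {val}")
    return 'WHERE ' + ' AND '.join(clauses)

print(params_to_where(json.loads('{"x": 1, "y": [2, 3]}')))
# WHERE x = 1 AND y IN (2, 3)
```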
def update(self, *args, **kw) -> Tuple[bool, str]:
13def update(self, *args, **kw) -> SuccessTuple:
14    """
15    Update a pipe's parameters in its instance.
16    """
17    kw['interactive'] = False
18    return self.edit(*args, **kw)

Update a pipe's parameters in its instance.

def sync( self, df: Union[pandas.core.frame.DataFrame, Dict[str, List[Any]], List[Dict[str, Any]], str, meerschaum.core.Pipe._sync.InferFetch] = <class 'meerschaum.core.Pipe._sync.InferFetch'>, begin: Union[datetime.datetime, int, str, NoneType] = '', end: Union[datetime.datetime, int, NoneType] = None, force: bool = False, retries: int = 10, min_seconds: int = 1, check_existing: bool = True, enforce_dtypes: bool = True, blocking: bool = True, workers: Optional[int] = None, callback: Optional[Callable[[Tuple[bool, str]], Any]] = None, error_callback: Optional[Callable[[Exception], Any]] = None, chunksize: Optional[int] = -1, sync_chunks: bool = True, debug: bool = False, _inplace: bool = True, **kw: Any) -> Tuple[bool, str]:
 41def sync(
 42    self,
 43    df: Union[
 44        pd.DataFrame,
 45        Dict[str, List[Any]],
 46        List[Dict[str, Any]],
 47        str,
 48        InferFetch
 49    ] = InferFetch,
 50    begin: Union[datetime, int, str, None] = '',
 51    end: Union[datetime, int, None] = None,
 52    force: bool = False,
 53    retries: int = 10,
 54    min_seconds: int = 1,
 55    check_existing: bool = True,
 56    enforce_dtypes: bool = True,
 57    blocking: bool = True,
 58    workers: Optional[int] = None,
 59    callback: Optional[Callable[[Tuple[bool, str]], Any]] = None,
 60    error_callback: Optional[Callable[[Exception], Any]] = None,
 61    chunksize: Optional[int] = -1,
 62    sync_chunks: bool = True,
 63    debug: bool = False,
 64    _inplace: bool = True,
 65    **kw: Any
 66) -> SuccessTuple:
 67    """
 68    Fetch new data from the source and update the pipe's table with new data.
 69    
 70    Get new remote data via fetch, get existing data in the same time period,
 71    and merge the two, only keeping the unseen data.
 72
 73    Parameters
 74    ----------
 75    df: Union[pd.DataFrame, Dict[str, List[Any]], List[Dict[str, Any]], str], default `InferFetch`
 76        An optional DataFrame to sync into the pipe; omit to fetch new data via the connector.
 77        If `df` is a string, it will be parsed via `meerschaum.utils.dataframe.parse_simple_lines()`.
 78
 79    begin: Union[datetime, int, str, None], default ''
 80        Optionally specify the earliest datetime to search for data.
 81
 82    end: Union[datetime, int, str, None], default None
 83        Optionally specify the latest datetime to search for data.
 84
 85    force: bool, default False
 86        If `True`, keep retrying until `retries` attempts are exhausted.
 87
 88    retries: int, default 10
 89        If `force`, how many attempts to try syncing before declaring failure.
 90
 91    min_seconds: Union[int, float], default 1
 92        If `force`, how many seconds to sleep between retries. Defaults to `1`.
 93
 94    check_existing: bool, default True
 95        If `True`, pull and diff with existing data from the pipe.
 96
 97    enforce_dtypes: bool, default True
 98        If `True`, enforce dtypes on incoming data.
 99        Set this to `False` if the incoming rows are expected to be of the correct dtypes.
100
101    blocking: bool, default True
102        If `True`, wait for sync to finish and return its result, otherwise
 103        asynchronously sync (oxymoron?) and return success. Defaults to `True`.
104        Only intended for specific scenarios.
105
106    workers: Optional[int], default None
107        If provided and the instance connector is thread-safe
108        (`pipe.instance_connector.IS_THREAD_SAFE is True`),
109        limit concurrent sync to this many threads.
110
111    callback: Optional[Callable[[Tuple[bool, str]], Any]], default None
112        Callback function which expects a SuccessTuple as input.
113        Only applies when `blocking=False`.
114
115    error_callback: Optional[Callable[[Exception], Any]], default None
116        Callback function which expects an Exception as input.
117        Only applies when `blocking=False`.
118
119    chunksize: int, default -1
120        Specify the number of rows to sync per chunk.
121        If `-1`, resort to system configuration (default is `900`).
122        A `chunksize` of `None` will sync all rows in one transaction.
123
124    sync_chunks: bool, default True
125        If possible, sync chunks while fetching them into memory.
126
127    debug: bool, default False
128        Verbosity toggle. Defaults to False.
129
130    Returns
131    -------
132    A `SuccessTuple` of success (`bool`) and message (`str`).
133    """
134    from meerschaum.utils.debug import dprint, _checkpoint
135    from meerschaum.utils.formatting import get_console
136    from meerschaum.utils.venv import Venv
137    from meerschaum.connectors import get_connector_plugin
138    from meerschaum.utils.misc import df_is_chunk_generator, filter_keywords, filter_arguments
139    from meerschaum.utils.pool import get_pool
140    from meerschaum.config import get_config
141    from meerschaum.utils.dtypes import are_dtypes_equal, get_current_timestamp
142
143    if (callback is not None or error_callback is not None) and blocking:
144        warn("Callback functions are only executed when blocking = False. Ignoring...")
145
146    _checkpoint(_total=2, **kw)
147
148    if chunksize == 0:
149        chunksize = None
150        sync_chunks = False
151
152    begin, end = self.parse_date_bounds(begin, end)
153    kw.update({
154        'begin': begin,
155        'end': end,
156        'force': force,
157        'retries': retries,
158        'min_seconds': min_seconds,
159        'check_existing': check_existing,
160        'blocking': blocking,
161        'workers': workers,
162        'callback': callback,
163        'error_callback': error_callback,
164        'sync_chunks': sync_chunks,
165        'chunksize': chunksize,
166        'safe_copy': True,
167    })
168
169    self._invalidate_cache(debug=debug)
170    self._cache_value('sync_ts', get_current_timestamp('ms'), debug=debug)
171
172    def _sync(
173        p: mrsm.Pipe,
174        df: Union[
175            'pd.DataFrame',
176            Dict[str, List[Any]],
177            List[Dict[str, Any]],
178            str,
179            InferFetch
180        ] = InferFetch,
181    ) -> SuccessTuple:
182        if df is None:
183            p._invalidate_cache(debug=debug)
184            return (
185                False,
186                f"You passed `None` instead of data into `sync()` for {p}.\n"
187                + "Omit the DataFrame to infer fetching.",
188            )
189        ### Ensure that Pipe is registered.
190        if not p.temporary and p.get_id(debug=debug) is None:
191            ### NOTE: This may trigger an interactive session for plugins!
192            register_success, register_msg = p.register(debug=debug)
193            if not register_success:
194                if 'already' not in register_msg:
195                    p._invalidate_cache(debug=debug)
196                    return register_success, register_msg
197
198        if isinstance(df, str):
199            from meerschaum.utils.dataframe import parse_simple_lines
200            df = parse_simple_lines(df)
201
202        ### If connector is a plugin with a `sync()` method, return that instead.
203        ### If the plugin does not have a `sync()` method but does have a `fetch()` method,
204        ### use that instead.
205        ### NOTE: The DataFrame must be omitted for the plugin sync method to apply.
206        ### If a DataFrame is provided, continue as expected.
207        if hasattr(df, 'MRSM_INFER_FETCH'):
208            try:
209                if p.connector is None:
210                    if ':' not in p.connector_keys:
211                        return True, f"{p} does not support fetching; nothing to do."
212
213                    msg = f"{p} does not have a valid connector."
214                    if p.connector_keys.startswith('plugin:'):
215                        msg += f"\n    Perhaps {p.connector_keys} has a syntax error?"
216                    p._invalidate_cache(debug=debug)
217                    return False, msg
218            except Exception:
219                p._invalidate_cache(debug=debug)
220                return False, f"Unable to create the connector for {p}."
221
222            ### Sync in place if possible.
223            if (
224                str(self.connector) == str(self.instance_connector)
225                and 
226                hasattr(self.instance_connector, 'sync_pipe_inplace')
227                and
228                _inplace
229                and
230                get_config('system', 'experimental', 'inplace_sync')
231            ):
232                with Venv(get_connector_plugin(self.instance_connector)):
233                    p._invalidate_cache(debug=debug)
234                    _args, _kwargs = filter_arguments(
235                        p.instance_connector.sync_pipe_inplace,
236                        p,
237                        debug=debug,
238                        **kw
239                    )
240                    return self.instance_connector.sync_pipe_inplace(
241                        *_args,
242                        **_kwargs
243                    )
244
245            ### Activate and invoke `sync(pipe)` for plugin connectors with `sync` methods.
246            try:
247                if getattr(p.connector, 'sync', None) is not None:
248                    with Venv(get_connector_plugin(p.connector), debug=debug):
249                        _args, _kwargs = filter_arguments(
250                            p.connector.sync,
251                            p,
252                            debug=debug,
253                            **kw
254                        )
255                        return_tuple = p.connector.sync(*_args, **_kwargs)
256                    p._invalidate_cache(debug=debug)
257                    if not isinstance(return_tuple, tuple):
258                        return_tuple = (
259                            False,
260                            f"Plugin '{p.connector.label}' returned non-tuple value: {return_tuple}"
261                        )
262                    return return_tuple
263
264            except Exception as e:
265                get_console().print_exception()
266                msg = f"Failed to sync {p} with exception: '" + str(e) + "'"
267                if debug:
268                    error(msg, silent=False)
269                p._invalidate_cache(debug=debug)
270                return False, msg
271
272            ### Fetch the dataframe from the connector's `fetch()` method.
273            try:
274                with Venv(get_connector_plugin(p.connector), debug=debug):
275                    df = p.fetch(
276                        **filter_keywords(
277                            p.fetch,
278                            debug=debug,
279                            **kw
280                        )
281                    )
282                    kw['safe_copy'] = False
283            except Exception as e:
284                get_console().print_exception(
285                    suppress=[
286                        'meerschaum/core/Pipe/_sync.py',
287                        'meerschaum/core/Pipe/_fetch.py',
288                    ]
289                )
290                msg = f"Failed to fetch data from {p.connector}:\n    {e}"
291                df = None
292
293            if df is None:
294                p._invalidate_cache(debug=debug)
295                return False, f"No data were fetched for {p}."
296
297            if isinstance(df, list):
298                if len(df) == 0:
299                    return True, f"No new rows were returned for {p}."
300
301                ### May be a chunk hook results list.
302                if isinstance(df[0], tuple):
303                    success = all([_success for _success, _ in df])
304                    message = '\n'.join([_message for _, _message in df])
305                    return success, message
306
307            if df is True:
308                p._invalidate_cache(debug=debug)
309                return True, f"{p} is being synced in parallel."
310
311        ### CHECKPOINT: Retrieved the DataFrame.
312        _checkpoint(**kw)
313
314        ### Allow for dataframe generators or iterables.
315        if df_is_chunk_generator(df):
316            kw['workers'] = p.get_num_workers(kw.get('workers', None))
317            dt_col = p.columns.get('datetime', None)
318            pool = get_pool(workers=kw.get('workers', 1))
319            if debug:
320                dprint(f"Received {type(df)}. Attempting to sync first chunk...")
321
322            try:
323                chunk = next(df)
324            except StopIteration:
325                return True, "Received an empty generator; nothing to do."
326
327            chunk_success, chunk_msg = _sync(p, chunk)
328            chunk_msg = '\n' + self._get_chunk_label(chunk, dt_col) + '\n' + chunk_msg
329            if not chunk_success:
330                return chunk_success, f"Unable to sync initial chunk for {p}:\n{chunk_msg}"
331            if debug:
 332            dprint("Successfully synced the first chunk, attempting the rest...")
333
334            def _process_chunk(_chunk):
335                _chunk_attempts = 0
336                _max_chunk_attempts = 3
337                while _chunk_attempts < _max_chunk_attempts:
338                    try:
339                        _chunk_success, _chunk_msg = _sync(p, _chunk)
340                    except Exception as e:
341                        _chunk_success, _chunk_msg = False, str(e)
342                    if _chunk_success:
343                        break
344                    _chunk_attempts += 1
345                    _sleep_seconds = _chunk_attempts ** 2
346                    warn(
347                        (
348                            f"Failed to sync chunk to {self} "
349                            + f"(attempt {_chunk_attempts} / {_max_chunk_attempts}).\n"
350                            + f"Sleeping for {_sleep_seconds} second"
351                            + ('s' if _sleep_seconds != 1 else '')
352                            + f":\n{_chunk_msg}"
353                        ),
354                        stack=False,
355                    )
356                    time.sleep(_sleep_seconds)
357
358                num_rows_str = (
359                    f"{num_rows:,} rows"
360                    if (num_rows := len(_chunk)) != 1
361                    else f"{num_rows} row"
362                )
363                _chunk_msg = (
364                    (
365                        "Synced"
366                        if _chunk_success
367                        else "Failed to sync"
368                    ) + f" a chunk ({num_rows_str}) to {p}:\n"
369                    + self._get_chunk_label(_chunk, dt_col)
370                    + '\n'
371                    + _chunk_msg
372                )
373
374                mrsm.pprint((_chunk_success, _chunk_msg), calm=True)
375                return _chunk_success, _chunk_msg
376
377            results = sorted(
378                [(chunk_success, chunk_msg)] + (
379                    list(pool.imap(_process_chunk, df))
380                    if (
381                        not df_is_chunk_generator(chunk)  # Handle nested generators.
382                        and kw.get('workers', 1) != 1
383                    )
384                    else list(
385                        _process_chunk(_child_chunks)
386                        for _child_chunks in df
387                    )
388                )
389            )
390            chunk_messages = [chunk_msg for _, chunk_msg in results]
391            success_bools = [chunk_success for chunk_success, _ in results]
392            num_successes = len([chunk_success for chunk_success, _ in results if chunk_success])
393            num_failures = len([chunk_success for chunk_success, _ in results if not chunk_success])
394            success = all(success_bools)
395            msg = (
396                'Synced '
397                + f'{len(chunk_messages):,} chunk'
398                + ('s' if len(chunk_messages) != 1 else '')
399                + f' to {p}\n({num_successes} succeeded, {num_failures} failed):\n\n'
400                + '\n\n'.join(chunk_messages).lstrip().rstrip()
401            ).lstrip().rstrip()
402            return success, msg
403
404        ### Cast to a dataframe and ensure datatypes are what we expect.
405        dtypes = p.get_dtypes(debug=debug)
406        df = p.enforce_dtypes(
407            df,
408            chunksize=chunksize,
409            enforce=enforce_dtypes,
410            dtypes=dtypes,
411            debug=debug,
412        )
413        if p.autotime:
414            dt_col = p.columns.get('datetime', None)
415            ts_col = dt_col or mrsm.get_config(
416                'pipes', 'autotime', 'column_name_if_datetime_missing'
417            )
418            ts_typ = dtypes.get(ts_col, 'datetime') if ts_col else 'datetime'
419            if ts_col and hasattr(df, 'columns') and ts_col not in df.columns:
420                precision = p.get_precision(debug=debug)
421                now = get_current_timestamp(
422                    precision_unit=precision.get(
423                        'unit',
424                        STATIC_CONFIG['dtypes']['datetime']['default_precision_unit']
425                    ),
426                    precision_interval=precision.get('interval', 1),
427                    round_to=(precision.get('round_to', 'down')),
428                    as_int=(are_dtypes_equal(ts_typ, 'int')),
429                )
430                if debug:
431                    dprint(f"Adding current timestamp to dataframe synced to {p}: {now}")
432
433                df[ts_col] = now
434                kw['check_existing'] = dt_col is not None
435
436        ### Capture special columns.
437        capture_success, capture_msg = self._persist_new_special_columns(
438            df,
439            dtypes=dtypes,
440            debug=debug,
441        )
442        if not capture_success:
443            warn(f"Failed to capture new special columns for {self}:\n{capture_msg}")
444
445        if debug:
446            dprint(
447                "DataFrame to sync:\n"
448                + (
449                    str(df)[:255]
450                    + '...'
451                    if len(str(df)) >= 256
452                    else str(df)
453                ),
454                **kw
455            )
456
457        ### if force, continue to sync until success
458        return_tuple = False, f"Did not sync {p}."
459        run = True
460        _retries = 1
461        while run:
462            with Venv(get_connector_plugin(self.instance_connector)):
463                return_tuple = p.instance_connector.sync_pipe(
464                    pipe=p,
465                    df=df,
466                    debug=debug,
467                    **kw
468                )
469            _retries += 1
470            run = (not return_tuple[0]) and force and _retries <= retries
471            if run and debug:
472                dprint(f"Syncing failed for {p}. Attempt ( {_retries} / {retries} )", **kw)
473                dprint(f"Sleeping for {min_seconds} seconds...", **kw)
474                time.sleep(min_seconds)
475            if _retries > retries:
476                warn(
477                    f"Unable to sync {p} within {retries} attempt" +
478                        ("s" if retries != 1 else "") + "!"
479                )
480
481        ### CHECKPOINT: Finished syncing.
482        _checkpoint(**kw)
483        p._invalidate_cache(debug=debug)
484        return return_tuple
485
486    if blocking:
487        return _sync(self, df=df)
488
489    from meerschaum.utils.threading import Thread
490    def default_callback(result_tuple: SuccessTuple):
491        dprint(f"Asynchronous result from {self}: {result_tuple}", **kw)
492
493    def default_error_callback(x: Exception):
494        dprint(f"Error received for {self}: {x}", **kw)
495
496    if callback is None and debug:
497        callback = default_callback
498    if error_callback is None and debug:
499        error_callback = default_error_callback
500    try:
501        thread = Thread(
502            target=_sync,
503            args=(self,),
504            kwargs={'df': df},
505            daemon=False,
506            callback=callback,
507            error_callback=error_callback,
508        )
509        thread.start()
510    except Exception as e:
511        self._invalidate_cache(debug=debug)
512        return False, str(e)
513
514    self._invalidate_cache(debug=debug)
 515    return True, f"Spawned asynchronous sync for {self}."

Fetch new data from the source and update the pipe's table with new data.

Get new remote data via fetch, get existing data in the same time period, and merge the two, only keeping the unseen data.

Parameters
  • df (Union[pd.DataFrame, Dict[str, List[Any]], List[Dict[str, Any]], str], default InferFetch): An optional DataFrame to sync into the pipe; omit to fetch new data via the connector. If df is a string, it will be parsed via meerschaum.utils.dataframe.parse_simple_lines().
  • begin (Union[datetime, int, str, None], default ''): Optionally specify the earliest datetime to search for data.
  • end (Union[datetime, int, str, None], default None): Optionally specify the latest datetime to search for data.
  • force (bool, default False): If True, keep retrying until retries attempts are exhausted.
  • retries (int, default 10): If force, how many attempts to try syncing before declaring failure.
  • min_seconds (Union[int, float], default 1): If force, how many seconds to sleep between retries. Defaults to 1.
  • check_existing (bool, default True): If True, pull and diff with existing data from the pipe.
  • enforce_dtypes (bool, default True): If True, enforce dtypes on incoming data. Set this to False if the incoming rows are expected to be of the correct dtypes.
  • blocking (bool, default True): If True, wait for sync to finish and return its result, otherwise asynchronously sync (oxymoron?) and return success. Defaults to True. Only intended for specific scenarios.
  • workers (Optional[int], default None): If provided and the instance connector is thread-safe (pipe.instance_connector.IS_THREAD_SAFE is True), limit concurrent sync to this many threads.
  • callback (Optional[Callable[[Tuple[bool, str]], Any]], default None): Callback function which expects a SuccessTuple as input. Only applies when blocking=False.
  • error_callback (Optional[Callable[[Exception], Any]], default None): Callback function which expects an Exception as input. Only applies when blocking=False.
  • chunksize (int, default -1): Specify the number of rows to sync per chunk. If -1, resort to system configuration (default is 900). A chunksize of None will sync all rows in one transaction.
  • sync_chunks (bool, default True): If possible, sync chunks while fetching them into memory.
  • debug (bool, default False): Verbosity toggle. Defaults to False.
Returns
  • A SuccessTuple of success (bool) and message (str).
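When a generator of chunks is synced, each chunk produces its own SuccessTuple, and the results are combined: the overall sync succeeds only if every chunk succeeded, and the per-chunk messages are joined into one report. A stdlib-only sketch of that aggregation (`combine_chunk_results` is a hypothetical name, not part of the meerschaum API):

```python
from typing import List, Tuple

def combine_chunk_results(results: List[Tuple[bool, str]]) -> Tuple[bool, str]:
    """Collapse per-chunk SuccessTuples into a single SuccessTuple."""
    success = all(ok for ok, _ in results)
    num_ok = sum(1 for ok, _ in results if ok)
    num_fail = len(results) - num_ok
    msg = (
        f"Synced {len(results)} chunk" + ('s' if len(results) != 1 else '')
        + f" ({num_ok} succeeded, {num_fail} failed):\n\n"
        + '\n\n'.join(message for _, message in results)
    )
    return success, msg

success, msg = combine_chunk_results([(True, 'chunk 1 ok'), (False, 'chunk 2 failed')])
# success == False; msg begins with "Synced 2 chunks (1 succeeded, 1 failed)"
```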
def get_sync_time( self, params: Optional[Dict[str, Any]] = None, newest: bool = True, apply_backtrack_interval: bool = False, remote: bool = False, round_down: bool = False, debug: bool = False) -> Union[datetime.datetime, int, NoneType]:
518def get_sync_time(
519    self,
520    params: Optional[Dict[str, Any]] = None,
521    newest: bool = True,
522    apply_backtrack_interval: bool = False,
523    remote: bool = False,
524    round_down: bool = False,
525    debug: bool = False
526) -> Union['datetime', int, None]:
527    """
528    Get the most recent datetime value for a Pipe.
529
530    Parameters
531    ----------
532    params: Optional[Dict[str, Any]], default None
533        Dictionary to build a WHERE clause for a specific column.
534        See `meerschaum.utils.sql.build_where`.
535
536    newest: bool, default True
537        If `True`, get the most recent datetime (honoring `params`).
538        If `False`, get the oldest datetime (`ASC` instead of `DESC`).
539
540    apply_backtrack_interval: bool, default False
541        If `True`, subtract the backtrack interval from the sync time.
542
543    remote: bool, default False
544        If `True` and the instance connector supports it, return the sync time
545        for the remote table definition.
546
547    round_down: bool, default False
548        If `True`, round down the datetime value to the nearest minute.
549
550    debug: bool, default False
551        Verbosity toggle.
552
553    Returns
554    -------
555    A `datetime` or int, if the pipe exists, otherwise `None`.
556
557    """
558    from meerschaum.utils.venv import Venv
559    from meerschaum.connectors import get_connector_plugin
560    from meerschaum.utils.misc import filter_keywords
561    from meerschaum.utils.dtypes import round_time
562    from meerschaum.utils.warnings import warn
563
564    if not self.columns.get('datetime', None):
565        return None
566
567    connector = self.instance_connector if not remote else self.connector
568    with Venv(get_connector_plugin(connector)):
569        if not hasattr(connector, 'get_sync_time'):
570            warn(
571                f"Connectors of type '{connector.type}' "
 572                "do not implement `get_sync_time()`.",
573                stack=False,
574            )
575            return None
576        sync_time = connector.get_sync_time(
577            self,
578            **filter_keywords(
579                connector.get_sync_time,
580                params=params,
581                newest=newest,
582                remote=remote,
583                debug=debug,
584            )
585        )
586
587    if round_down and isinstance(sync_time, datetime):
588        sync_time = round_time(sync_time, timedelta(minutes=1))
589
590    if apply_backtrack_interval and sync_time is not None:
591        backtrack_interval = self.get_backtrack_interval(debug=debug)
592        try:
593            sync_time -= backtrack_interval
594        except Exception as e:
595            warn(f"Failed to apply backtrack interval:\n{e}")
596
597    return self.parse_date_bounds(sync_time)

Get the most recent datetime value for a Pipe.

Parameters
  • params (Optional[Dict[str, Any]], default None): Dictionary to build a WHERE clause for a specific column. See meerschaum.utils.sql.build_where.
  • newest (bool, default True): If True, get the most recent datetime (honoring params). If False, get the oldest datetime (ASC instead of DESC).
  • apply_backtrack_interval (bool, default False): If True, subtract the backtrack interval from the sync time.
  • remote (bool, default False): If True and the instance connector supports it, return the sync time for the remote table definition.
  • round_down (bool, default False): If True, round down the datetime value to the nearest minute.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A datetime or int, if the pipe exists, otherwise None.
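The `round_down` and `apply_backtrack_interval` post-processing above can be illustrated with a small stdlib-only sketch (the helper name `apply_bounds` and the 10-minute backtrack interval are illustrative assumptions, not part of the Meerschaum API):

```python
from datetime import datetime, timedelta, timezone

def apply_bounds(sync_time, round_down=True, backtrack_interval=timedelta(minutes=10)):
    """Mimic the `round_down` / `apply_backtrack_interval` post-processing (illustrative)."""
    if sync_time is None:
        return None
    if round_down and isinstance(sync_time, datetime):
        # Round down to the nearest minute, like `round_time(sync_time, timedelta(minutes=1))`.
        sync_time = sync_time.replace(second=0, microsecond=0)
    # Subtract the backtrack interval so the next sync re-fetches a safety margin.
    return sync_time - backtrack_interval

st = datetime(2024, 1, 1, 12, 34, 56, tzinfo=timezone.utc)
begin = apply_bounds(st)
print(begin)  # 2024-01-01 12:24:00+00:00
```

Subtracting a backtrack interval makes the next sync re-fetch a margin of recent rows, which guards against late-arriving data.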
def exists(self, debug: bool = False) -> bool:
600def exists(
601    self,
602    debug: bool = False
603) -> bool:
604    """
605    See if a Pipe's table exists.
606
607    Parameters
608    ----------
609    debug: bool, default False
610        Verbosity toggle.
611
612    Returns
613    -------
614    A `bool` corresponding to whether a pipe's underlying table exists.
615
616    """
617    from meerschaum.utils.venv import Venv
618    from meerschaum.connectors import get_connector_plugin
619    from meerschaum.utils.debug import dprint
620    from meerschaum.utils.dtypes import get_current_timestamp
621    now = get_current_timestamp('ms', as_int=True) / 1000
622    cache_seconds = mrsm.get_config('pipes', 'sync', 'exists_cache_seconds')
623
624    _exists = self._get_cached_value('_exists', debug=debug)
625    if _exists:
626        exists_timestamp = self._get_cached_value('_exists_timestamp', debug=debug)
627        if exists_timestamp is not None:
628            delta = now - exists_timestamp
629            if delta < cache_seconds:
630                if debug:
631                    dprint(f"Returning cached `exists` for {self} ({round(delta, 2)} seconds old).")
632                return _exists
633
634    with Venv(get_connector_plugin(self.instance_connector)):
635        _exists = (
636            self.instance_connector.pipe_exists(pipe=self, debug=debug)
637            if hasattr(self.instance_connector, 'pipe_exists')
638            else False
639        )
640
641    self._cache_value('_exists', _exists, debug=debug)
642    self._cache_value('_exists_timestamp', now, debug=debug)
643    return _exists

See if a Pipe's table exists.

Parameters
  • debug (bool, default False): Verbosity toggle.
Returns
  • A bool corresponding to whether a pipe's underlying table exists.
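The time-based caching above follows a common TTL pattern; here is a minimal standalone sketch (the class and attribute names are hypothetical, not the actual `_cache_value` API):

```python
import time

class ExistsCache:
    """Illustrative TTL cache, similar to the `exists_cache_seconds` logic above."""
    def __init__(self, cache_seconds=60.0):
        self.cache_seconds = cache_seconds
        self._value = None
        self._timestamp = None

    def get(self):
        # Serve the cached value only while it is fresher than `cache_seconds`.
        if self._value and self._timestamp is not None:
            if (time.time() - self._timestamp) < self.cache_seconds:
                return self._value
        return None

    def set(self, value):
        self._value = value
        self._timestamp = time.time()

cache = ExistsCache(cache_seconds=60.0)
cache.set(True)
print(cache.get())  # True
```

Note that, as in the source above, a falsy cached value is treated as a cache miss, so a pipe whose table does not exist is re-checked on every call.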
def filter_existing( self, df: pandas.core.frame.DataFrame, safe_copy: bool = True, date_bound_only: bool = False, include_unchanged_columns: bool = False, enforce_dtypes: bool = False, chunksize: Optional[int] = -1, debug: bool = False, **kw) -> Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]:
646def filter_existing(
647    self,
648    df: 'pd.DataFrame',
649    safe_copy: bool = True,
650    date_bound_only: bool = False,
651    include_unchanged_columns: bool = False,
652    enforce_dtypes: bool = False,
653    chunksize: Optional[int] = -1,
654    debug: bool = False,
655    **kw
656) -> Tuple['pd.DataFrame', 'pd.DataFrame', 'pd.DataFrame']:
657    """
658    Inspect a dataframe and filter out rows which already exist in the pipe.
659
660    Parameters
661    ----------
662    df: 'pd.DataFrame'
663        The dataframe to inspect and filter.
664
665    safe_copy: bool, default True
666        If `True`, create a copy before comparing and modifying the dataframes.
667        Setting to `False` may mutate the DataFrames.
668        See `meerschaum.utils.dataframe.filter_unseen_df`.
669
670    date_bound_only: bool, default False
671        If `True`, only use the datetime index to fetch the sample dataframe.
672
673    include_unchanged_columns: bool, default False
674        If `True`, include the backtrack columns which haven't changed in the update dataframe.
675        This is useful if you can't update individual keys.
676
677    enforce_dtypes: bool, default False
678        If `True`, ensure the given and intermediate dataframes are enforced to the correct dtypes.
679        Setting `enforce_dtypes=True` may impact performance.
680
681    chunksize: Optional[int], default -1
682        The `chunksize` used when fetching existing data.
683
684    debug: bool, default False
685        Verbosity toggle.
686
687    Returns
688    -------
689    A tuple of three pandas DataFrames: unseen, update, and delta.
690    """
691    from meerschaum.utils.warnings import warn
692    from meerschaum.utils.debug import dprint
693    from meerschaum.utils.packages import attempt_import, import_pandas
694    from meerschaum.utils.dataframe import (
695        filter_unseen_df,
696        add_missing_cols_to_df,
697        get_unhashable_cols,
698    )
699    from meerschaum.utils.dtypes import (
700        to_pandas_dtype,
701        none_if_null,
702        to_datetime,
703        are_dtypes_equal,
704        value_is_null,
705        round_time,
706    )
707    from meerschaum.config import get_config
708    pd = import_pandas()
709    pandas = attempt_import('pandas')
710    if enforce_dtypes or 'dataframe' not in str(type(df)).lower():
711        df = self.enforce_dtypes(df, chunksize=chunksize, debug=debug)
712    is_dask = hasattr(df, '__module__') and 'dask' in df.__module__
713    if is_dask:
714        dd = attempt_import('dask.dataframe')
715        merge = dd.merge
716        NA = pandas.NA
717    else:
718        merge = pd.merge
719        NA = pd.NA
720
721    parameters = self.parameters
722    pipe_columns = parameters.get('columns', {})
723    primary_key = pipe_columns.get('primary', None)
724    dt_col = pipe_columns.get('datetime', None)
725    dt_type = parameters.get('dtypes', {}).get(dt_col, 'datetime') if dt_col else None
726    autoincrement = parameters.get('autoincrement', False)
727    autotime = parameters.get('autotime', False)
728
729    if primary_key and autoincrement and df is not None and primary_key in df.columns:
730        if safe_copy:
731            df = df.copy()
732            safe_copy = False
733        if df[primary_key].isnull().all():
734            del df[primary_key]
735            _ = self.columns.pop(primary_key, None)
736
737    if dt_col and autotime and df is not None and dt_col in df.columns:
738        if safe_copy:
739            df = df.copy()
740            safe_copy = False
741        if df[dt_col].isnull().all():
742            del df[dt_col]
743            _ = self.columns.pop(dt_col, None)
744
745    def get_empty_df():
746        empty_df = pd.DataFrame([])
747        dtypes = dict(df.dtypes) if df is not None else {}
748        if self.enforce: dtypes.update(self.dtypes)
749        pd_dtypes = {
750            col: to_pandas_dtype(str(typ))
751            for col, typ in dtypes.items()
752        }
753        return add_missing_cols_to_df(empty_df, pd_dtypes)
754
755    if df is None:
756        empty_df = get_empty_df()
757        return empty_df, empty_df, empty_df
758
759    if (df.empty if not is_dask else len(df) == 0):
760        return df, df, df
761
762    ### begin is the oldest data in the new dataframe
763    begin, end = None, None
764
765    if autoincrement and primary_key == dt_col and dt_col not in df.columns:
766        if enforce_dtypes:
767            df = self.enforce_dtypes(df, chunksize=chunksize, debug=debug)
768        return df, get_empty_df(), df
769
770    if autotime and dt_col and dt_col not in df.columns:
771        if enforce_dtypes:
772            df = self.enforce_dtypes(df, chunksize=chunksize, debug=debug)
773        return df, get_empty_df(), df
774
775    try:
776        min_dt_val = df[dt_col].min(skipna=True) if dt_col and dt_col in df.columns else None
777        if is_dask and min_dt_val is not None:
778            min_dt_val = min_dt_val.compute()
779        min_dt = (
780            to_datetime(min_dt_val, as_pydatetime=True)
781            if min_dt_val is not None and are_dtypes_equal(dt_type, 'datetime')
782            else min_dt_val
783        )
784    except Exception:
785        min_dt = None
786
787    if not are_dtypes_equal('datetime', str(type(min_dt))) or value_is_null(min_dt):
788        if not are_dtypes_equal('int', str(type(min_dt))):
789            min_dt = None
790
791    if isinstance(min_dt, datetime):
792        rounded_min_dt = round_time(min_dt, to='down')
793        try:
794            begin = rounded_min_dt - timedelta(minutes=1)
795        except OverflowError:
796            begin = rounded_min_dt
797    elif dt_type and 'int' in dt_type.lower():
798        begin = min_dt
799    elif dt_col is None:
800        begin = None
801
802    ### end is the newest data in the new dataframe
803    try:
804        max_dt_val = df[dt_col].max(skipna=True) if dt_col and dt_col in df.columns else None
805        if is_dask and max_dt_val is not None:
806            max_dt_val = max_dt_val.compute()
807        max_dt = (
808            to_datetime(max_dt_val, as_pydatetime=True)
809            if max_dt_val is not None and 'datetime' in str(dt_type)
810            else max_dt_val
811        )
812    except Exception:
813        import traceback
814        traceback.print_exc()
815        max_dt = None
816
817    if not are_dtypes_equal('datetime', str(type(max_dt))) or value_is_null(max_dt):
818        if not are_dtypes_equal('int', str(type(max_dt))):
819            max_dt = None
820
821    if isinstance(max_dt, datetime):
822        end = (
823            round_time(
824                max_dt,
825                to='down'
826            ) + timedelta(minutes=1)
827        )
828    elif dt_type and 'int' in dt_type.lower() and max_dt is not None:
829        end = max_dt + 1
830
831    if max_dt is not None and min_dt is not None and min_dt > max_dt:
832        warn("Detected minimum datetime greater than maximum datetime.")
833
834    if begin is not None and end is not None and begin > end:
835        if isinstance(begin, datetime):
836            begin = end - timedelta(minutes=1)
837        ### We might be using integers for the datetime axis.
838        else:
839            begin = end - 1
840
841    unique_index_vals = {
842        col: df[col].unique()
843        for col in (pipe_columns if not primary_key else [primary_key])
844        if col in df.columns and col != dt_col
845    } if not date_bound_only else {}
846    filter_params_index_limit = get_config('pipes', 'sync', 'filter_params_index_limit')
847    _ = kw.pop('params', None)
848    params = {
849        col: [
850            none_if_null(val)
851            for val in unique_vals
852        ]
853        for col, unique_vals in unique_index_vals.items()
854        if len(unique_vals) <= filter_params_index_limit
855    } if not date_bound_only else {}
856
857    if debug:
858        dprint(f"Looking at data between '{begin}' and '{end}':", **kw)
859
860    backtrack_df = self.get_data(
861        begin=begin,
862        end=end,
863        chunksize=chunksize,
864        params=params,
865        debug=debug,
866        **kw
867    )
868    if backtrack_df is None:
869        if debug:
870            dprint(f"No backtrack data was found for {self}.")
871        return df, get_empty_df(), df
872
873    if enforce_dtypes:
874        backtrack_df = self.enforce_dtypes(backtrack_df, chunksize=chunksize, debug=debug)
875
876    if debug:
877        dprint(f"Existing data for {self}:\n" + str(backtrack_df), **kw)
878        dprint(f"Existing dtypes for {self}:\n" + str(backtrack_df.dtypes))
879
880    ### Separate new rows from changed ones.
881    on_cols = [
882        col
883        for col_key, col in pipe_columns.items()
884        if (
885            col
886            and
887            col_key != 'value'
888            and col in backtrack_df.columns
889        )
890    ] if not primary_key else [primary_key]
891
892    self_dtypes = self.get_dtypes(debug=debug) if self.enforce else {}
893    on_cols_dtypes = {
894        col: to_pandas_dtype(typ)
895        for col, typ in self_dtypes.items()
896        if col in on_cols
897    }
898
899    ### Detect changes between the old target and new source dataframes.
900    delta_df = add_missing_cols_to_df(
901        filter_unseen_df(
902            backtrack_df,
903            df,
904            dtypes={
905                col: to_pandas_dtype(typ)
906                for col, typ in self_dtypes.items()
907            },
908            safe_copy=safe_copy,
909            coerce_mixed_numerics=(not self.static),
910            debug=debug
911        ),
912        on_cols_dtypes,
913    )
914    if enforce_dtypes:
915        delta_df = self.enforce_dtypes(delta_df, chunksize=chunksize, debug=debug)
916
917    ### Cast dicts or lists to strings so we can merge.
918    serializer = functools.partial(json.dumps, sort_keys=True, separators=(',', ':'), default=str)
919
920    def deserializer(x):
921        return json.loads(x) if isinstance(x, str) else x
922
923    unhashable_delta_cols = get_unhashable_cols(delta_df)
924    unhashable_backtrack_cols = get_unhashable_cols(backtrack_df)
925    for col in unhashable_delta_cols:
926        delta_df[col] = delta_df[col].apply(serializer)
927    for col in unhashable_backtrack_cols:
928        backtrack_df[col] = backtrack_df[col].apply(serializer)
929    casted_cols = set(unhashable_delta_cols + unhashable_backtrack_cols)
930
931    joined_df = merge(
932        delta_df.infer_objects(copy=False).fillna(NA),
933        backtrack_df.infer_objects(copy=False).fillna(NA),
934        how='left',
935        on=on_cols,
936        indicator=True,
937        suffixes=('', '_old'),
938    ) if on_cols else delta_df
939    for col in casted_cols:
940        if col in joined_df.columns:
941            joined_df[col] = joined_df[col].apply(deserializer)
942        if col in delta_df.columns:
943            delta_df[col] = delta_df[col].apply(deserializer)
944
945    ### Determine which rows are completely new.
946    new_rows_mask = (joined_df['_merge'] == 'left_only') if on_cols else None
947    cols = list(delta_df.columns)
948
949    unseen_df = (
950        joined_df
951        .where(new_rows_mask)
952        .dropna(how='all')[cols]
953        .reset_index(drop=True)
954    ) if on_cols else delta_df
955
956    ### Rows that have already been inserted but values have changed.
957    update_df = (
958        joined_df
959        .where(~new_rows_mask)
960        .dropna(how='all')[cols]
961        .reset_index(drop=True)
962    ) if on_cols else get_empty_df()
963
964    if include_unchanged_columns and on_cols:
965        unchanged_backtrack_cols = [
966            col
967            for col in backtrack_df.columns
968            if col in on_cols or col not in update_df.columns
969        ]
970        if enforce_dtypes:
971            update_df = self.enforce_dtypes(update_df, chunksize=chunksize, debug=debug)
972        update_df = merge(
973            backtrack_df[unchanged_backtrack_cols],
974            update_df,
975            how='inner',
976            on=on_cols,
977        )
978
979    return unseen_df, update_df, delta_df

Inspect a dataframe and filter out rows which already exist in the pipe.

Parameters
  • df ('pd.DataFrame'): The dataframe to inspect and filter.
  • safe_copy (bool, default True): If True, create a copy before comparing and modifying the dataframes. Setting to False may mutate the DataFrames. See meerschaum.utils.dataframe.filter_unseen_df.
  • date_bound_only (bool, default False): If True, only use the datetime index to fetch the sample dataframe.
  • include_unchanged_columns (bool, default False): If True, include the backtrack columns which haven't changed in the update dataframe. This is useful if you can't update individual keys.
  • enforce_dtypes (bool, default False): If True, ensure the given and intermediate dataframes are enforced to the correct dtypes. Setting enforce_dtypes=True may impact performance.
  • chunksize (Optional[int], default -1): The chunksize used when fetching existing data.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A tuple of three pandas DataFrames: unseen, update, and delta.
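The unseen/update/delta split can be demonstrated with a simplified pandas sketch of the merge-indicator technique used above (toy data and a single `id` index column; the real method additionally handles dtype enforcement, Dask frames, and unhashable columns):

```python
import pandas as pd

# Existing ("backtrack") rows already stored in the pipe.
backtrack_df = pd.DataFrame({'id': [1, 2], 'vl': [10, 20]})
# Incoming rows: id 2 changed, id 3 is brand new.
df = pd.DataFrame({'id': [2, 3], 'vl': [25, 30]})

# Delta: incoming rows that differ from what is stored (compare on all columns).
delta_df = df.merge(backtrack_df, how='left', on=['id', 'vl'], indicator=True)
delta_df = delta_df[delta_df['_merge'] == 'left_only'].drop(columns='_merge')

# Left-join the delta onto existing rows by the index columns only.
joined_df = delta_df.merge(
    backtrack_df, how='left', on=['id'], indicator=True, suffixes=('', '_old'),
)
new_rows_mask = joined_df['_merge'] == 'left_only'

unseen_df = joined_df[new_rows_mask][delta_df.columns]   # brand-new rows (id 3)
update_df = joined_df[~new_rows_mask][delta_df.columns]  # changed rows (id 2)
```

Rows whose index values never joined (`left_only`) are inserts; rows that joined but differed become updates.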
def get_num_workers(self, workers: Optional[int] = None) -> int:
1004def get_num_workers(self, workers: Optional[int] = None) -> int:
1005    """
1006    Get the number of workers to use for concurrent syncs.
1007
1008    Parameters
1009    ----------
1010    workers: The number of workers passed via `--workers`.
1011
1012    Returns
1013    -------
1014    The number of workers, capped for safety.
1015    """
1016    is_thread_safe = getattr(self.instance_connector, 'IS_THREAD_SAFE', False)
1017    if not is_thread_safe:
1018        return 1
1019
1020    engine_pool_size = (
1021        self.instance_connector.engine.pool.size()
1022        if self.instance_connector.type == 'sql'
1023        else None
1024    )
1025    current_num_threads = threading.active_count()
1026    current_num_connections = (
1027        self.instance_connector.engine.pool.checkedout()
1028        if engine_pool_size is not None
1029        else current_num_threads
1030    )
1031    desired_workers = (
1032        min(workers or engine_pool_size, engine_pool_size)
1033        if engine_pool_size is not None
1034        else workers
1035    )
1036    if desired_workers is None:
1037        desired_workers = (multiprocessing.cpu_count() if is_thread_safe else 1)
1038
1039    return max(
1040        (desired_workers - current_num_connections),
1041        1,
1042    )

Get the number of workers to use for concurrent syncs.

Parameters
  • workers (Optional[int], default None): The number of workers passed via --workers.
Returns
  • The number of workers, capped for safety.
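A rough standalone sketch of this capping logic, assuming hypothetical parameters in place of the live connector attributes (`checked_out` stands in for `engine.pool.checkedout()`):

```python
import multiprocessing

def cap_workers(workers=None, engine_pool_size=None, checked_out=0, is_thread_safe=True):
    """Illustrative sketch of the worker-capping logic above (parameters are hypothetical)."""
    if not is_thread_safe:
        return 1  # Non-thread-safe connectors always sync in series.
    if engine_pool_size is not None:
        # Never request more workers than the connection pool can serve.
        desired = min(workers or engine_pool_size, engine_pool_size)
    else:
        desired = workers
    if desired is None:
        desired = multiprocessing.cpu_count()
    # Leave headroom for connections already in use, but never drop below 1.
    return max(desired - checked_out, 1)

print(cap_workers(workers=8, engine_pool_size=5, checked_out=2))  # 3
```

Capping at the pool size avoids exhausting a SQL connection pool when many chunks sync concurrently.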
def verify( self, begin: Union[datetime.datetime, int, NoneType] = None, end: Union[datetime.datetime, int, NoneType] = None, params: Optional[Dict[str, Any]] = None, chunk_interval: Union[datetime.timedelta, int, NoneType] = None, bounded: Optional[bool] = None, deduplicate: bool = False, workers: Optional[int] = None, batchsize: Optional[int] = None, skip_chunks_with_greater_rowcounts: bool = False, check_rowcounts_only: bool = False, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
 19def verify(
 20    self,
 21    begin: Union[datetime, int, None] = None,
 22    end: Union[datetime, int, None] = None,
 23    params: Optional[Dict[str, Any]] = None,
 24    chunk_interval: Union[timedelta, int, None] = None,
 25    bounded: Optional[bool] = None,
 26    deduplicate: bool = False,
 27    workers: Optional[int] = None,
 28    batchsize: Optional[int] = None,
 29    skip_chunks_with_greater_rowcounts: bool = False,
 30    check_rowcounts_only: bool = False,
 31    debug: bool = False,
 32    **kwargs: Any
 33) -> SuccessTuple:
 34    """
 35    Verify the contents of the pipe by resyncing its interval.
 36
 37    Parameters
 38    ----------
 39    begin: Union[datetime, int, None], default None
 40        If specified, only verify rows greater than or equal to this value.
 41
 42    end: Union[datetime, int, None], default None
 43        If specified, only verify rows less than this value.
 44
 45    chunk_interval: Union[timedelta, int, None], default None
 46        If provided, use this as the size of the chunk boundaries.
 47        Defaults to the value set in `pipe.parameters['chunk_minutes']` (1440).
 48
 49    bounded: Optional[bool], default None
 50        If `True`, do not verify older than the oldest sync time or newer than the newest.
 51        If `False`, verify unbounded syncs outside of the new and old sync times.
 52        The default behavior (`None`) is to bound only if a bound interval is set
 53        (e.g. `pipe.parameters['verify']['bound_days']`).
 54
 55    deduplicate: bool, default False
 56        If `True`, deduplicate the pipe's table after the verification syncs.
 57
 58    workers: Optional[int], default None
 59        If provided, limit the verification to this many threads.
 60        Use a value of `1` to sync chunks in series.
 61
 62    batchsize: Optional[int], default None
 63        If provided, sync this many chunks in parallel.
 64        Defaults to `Pipe.get_num_workers()`.
 65
 66    skip_chunks_with_greater_rowcounts: bool, default False
 67        If `True`, compare the rowcounts for a chunk and skip syncing if the pipe's
 68        chunk rowcount equals or exceeds the remote's rowcount.
 69
 70    check_rowcounts_only: bool, default False
 71        If `True`, only compare rowcounts and print chunks which are out-of-sync.
 72
 73    debug: bool, default False
 74        Verbosity toggle.
 75
 76    kwargs: Any
 77        All keyword arguments are passed to `pipe.sync()`.
 78
 79    Returns
 80    -------
 81    A SuccessTuple indicating whether the pipe was successfully resynced.
 82    """
 83    from meerschaum.utils.pool import get_pool
 84    from meerschaum.utils.formatting import make_header
 85    from meerschaum.utils.misc import interval_str
 86    workers = self.get_num_workers(workers)
 87    check_rowcounts = skip_chunks_with_greater_rowcounts or check_rowcounts_only
 88
 89    ### Skip configured bounding in parameters
 90    ### if `bounded` is explicitly `False`.
 91    bound_time = (
 92        self.get_bound_time(debug=debug)
 93        if bounded is not False
 94        else None
 95    )
 96    if bounded is None:
 97        bounded = bound_time is not None
 98
 99    if bounded and begin is None:
100        begin = (
101            bound_time
102            if bound_time is not None
103            else self.get_sync_time(newest=False, debug=debug)
104        )
105        if begin is None:
106            remote_oldest_sync_time = self.get_sync_time(newest=False, remote=True, debug=debug)
107            begin = remote_oldest_sync_time
108    if bounded and end is None:
109        end = self.get_sync_time(newest=True, debug=debug)
110        if end is None:
111            remote_newest_sync_time = self.get_sync_time(newest=True, remote=True, debug=debug)
112            end = remote_newest_sync_time
113        if end is not None:
114            end += (
115                timedelta(minutes=1)
116                if hasattr(end, 'tzinfo')
117                else 1
118            )
119
120    begin, end = self.parse_date_bounds(begin, end)
121    cannot_determine_bounds = bounded and begin is None and end is None
122
123    if cannot_determine_bounds and not check_rowcounts_only:
124        warn(f"Cannot determine sync bounds for {self}. Syncing instead...", stack=False)
125        sync_success, sync_msg = self.sync(
126            begin=begin,
127            end=end,
128            params=params,
129            workers=workers,
130            debug=debug,
131            **kwargs
132        )
133        if not sync_success:
134            return sync_success, sync_msg
135
136        if deduplicate:
137            return self.deduplicate(
138                begin=begin,
139                end=end,
140                params=params,
141                workers=workers,
142                debug=debug,
143                **kwargs
144            )
145        return sync_success, sync_msg
146
147    chunk_interval = self.get_chunk_interval(chunk_interval, debug=debug)
148    chunk_bounds = self.get_chunk_bounds(
149        begin=begin,
150        end=end,
151        chunk_interval=chunk_interval,
152        bounded=bounded,
153        debug=debug,
154    )
155
156    ### Consider it a success if no chunks need to be verified.
157    if not chunk_bounds:
158        if deduplicate:
159            return self.deduplicate(
160                begin=begin,
161                end=end,
162                params=params,
163                workers=workers,
164                debug=debug,
165                **kwargs
166            )
167        return True, f"Could not determine chunks between '{begin}' and '{end}'; nothing to do."
168
169    begin_to_print = (
170        begin
171        if begin is not None
172        else (
173            chunk_bounds[0][0]
174            if bounded
175            else chunk_bounds[0][1]
176        )
177    )
178    end_to_print = (
179        end
180        if end is not None
181        else (
182            chunk_bounds[-1][1]
183            if bounded
184            else chunk_bounds[-1][0]
185        )
186    )
187    message_header = f"{begin_to_print} - {end_to_print}"
188    max_chunks_syncs = mrsm.get_config('pipes', 'verify', 'max_chunks_syncs')
189
190    info(
191        f"Verifying {self}:\n    "
192        + ("Syncing" if not check_rowcounts_only else "Checking")
193        + f" {len(chunk_bounds)} chunk"
194        + ('s' if len(chunk_bounds) != 1 else '')
195        + f" ({'un' if not bounded else ''}bounded)"
196        + f" of size '{interval_str(chunk_interval)}'"
197        + f" between '{begin_to_print}' and '{end_to_print}'.\n"
198    )
199
200    ### Dictionary of the form bounds -> success_tuple, e.g.:
201    ### {
202    ###    (2023-01-01, 2023-01-02): (True, "Success")
203    ### }
204    bounds_success_tuples = {}
205    def process_chunk_bounds(
206        chunk_begin_and_end: Tuple[
207            Union[int, datetime],
208            Union[int, datetime]
209        ],
210        _workers: Optional[int] = 1,
211    ):
212        if chunk_begin_and_end in bounds_success_tuples:
213            return chunk_begin_and_end, bounds_success_tuples[chunk_begin_and_end]
214
215        chunk_begin, chunk_end = chunk_begin_and_end
216        do_sync = True
217        chunk_success, chunk_msg = False, "Did not sync chunk."
218        if check_rowcounts:
219            existing_rowcount = self.get_rowcount(begin=chunk_begin, end=chunk_end, debug=debug)
220            remote_rowcount = self.get_rowcount(
221                begin=chunk_begin,
222                end=chunk_end,
223                remote=True,
224                debug=debug,
225            )
226            checked_rows_str = (
227                f"checked {existing_rowcount:,} row"
228                + ("s" if existing_rowcount != 1 else '')
229                + f" vs {remote_rowcount:,} remote"
230            )
231            if (
232                existing_rowcount is not None
233                and remote_rowcount is not None
234                and existing_rowcount >= remote_rowcount
235            ):
236                do_sync = False
237                chunk_success, chunk_msg = True, (
238                    "Row-count is up-to-date "
239                    f"({checked_rows_str})."
240                )
241            elif check_rowcounts_only:
242                do_sync = False
243                chunk_success, chunk_msg = True, (
244                    f"Row-counts are out-of-sync ({checked_rows_str})."
245                )
246
247        num_syncs = 0
248        while num_syncs < max_chunks_syncs:
249            chunk_success, chunk_msg = self.sync(
250                begin=chunk_begin,
251                end=chunk_end,
252                params=params,
253                workers=_workers,
254                debug=debug,
255                **kwargs
256            ) if do_sync else (chunk_success, chunk_msg)
257            if chunk_success:
258                break
259            num_syncs += 1
260            time.sleep(num_syncs**2)
261        chunk_msg = chunk_msg.strip()
262        if ' - ' not in chunk_msg:
263            chunk_label = f"{chunk_begin} - {chunk_end}"
264            chunk_msg = f'Verified chunk for {self}:\n{chunk_label}\n{chunk_msg}'
265        mrsm.pprint((chunk_success, chunk_msg))
266
267        return chunk_begin_and_end, (chunk_success, chunk_msg)
268
269    ### If we have more than one chunk, attempt to sync the first one and return early if it fails.
270    if len(chunk_bounds) > 1:
271        first_chunk_bounds = chunk_bounds[0]
272        first_label = f"{first_chunk_bounds[0]} - {first_chunk_bounds[1]}"
273        info(f"Verifying first chunk for {self}:\n    {first_label}")
274        (
275            (first_begin, first_end),
276            (first_success, first_msg)
277        ) = process_chunk_bounds(first_chunk_bounds, _workers=workers)
278        if not first_success:
279            return (
280                first_success,
281                f"\n{first_label}\n"
282                + f"Failed to sync first chunk:\n{first_msg}"
283            )
284        bounds_success_tuples[first_chunk_bounds] = (first_success, first_msg)
285        info(f"Completed first chunk for {self}:\n    {first_label}\n")
286        chunk_bounds = chunk_bounds[1:]
287
288    pool = get_pool(workers=workers)
289    batches = self.get_chunk_bounds_batches(chunk_bounds, batchsize=batchsize, workers=workers)
290
291    def process_batch(
292        batch_chunk_bounds: Tuple[
293            Tuple[Union[datetime, int, None], Union[datetime, int, None]],
294            ...
295        ]
296    ):
297        _batch_begin = batch_chunk_bounds[0][0]
298        _batch_end = batch_chunk_bounds[-1][-1]
299        batch_message_header = f"{_batch_begin} - {_batch_end}"
300
301        if check_rowcounts_only:
302            info(f"Checking row-counts for batch bounds:\n    {batch_message_header}")
303            _, (batch_init_success, batch_init_msg) = process_chunk_bounds(
304                (_batch_begin, _batch_end)
305            )
306            mrsm.pprint((batch_init_success, batch_init_msg))
307            if batch_init_success and 'up-to-date' in batch_init_msg:
308                info("Entire batch is up-to-date.")
309                return batch_init_success, batch_init_msg
310
311        batch_bounds_success_tuples = dict(pool.map(process_chunk_bounds, batch_chunk_bounds))
312        bounds_success_tuples.update(batch_bounds_success_tuples)
313        batch_bounds_success_bools = {
314            bounds: tup[0]
315            for bounds, tup in batch_bounds_success_tuples.items()
316        }
317
318        if all(batch_bounds_success_bools.values()):
319            msg = get_chunks_success_message(
320                batch_bounds_success_tuples,
321                header=batch_message_header,
322                check_rowcounts_only=check_rowcounts_only,
323            )
324            if deduplicate:
325                deduplicate_success, deduplicate_msg = self.deduplicate(
326                    begin=_batch_begin,
327                    end=_batch_end,
328                    params=params,
329                    workers=workers,
330                    debug=debug,
331                    **kwargs
332                )
333                return deduplicate_success, msg + '\n\n' + deduplicate_msg
334            return True, msg
335
336        batch_chunk_bounds_to_resync = [
337            bounds
338        for bounds, success in zip(batch_chunk_bounds, batch_bounds_success_bools.values())
339            if not success
340        ]
341        batch_bounds_to_print = [
342            f"{bounds[0]} - {bounds[1]}"
343            for bounds in batch_chunk_bounds_to_resync
344        ]
345        if batch_bounds_to_print:
346            warn(
347                "Will resync the following failed chunks:\n    "
348                + '\n    '.join(batch_bounds_to_print),
349                stack=False,
350            )
351
352        retry_bounds_success_tuples = dict(pool.map(
353            process_chunk_bounds,
354            batch_chunk_bounds_to_resync
355        ))
356        batch_bounds_success_tuples.update(retry_bounds_success_tuples)
357        bounds_success_tuples.update(retry_bounds_success_tuples)
358        retry_bounds_success_bools = {
359            bounds: tup[0]
360            for bounds, tup in retry_bounds_success_tuples.items()
361        }
362
363        if all(retry_bounds_success_bools.values()):
364            chunks_message = (
365                get_chunks_success_message(
366                    batch_bounds_success_tuples,
367                    header=batch_message_header,
368                    check_rowcounts_only=check_rowcounts_only,
369                ) + f"\nRetried {len(batch_chunk_bounds_to_resync)} chunk" + (
370                    's'
371                    if len(batch_chunk_bounds_to_resync) != 1
372                    else ''
373                ) + "."
374            )
375            if deduplicate:
376                deduplicate_success, deduplicate_msg = self.deduplicate(
377                    begin=_batch_begin,
378                    end=_batch_end,
379                    params=params,
380                    workers=workers,
381                    debug=debug,
382                    **kwargs
383                )
384                return deduplicate_success, chunks_message + '\n\n' + deduplicate_msg
385            return True, chunks_message
386
387        batch_chunks_message = get_chunks_success_message(
388            batch_bounds_success_tuples,
389            header=batch_message_header,
390            check_rowcounts_only=check_rowcounts_only,
391        )
392        if deduplicate:
393            deduplicate_success, deduplicate_msg = self.deduplicate(
394                begin=begin,
395                end=end,
396                params=params,
397                workers=workers,
398                debug=debug,
399                **kwargs
400            )
401            return deduplicate_success, batch_chunks_message + '\n\n' + deduplicate_msg
402        return False, batch_chunks_message
403
404    num_batches = len(batches)
405    for batch_i, batch in enumerate(batches):
406        batch_begin = batch[0][0]
407        batch_end = batch[-1][-1]
408        batch_counter_str = f"({(batch_i + 1):,}/{num_batches:,})"
409        batch_label = f"batch {batch_counter_str}:\n{batch_begin} - {batch_end}"
410        retry_failed_batch = True
411        try:
412            for_self = 'for ' + str(self)
413            batch_label_str = batch_label.replace(':\n', ' ' + for_self + '...\n    ')
414            info(f"Verifying {batch_label_str}\n")
415            batch_success, batch_msg = process_batch(batch)
416        except (KeyboardInterrupt, Exception) as e:
417            batch_success = False
418            batch_msg = str(e)
419            retry_failed_batch = False
420
421        batch_msg_to_print = (
422            f"{make_header('Completed batch ' + batch_counter_str + ':', left_pad=0)}\n{batch_msg}"
423        )
424        mrsm.pprint((batch_success, batch_msg_to_print))
425
426        if not batch_success and retry_failed_batch:
427            info(f"Retrying batch {batch_counter_str}...")
428            retry_batch_success, retry_batch_msg = process_batch(batch)
429            retry_batch_msg_to_print = (
430                f"Retried {make_header('batch ' + batch_label, left_pad=0)}\n{retry_batch_msg}"
431            )
432            mrsm.pprint((retry_batch_success, retry_batch_msg_to_print))
433
434            batch_success = retry_batch_success
435            batch_msg = retry_batch_msg
436
437        if not batch_success:
438            return False, f"Failed to verify {batch_label}:\n\n{batch_msg}"
439
440    chunks_message = get_chunks_success_message(
441        bounds_success_tuples,
442        header=message_header,
443        check_rowcounts_only=check_rowcounts_only,
444    )
445    return True, chunks_message

Verify the contents of the pipe by resyncing its interval.

Parameters
  • begin (Union[datetime, int, None], default None): If specified, only verify rows greater than or equal to this value.
  • end (Union[datetime, int, None], default None): If specified, only verify rows less than this value.
  • chunk_interval (Union[timedelta, int, None], default None): If provided, use this as the size of the chunk boundaries. Defaults to the value set in pipe.parameters['chunk_minutes'] (1440).
  • bounded (Optional[bool], default None): If True, do not verify rows older than the oldest sync time or newer than the newest. If False, run an unbounded verification, including intervals outside the oldest and newest sync times. The default behavior (None) is to bound only if a bound interval is set (e.g. pipe.parameters['verify']['bound_days']).
  • deduplicate (bool, default False): If True, deduplicate the pipe's table after the verification syncs.
  • workers (Optional[int], default None): If provided, limit the verification to this many threads. Use a value of 1 to sync chunks in series.
  • batchsize (Optional[int], default None): If provided, sync this many chunks in parallel. Defaults to Pipe.get_num_workers().
  • skip_chunks_with_greater_rowcounts (bool, default False): If True, compare the rowcounts for a chunk and skip syncing if the pipe's chunk rowcount equals or exceeds the remote's rowcount.
  • check_rowcounts_only (bool, default False): If True, only compare rowcounts and print chunks which are out-of-sync.
  • debug (bool, default False): Verbosity toggle.
  • kwargs (Any): All keyword arguments are passed to pipe.sync().
Returns
  • A SuccessTuple indicating whether the pipe was successfully resynced.
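The batch logic above syncs chunks in parallel and retries only the chunks that failed on the first pass. A simplified sketch of that first-pass/retry pattern, with a hypothetical `process_chunk` callable standing in for `process_chunk_bounds` (names here are illustrative, not part of the API):

```python
def sync_with_retry(chunk_bounds, process_chunk):
    """Sync each chunk, then retry only the chunks that failed."""
    results = {bounds: process_chunk(bounds) for bounds in chunk_bounds}
    failed = [bounds for bounds, (success, _) in results.items() if not success]
    results.update({bounds: process_chunk(bounds) for bounds in failed})
    return all(success for success, _ in results.values()), results

attempts = {}
def flaky_chunk(bounds):
    # Fail the first attempt for chunk (2, 3) to exercise the retry path.
    attempts[bounds] = attempts.get(bounds, 0) + 1
    return (attempts[bounds] > 1 or bounds != (2, 3)), f"chunk {bounds}"

success, results = sync_with_retry([(1, 2), (2, 3), (3, 4)], flaky_chunk)
print(success)  # True: the failed chunk succeeded on retry
```

The real method additionally groups chunks into batches and aborts (or retries the whole batch) when a batch still fails after its chunk-level retries.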
def get_bound_interval(self, debug: bool = False) -> Union[datetime.timedelta, int, NoneType]:
def get_bound_interval(self, debug: bool = False) -> Union[timedelta, int, None]:
    """
    Return the interval used to determine the bound time (limit for verification syncs).
    If the datetime axis is an integer, just return its value.

    Below are the supported keys for the bound interval:

        - `pipe.parameters['verify']['bound_minutes']`
        - `pipe.parameters['verify']['bound_hours']`
        - `pipe.parameters['verify']['bound_days']`
        - `pipe.parameters['verify']['bound_weeks']`
        - `pipe.parameters['verify']['bound_years']`
        - `pipe.parameters['verify']['bound_seconds']`

    If multiple keys are present, the first on this priority list will be used.

    Returns
    -------
    A `timedelta` or `int` value to be used to determine the bound time.
    """
    verify_params = self.parameters.get('verify', {})
    prefix = 'bound_'
    suffixes_to_check = ('minutes', 'hours', 'days', 'weeks', 'years', 'seconds')
    keys_to_search = {
        key: val
        for key, val in verify_params.items()
        if key.startswith(prefix)
    }
    bound_time_key, bound_time_value = None, None
    for key, value in keys_to_search.items():
        for suffix in suffixes_to_check:
            if key == prefix + suffix:
                bound_time_key = key
                bound_time_value = value
                break
        if bound_time_key is not None:
            break

    if bound_time_value is None:
        return bound_time_value

    dt_col = self.columns.get('datetime', None)
    if not dt_col:
        return bound_time_value

    dt_typ = self.dtypes.get(dt_col, 'datetime')
    if 'int' in dt_typ.lower():
        return int(bound_time_value)

    interval_type = bound_time_key.replace(prefix, '')
    return timedelta(**{interval_type: bound_time_value})

Return the interval used to determine the bound time (limit for verification syncs). If the datetime axis is an integer, just return its value.

Below are the supported keys for the bound interval:

- `pipe.parameters['verify']['bound_minutes']`
- `pipe.parameters['verify']['bound_hours']`
- `pipe.parameters['verify']['bound_days']`
- `pipe.parameters['verify']['bound_weeks']`
- `pipe.parameters['verify']['bound_years']`
- `pipe.parameters['verify']['bound_seconds']`

If multiple keys are present, the first on this priority list will be used.

Returns
  • A timedelta or int value to be used to determine the bound time.
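The key-priority lookup can be sketched in plain Python. This is a simplified stand-in for `get_bound_interval()`, not the method itself: it skips the integer-axis handling, and `bound_years` is omitted because `datetime.timedelta` accepts no `years` argument:

```python
from datetime import timedelta

def resolve_bound_interval(verify_params: dict):
    # The first matching 'bound_*' key on this priority list wins.
    for suffix in ('minutes', 'hours', 'days', 'weeks', 'seconds'):
        value = verify_params.get('bound_' + suffix)
        if value is not None:
            return timedelta(**{suffix: value})
    return None

print(resolve_bound_interval({'bound_days': 366}))  # 366 days, 0:00:00
print(resolve_bound_interval({}))                   # None
```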
def get_bound_time(self, debug: bool = False) -> Union[datetime.datetime, int, NoneType]:
def get_bound_time(self, debug: bool = False) -> Union[datetime, int, None]:
    """
    The bound time is the limit at which long-running verification syncs should stop.
    A value of `None` means verification syncs should be unbounded.

    Like deriving a backtrack time from `pipe.get_sync_time()`,
    the bound time is the sync time minus a large window (e.g. 366 days).

    Verification syncs are unbounded (i.e. `bound_time is None`)
    if the bound time would precede the oldest sync time.

    Returns
    -------
    A `datetime` or `int` corresponding to the
    `begin` bound for verification and deduplication syncs.
    """
    bound_interval = self.get_bound_interval(debug=debug)
    if bound_interval is None:
        return None

    sync_time = self.get_sync_time(debug=debug)
    if sync_time is None:
        return None

    bound_time = sync_time - bound_interval
    oldest_sync_time = self.get_sync_time(newest=False, debug=debug)
    max_bound_time_days = STATIC_CONFIG['pipes']['max_bound_time_days']

    extreme_sync_times_delta = (
        hasattr(oldest_sync_time, 'tzinfo')
        and (sync_time - oldest_sync_time) >= timedelta(days=max_bound_time_days)
    )

    return (
        bound_time
        if bound_time > oldest_sync_time or extreme_sync_times_delta
        else None
    )

The bound time is the limit at which long-running verification syncs should stop. A value of None means verification syncs should be unbounded.

Like deriving a backtrack time from pipe.get_sync_time(), the bound time is the sync time minus a large window (e.g. 366 days).

Verification syncs are unbounded (i.e. bound_time is None) if the bound time would precede the oldest sync time.

Returns
  • A datetime or int corresponding to the begin bound for verification and deduplication syncs.
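Under those rules, the bound-time arithmetic reduces to a subtraction plus a comparison against the oldest sync time. A simplified sketch (it omits the `max_bound_time_days` escape hatch in the real method; `resolve_bound_time` is an illustrative name, not the API):

```python
from datetime import datetime, timedelta, timezone

def resolve_bound_time(newest_sync_time, oldest_sync_time, bound_interval):
    # Subtract the bound interval from the newest sync time;
    # fall back to unbounded (None) when the bound would not
    # exclude anything older than the oldest sync time.
    if newest_sync_time is None or bound_interval is None:
        return None
    bound_time = newest_sync_time - bound_interval
    return bound_time if bound_time > oldest_sync_time else None

newest = datetime(2024, 6, 1, tzinfo=timezone.utc)
oldest = datetime(2020, 1, 1, tzinfo=timezone.utc)
print(resolve_bound_time(newest, oldest, timedelta(days=366)))
```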
def delete(self, drop: bool = True, debug: bool = False, **kw) -> Tuple[bool, str]:
def delete(
    self,
    drop: bool = True,
    debug: bool = False,
    **kw
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `delete_pipe()` method.

    Parameters
    ----------
    drop: bool, default True
        If `True`, drop the pipe's target table.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success (`bool`), message (`str`).

    """
    from meerschaum.utils.warnings import warn
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    if self.temporary:
        if self.cache:
            invalidate_success, invalidate_msg = self._invalidate_cache(hard=True, debug=debug)
            if not invalidate_success:
                return invalidate_success, invalidate_msg

        return (
            False,
            "Cannot delete pipes created with `temporary=True` (read-only). "
            + "You may want to call `pipe.drop()` instead."
        )

    if drop:
        drop_success, drop_msg = self.drop(debug=debug)
        if not drop_success:
            warn(f"Failed to drop {self}:\n{drop_msg}")

    with Venv(get_connector_plugin(self.instance_connector)):
        result = self.instance_connector.delete_pipe(self, debug=debug, **kw)

    if not isinstance(result, tuple):
        return False, f"Received an unexpected result from '{self.instance_connector}': {result}"

    if result[0]:
        self._invalidate_cache(hard=True, debug=debug)

    return result

Call the Pipe's instance connector's delete_pipe() method.

Parameters
  • drop (bool, default True): If True, drop the pipe's target table.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A SuccessTuple of success (bool), message (str).
def drop(self, debug: bool = False, **kw: Any) -> Tuple[bool, str]:
def drop(
    self,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `drop_pipe()` method.

    Parameters
    ----------
    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.

    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    self._clear_cache_key('_exists', debug=debug)

    with Venv(get_connector_plugin(self.instance_connector)):
        if hasattr(self.instance_connector, 'drop_pipe'):
            result = self.instance_connector.drop_pipe(self, debug=debug, **kw)
        else:
            result = (
                False,
                (
                    "Cannot drop pipes for instance connectors of type "
                    f"'{self.instance_connector.type}'."
                )
            )

    self._clear_cache_key('_exists', debug=debug)
    self._clear_cache_key('_exists_timestamp', debug=debug)

    return result

Call the Pipe's instance connector's drop_pipe() method.

Parameters
  • debug (bool, default False): Verbosity toggle.
Returns
  • A SuccessTuple of success, message.
def drop_indices( self, columns: Optional[List[str]] = None, debug: bool = False, **kw: Any) -> Tuple[bool, str]:
def drop_indices(
    self,
    columns: Optional[List[str]] = None,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `drop_indices()` method.

    Parameters
    ----------
    columns: Optional[List[str]], default None
        If provided, only drop indices in the given list.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.

    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    self._clear_cache_key('_columns_indices', debug=debug)
    self._clear_cache_key('_columns_indices_timestamp', debug=debug)
    self._clear_cache_key('_columns_types', debug=debug)
    self._clear_cache_key('_columns_types_timestamp', debug=debug)

    with Venv(get_connector_plugin(self.instance_connector)):
        if hasattr(self.instance_connector, 'drop_pipe_indices'):
            result = self.instance_connector.drop_pipe_indices(
                self,
                columns=columns,
                debug=debug,
                **kw
            )
        else:
            result = (
                False,
                (
                    "Cannot drop indices for instance connectors of type "
                    f"'{self.instance_connector.type}'."
                )
            )

    self._clear_cache_key('_columns_indices', debug=debug)
    self._clear_cache_key('_columns_indices_timestamp', debug=debug)
    self._clear_cache_key('_columns_types', debug=debug)
    self._clear_cache_key('_columns_types_timestamp', debug=debug)

    return result

Call the Pipe's instance connector's drop_indices() method.

Parameters
  • columns (Optional[List[str]], default None): If provided, only drop indices in the given list.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A SuccessTuple of success, message.
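`drop()`, `drop_indices()`, and `create_indices()` all share the same dispatch shape: check whether the instance connector implements the relevant method and return a failed `SuccessTuple` otherwise. A minimal sketch of that pattern with a dummy connector (all names here are illustrative, not part of the API):

```python
def call_connector_method(connector, method_name, *args, **kwargs):
    # Fall back to a failed SuccessTuple when the connector
    # does not implement the requested method.
    method = getattr(connector, method_name, None)
    if method is None:
        return (
            False,
            f"Cannot call '{method_name}' for instance connectors "
            f"of type '{connector.type}'."
        )
    return method(*args, **kwargs)

class DummyConnector:
    type = 'dummy'
    def drop_pipe_indices(self, columns=None):
        return True, f"Dropped indices: {columns}"

conn = DummyConnector()
print(call_connector_method(conn, 'drop_pipe_indices', columns=['id']))
print(call_connector_method(conn, 'create_pipe_indices'))
```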
def create_indices( self, columns: Optional[List[str]] = None, debug: bool = False, **kw: Any) -> Tuple[bool, str]:
def create_indices(
    self,
    columns: Optional[List[str]] = None,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `create_pipe_indices()` method.

    Parameters
    ----------
    columns: Optional[List[str]], default None
        If provided, only create indices for the columns in the given list.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.

    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    self._clear_cache_key('_columns_indices', debug=debug)
    self._clear_cache_key('_columns_indices_timestamp', debug=debug)
    self._clear_cache_key('_columns_types', debug=debug)
    self._clear_cache_key('_columns_types_timestamp', debug=debug)

    with Venv(get_connector_plugin(self.instance_connector)):
        if hasattr(self.instance_connector, 'create_pipe_indices'):
            result = self.instance_connector.create_pipe_indices(
                self,
                columns=columns,
                debug=debug,
                **kw
            )
        else:
            result = (
                False,
                (
                    "Cannot create indices for instance connectors of type "
                    f"'{self.instance_connector.type}'."
                )
            )

    self._clear_cache_key('_columns_indices', debug=debug)
    self._clear_cache_key('_columns_indices_timestamp', debug=debug)
    self._clear_cache_key('_columns_types', debug=debug)
    self._clear_cache_key('_columns_types_timestamp', debug=debug)

    return result

Call the Pipe's instance connector's create_pipe_indices() method.

Parameters
  • columns (Optional[List[str]], default None): If provided, only create indices for the columns in the given list.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A SuccessTuple of success, message.
def clear( self, begin: Optional[datetime.datetime] = None, end: Optional[datetime.datetime] = None, params: Optional[Dict[str, Any]] = None, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
def clear(
    self,
    begin: Optional[datetime] = None,
    end: Optional[datetime] = None,
    params: Optional[Dict[str, Any]] = None,
    debug: bool = False,
    **kwargs: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `clear_pipe` method.

    Parameters
    ----------
    begin: Optional[datetime], default None
        If provided, only remove rows newer than this datetime value.

    end: Optional[datetime], default None
        If provided, only remove rows older than this datetime value (not including end).

    params: Optional[Dict[str, Any]], default None
        See `meerschaum.utils.sql.build_where`.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` corresponding to whether this procedure completed successfully.

    Examples
    --------
    >>> pipe = mrsm.Pipe('test', 'test', columns={'datetime': 'dt'}, instance='sql:local')
    >>> pipe.sync({'dt': [datetime(2020, 1, 1, 0, 0)]})
    >>> pipe.sync({'dt': [datetime(2021, 1, 1, 0, 0)]})
    >>> pipe.sync({'dt': [datetime(2022, 1, 1, 0, 0)]})
    >>>
    >>> pipe.clear(begin=datetime(2021, 1, 1, 0, 0))
    >>> pipe.get_data()
              dt
    0 2020-01-01

    """
    from meerschaum.utils.warnings import warn
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    begin, end = self.parse_date_bounds(begin, end)

    with Venv(get_connector_plugin(self.instance_connector)):
        return self.instance_connector.clear_pipe(
            self,
            begin=begin,
            end=end,
            params=params,
            debug=debug,
            **kwargs
        )

Call the Pipe's instance connector's clear_pipe method.

Parameters
  • begin (Optional[datetime], default None): If provided, only remove rows newer than this datetime value.
  • end (Optional[datetime], default None): If provided, only remove rows older than this datetime value (not including end).
  • params (Optional[Dict[str, Any]], default None): See meerschaum.utils.sql.build_where.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A SuccessTuple corresponding to whether this procedure completed successfully.
Examples
>>> pipe = mrsm.Pipe('test', 'test', columns={'datetime': 'dt'}, instance='sql:local')
>>> pipe.sync({'dt': [datetime(2020, 1, 1, 0, 0)]})
>>> pipe.sync({'dt': [datetime(2021, 1, 1, 0, 0)]})
>>> pipe.sync({'dt': [datetime(2022, 1, 1, 0, 0)]})
>>> 
>>> pipe.clear(begin=datetime(2021, 1, 1, 0, 0))
>>> pipe.get_data()
          dt
0 2020-01-01
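Note the half-open interval semantics in the example above: `begin` is inclusive and `end` is exclusive. A plain-Python sketch of which rows `clear()` would remove (hypothetical rows, not the connector implementation):

```python
from datetime import datetime

def rows_to_clear(rows, dt_col, begin=None, end=None):
    # clear() removes rows in [begin, end): begin inclusive, end exclusive.
    return [
        row for row in rows
        if (begin is None or row[dt_col] >= begin)
        and (end is None or row[dt_col] < end)
    ]

rows = [{'dt': datetime(year, 1, 1)} for year in (2020, 2021, 2022)]
print(rows_to_clear(rows, 'dt', begin=datetime(2021, 1, 1)))
# the 2021 and 2022 rows are cleared; the 2020 row survives
```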
def deduplicate( self, begin: Union[datetime.datetime, int, NoneType] = None, end: Union[datetime.datetime, int, NoneType] = None, params: Optional[Dict[str, Any]] = None, chunk_interval: Union[datetime.datetime, int, NoneType] = None, bounded: Optional[bool] = None, workers: Optional[int] = None, debug: bool = False, _use_instance_method: bool = True, **kwargs: Any) -> Tuple[bool, str]:
def deduplicate(
    self,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    chunk_interval: Union[datetime, int, None] = None,
    bounded: Optional[bool] = None,
    workers: Optional[int] = None,
    debug: bool = False,
    _use_instance_method: bool = True,
    **kwargs: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `delete_duplicates` method to delete duplicate rows.

    Parameters
    ----------
    begin: Union[datetime, int, None], default None
        If provided, only deduplicate rows newer than this datetime value.

    end: Union[datetime, int, None], default None
        If provided, only deduplicate rows older than this datetime value (not including end).

    params: Optional[Dict[str, Any]], default None
        Restrict deduplication to this filter (for multiplexed data streams).
        See `meerschaum.utils.sql.build_where`.

    chunk_interval: Union[timedelta, int, None], default None
        If provided, use this for the chunk bounds.
        Defaults to the value set in `pipe.parameters['chunk_minutes']` (1440).

    bounded: Optional[bool], default None
        Only check outside the oldest and newest sync times if bounded is explicitly `False`.

    workers: Optional[int], default None
        If the instance connector is thread-safe, limit concurrent syncs to this many threads.

    debug: bool, default False
        Verbosity toggle.

    kwargs: Any
        All other keyword arguments are passed to
        `pipe.sync()`, `pipe.clear()`, and `pipe.get_data()`.

    Returns
    -------
    A `SuccessTuple` corresponding to whether all of the chunks were successfully deduplicated.
    """
    from meerschaum.utils.warnings import warn, info
    from meerschaum.utils.misc import interval_str, items_str
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.pool import get_pool

    begin, end = self.parse_date_bounds(begin, end)

    workers = self.get_num_workers(workers=workers)
    pool = get_pool(workers=workers)

    if _use_instance_method:
        with Venv(get_connector_plugin(self.instance_connector)):
            if hasattr(self.instance_connector, 'deduplicate_pipe'):
                return self.instance_connector.deduplicate_pipe(
                    self,
                    begin=begin,
                    end=end,
                    params=params,
                    bounded=bounded,
                    debug=debug,
                    **kwargs
                )

    ### Only unbound if explicitly False.
    if bounded is None:
        bounded = True
    chunk_interval = self.get_chunk_interval(chunk_interval, debug=debug)

    bound_time = self.get_bound_time(debug=debug)
    if bounded and begin is None:
        begin = (
            bound_time
            if bound_time is not None
            else self.get_sync_time(newest=False, debug=debug)
        )
    if bounded and end is None:
        end = self.get_sync_time(newest=True, debug=debug)
        if end is not None:
            end += (
                timedelta(minutes=1)
                if hasattr(end, 'tzinfo')
                else 1
            )

    chunk_bounds = self.get_chunk_bounds(
        bounded=bounded,
        begin=begin,
        end=end,
        chunk_interval=chunk_interval,
        debug=debug,
    )

    indices = [col for col in self.columns.values() if col]
    if not indices:
        return False, "Cannot deduplicate without index columns."

    def process_chunk_bounds(bounds) -> Tuple[
        Tuple[
            Union[datetime, int, None],
            Union[datetime, int, None]
        ],
        SuccessTuple
    ]:
        ### Only selecting the index values here to keep bandwidth down.
        chunk_begin, chunk_end = bounds
        chunk_df = self.get_data(
            select_columns=indices,
            begin=chunk_begin,
            end=chunk_end,
            params=params,
            debug=debug,
        )
        if chunk_df is None:
            return bounds, (True, "")
        existing_chunk_len = len(chunk_df)
        deduped_chunk_df = chunk_df.drop_duplicates(keep='last')
        deduped_chunk_len = len(deduped_chunk_df)

        if existing_chunk_len == deduped_chunk_len:
            return bounds, (True, "")

        chunk_msg_header = f"\n{chunk_begin} - {chunk_end}"
        chunk_msg_body = ""

        full_chunk = self.get_data(
            begin=chunk_begin,
            end=chunk_end,
            params=params,
            debug=debug,
        )
        if full_chunk is None or len(full_chunk) == 0:
            return bounds, (True, f"{chunk_msg_header}\nChunk is empty, skipping...")

        chunk_indices = [ix for ix in indices if ix in full_chunk.columns]
        if not chunk_indices:
            return bounds, (False, f"None of {items_str(indices)} were present in chunk.")
        try:
            full_chunk = full_chunk.drop_duplicates(
                subset=chunk_indices,
                keep='last'
            ).reset_index(
                drop=True,
            )
        except Exception as e:
            return (
                bounds,
                (False, f"Failed to deduplicate chunk on {items_str(chunk_indices)}:\n({e})")
            )

        clear_success, clear_msg = self.clear(
            begin=chunk_begin,
            end=chunk_end,
            params=params,
            debug=debug,
        )
        if not clear_success:
            chunk_msg_body += f"Failed to clear chunk while deduplicating:\n{clear_msg}\n"
            warn(chunk_msg_body)

        sync_success, sync_msg = self.sync(full_chunk, debug=debug)
        if not sync_success:
            chunk_msg_body += f"Failed to sync chunk while deduplicating:\n{sync_msg}\n"

        ### Finally check if the deduplication worked.
        chunk_rowcount = self.get_rowcount(
            begin=chunk_begin,
            end=chunk_end,
            params=params,
            debug=debug,
        )
        if chunk_rowcount != deduped_chunk_len:
            return bounds, (
                False, (
                    chunk_msg_header + "\n"
                    + chunk_msg_body + ("\n" if chunk_msg_body else '')
                    + "Chunk rowcounts still differ ("
                    + f"{chunk_rowcount} rowcount vs {deduped_chunk_len} chunk length)."
                )
            )

        return bounds, (
            True, (
                chunk_msg_header + "\n"
                + chunk_msg_body + ("\n" if chunk_msg_body else '')
                + f"Deduplicated chunk from {existing_chunk_len} to {chunk_rowcount} rows."
            )
        )

    info(
        f"Deduplicating {len(chunk_bounds)} chunk"
        + ('s' if len(chunk_bounds) != 1 else '')
        + f" ({'un' if not bounded else ''}bounded)"
        + f" of size '{interval_str(chunk_interval)}'"
        + f" on {self}."
    )
    bounds_success_tuples = dict(pool.map(process_chunk_bounds, chunk_bounds))
    bounds_successes = {
        bounds: success_tuple
        for bounds, success_tuple in bounds_success_tuples.items()
        if success_tuple[0]
    }
    bounds_failures = {
        bounds: success_tuple
        for bounds, success_tuple in bounds_success_tuples.items()
        if not success_tuple[0]
    }

    ### No need to retry if everything failed.
    if len(bounds_failures) > 0 and len(bounds_successes) == 0:
        return (
            False,
            (
                f"Failed to deduplicate {len(bounds_failures)} chunk"
                + ('s' if len(bounds_failures) != 1 else '')
                + ".\n"
                + "\n".join([msg for _, (_, msg) in bounds_failures.items() if msg])
            )
        )

    retry_bounds = [bounds for bounds in bounds_failures]
    if not retry_bounds:
        return (
            True,
            (
                f"Successfully deduplicated {len(bounds_successes)} chunk"
                + ('s' if len(bounds_successes) != 1 else '')
                + ".\n"
                + "\n".join([msg for _, (_, msg) in bounds_successes.items() if msg])
            ).rstrip('\n')
        )
    info(f"Retrying {len(retry_bounds)} chunks for {self}...")
    retry_bounds_success_tuples = dict(pool.map(process_chunk_bounds, retry_bounds))
    retry_bounds_successes = {
        bounds: success_tuple
        for bounds, success_tuple in retry_bounds_success_tuples.items()
        if success_tuple[0]
    }
    retry_bounds_failures = {
        bounds: success_tuple
        for bounds, success_tuple in retry_bounds_success_tuples.items()
        if not success_tuple[0]
    }

    bounds_successes.update(retry_bounds_successes)
    if not retry_bounds_failures:
        return (
            True,
            (
                f"Successfully deduplicated {len(bounds_successes)} chunk"
                + ('s' if len(bounds_successes) != 1 else '')
                + f" ({len(retry_bounds_successes)} retried):\n"
                + "\n".join([msg for _, (_, msg) in bounds_successes.items() if msg])
            ).rstrip('\n')
        )

    return (
        False,
        (
            f"Failed to deduplicate {len(retry_bounds_failures)} chunk"
            + ('s' if len(retry_bounds_failures) != 1 else '')
            + ".\n"
            + "\n".join([msg for _, (_, msg) in retry_bounds_failures.items() if msg])
        ).rstrip('\n')
    )

Call the Pipe's instance connector's delete_duplicates method to delete duplicate rows.

Parameters
  • begin (Union[datetime, int, None], default None): If provided, only deduplicate rows newer than this datetime value.
  • end (Union[datetime, int, None], default None): If provided, only deduplicate rows older than this datetime value (not including end).
  • params (Optional[Dict[str, Any]], default None): Restrict deduplication to this filter (for multiplexed data streams). See meerschaum.utils.sql.build_where.
  • chunk_interval (Union[timedelta, int, None], default None): If provided, use this for the chunk bounds. Defaults to the value set in pipe.parameters['chunk_minutes'] (1440).
  • bounded (Optional[bool], default None): Only check outside the oldest and newest sync times if bounded is explicitly False.
  • workers (Optional[int], default None): If the instance connector is thread-safe, limit concurrent syncs to this many threads.
  • debug (bool, default False): Verbosity toggle.
  • kwargs (Any): All other keyword arguments are passed to pipe.sync(), pipe.clear(), and pipe.get_data().
Returns
  • A SuccessTuple corresponding to whether all of the chunks were successfully deduplicated.
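The core of each chunk's work is dropping duplicate index tuples while keeping the last occurrence (`drop_duplicates(subset=indices, keep='last')`). The same rule can be sketched on plain rows with a dictionary (`deduplicate_rows` is an illustrative name, not part of the API):

```python
def deduplicate_rows(rows, indices):
    # Keep only the last row per unique index tuple, mirroring
    # DataFrame.drop_duplicates(subset=indices, keep='last').
    latest = {}
    for row in rows:
        latest[tuple(row[ix] for ix in indices)] = row
    return list(latest.values())

rows = [
    {'ts': '2024-01-01', 'id': 1, 'vl': 10},
    {'ts': '2024-01-01', 'id': 1, 'vl': 99},
    {'ts': '2024-01-01', 'id': 2, 'vl': 20},
]
print(deduplicate_rows(rows, ['ts', 'id']))
# two rows remain; the duplicate ('2024-01-01', 1) keeps vl=99
```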
def bootstrap( self, debug: bool = False, yes: bool = False, force: bool = False, noask: bool = False, shell: bool = False, **kw) -> Tuple[bool, str]:
 16def bootstrap(
 17    self,
 18    debug: bool = False,
 19    yes: bool = False,
 20    force: bool = False,
 21    noask: bool = False,
 22    shell: bool = False,
 23    **kw
 24) -> SuccessTuple:
 25    """
 26    Prompt the user to create a pipe's requirements all from one method.
 27    This method shouldn't be used in any automated scripts because it interactively
 28    prompts the user and therefore may hang.
 29
 30    Parameters
 31    ----------
 32    debug: bool, default False
 33        Verbosity toggle.
 34
 35    yes: bool, default False
 36        Print the questions and automatically agree.
 37
 38    force: bool, default False
 39        Skip the questions and agree anyway.
 40
 41    noask: bool, default False
 42        Print the questions but go with the default answer.
 43
 44    shell: bool, default False
 45        Used to determine if we are in the interactive shell.
 46        
 47    Returns
 48    -------
 49    A `SuccessTuple` corresponding to the success of this procedure.
 50
 51    """
 52
 53    from meerschaum.utils.warnings import info
 54    from meerschaum.utils.prompt import prompt, yes_no
 55    from meerschaum.utils.formatting import pprint
 56    from meerschaum.config import get_config
 57    from meerschaum.utils.formatting._shell import clear_screen
 58    from meerschaum.utils.formatting import print_tuple
 59    from meerschaum.actions import actions
 60    from meerschaum.utils.venv import Venv
 61    from meerschaum.connectors import get_connector_plugin
 62
 63    _clear = get_config('shell', 'clear_screen', patch=True)
 64
 65    if self.get_id(debug=debug) is not None:
 66        delete_tuple = self.delete(debug=debug)
 67        if not delete_tuple[0]:
 68            return delete_tuple
 69
 70    if _clear:
 71        clear_screen(debug=debug)
 72
 73    _parameters = _get_parameters(self, debug=debug)
 74    self.parameters = _parameters
 75    pprint(self.parameters)
 76    try:
 77        prompt(
 78            f"\n    Press [Enter] to register {self} with the above configuration:",
 79            icon = False
 80        )
 81    except KeyboardInterrupt:
 82        return False, f"Aborted bootstrapping {self}."
 83
 84    with Venv(get_connector_plugin(self.instance_connector)):
 85        register_tuple = self.instance_connector.register_pipe(self, debug=debug)
 86
 87    if not register_tuple[0]:
 88        return register_tuple
 89
 90    if _clear:
 91        clear_screen(debug=debug)
 92
 93    try:
 94        if yes_no(
 95            f"Would you like to edit the definition for {self}?",
 96            yes=yes,
 97            noask=noask,
 98            default='n',
 99        ):
100            edit_tuple = self.edit_definition(debug=debug)
101            if not edit_tuple[0]:
102                return edit_tuple
103
104        if yes_no(
105            f"Would you like to try syncing {self} now?",
106            yes=yes,
107            noask=noask,
108            default='n',
109        ):
110            sync_tuple = actions['sync'](
111                ['pipes'],
112                connector_keys=[self.connector_keys],
113                metric_keys=[self.metric_key],
114                location_keys=[self.location_key],
115                mrsm_instance=str(self.instance_connector),
116                debug=debug,
117                shell=shell,
118            )
119            if not sync_tuple[0]:
120                return sync_tuple
121    except Exception as e:
122        return False, f"Failed to bootstrap {self}:\n" + str(e)
123
124    print_tuple((True, f"Finished bootstrapping {self}!"))
125    info(
126        "You can edit this pipe later with `edit pipes` "
127        + "or set the definition with `edit pipes definition`.\n"
128        + "    To sync data into your pipe, run `sync pipes`."
129    )
130
131    return True, "Success"

Prompt the user to create a pipe's requirements all from one method. This method shouldn't be used in any automated scripts because it interactively prompts the user and therefore may hang.

Parameters
  • debug (bool, default False): Verbosity toggle.
  • yes (bool, default False): Print the questions and automatically agree.
  • force (bool, default False): Skip the questions and agree anyway.
  • noask (bool, default False): Print the questions but go with the default answer.
  • shell (bool, default False): Used to determine if we are in the interactive shell.
Returns
  • A SuccessTuple corresponding to the success of this procedure.
def enforce_dtypes( self, df: pandas.core.frame.DataFrame, chunksize: Optional[int] = -1, enforce: bool = True, safe_copy: bool = True, dtypes: Optional[Dict[str, str]] = None, debug: bool = False) -> pandas.core.frame.DataFrame:
 20def enforce_dtypes(
 21    self,
 22    df: 'pd.DataFrame',
 23    chunksize: Optional[int] = -1,
 24    enforce: bool = True,
 25    safe_copy: bool = True,
 26    dtypes: Optional[Dict[str, str]] = None,
 27    debug: bool = False,
 28) -> 'pd.DataFrame':
 29    """
 30    Cast the input dataframe to the pipe's registered data types.
 31    If the pipe does not exist and dtypes are not set, return the dataframe.
 32    """
 33    import traceback
 34    from meerschaum.utils.warnings import warn
 35    from meerschaum.utils.debug import dprint
 36    from meerschaum.utils.dataframe import (
 37        parse_df_datetimes,
 38        enforce_dtypes as _enforce_dtypes,
 39        parse_simple_lines,
 40    )
 41    from meerschaum.utils.dtypes import are_dtypes_equal
 42    from meerschaum.utils.packages import import_pandas
 43    pd = import_pandas(debug=debug)
 44    if df is None:
 45        if debug:
 46            dprint(
 47                "Received None instead of a DataFrame.\n"
 48                + "    Skipping dtype enforcement..."
 49            )
 50        return df
 51
 52    if not self.enforce:
 53        enforce = False
 54
 55    explicit_dtypes = self.get_dtypes(infer=False, debug=debug) if enforce else {}
 56    pipe_dtypes = self.get_dtypes(infer=True, debug=debug) if not dtypes else dtypes
 57
 58    try:
 59        if isinstance(df, str):
 60            if df.strip() and df.strip()[0] not in ('{', '['):
 61                df = parse_df_datetimes(
 62                    parse_simple_lines(df),
 63                    ignore_cols=[
 64                        col
 65                        for col, dtype in pipe_dtypes.items()
 66                        if (not enforce or not are_dtypes_equal(dtype, 'datetime'))
 67                    ],
 68                )
 69            else:
 70                df = parse_df_datetimes(
 71                    pd.read_json(StringIO(df)),
 72                    ignore_cols=[
 73                        col
 74                        for col, dtype in pipe_dtypes.items()
 75                        if (not enforce or not are_dtypes_equal(dtype, 'datetime'))
 76                    ],
 77                    ignore_all=(not enforce),
 78                    strip_timezone=(self.tzinfo is None),
 79                    chunksize=chunksize,
 80                    debug=debug,
 81                )
 82        elif isinstance(df, (dict, list, tuple)):
 83            df = parse_df_datetimes(
 84                df,
 85                ignore_cols=[
 86                    col
 87                    for col, dtype in pipe_dtypes.items()
 88                    if (not enforce or not are_dtypes_equal(str(dtype), 'datetime'))
 89                ],
 90                strip_timezone=(self.tzinfo is None),
 91                chunksize=chunksize,
 92                debug=debug,
 93            )
 94    except Exception as e:
 95        warn(f"Unable to cast incoming data as a DataFrame...:\n{e}\n\n{traceback.format_exc()}")
 96        return None
 97
 98    if not pipe_dtypes:
 99        if debug:
100            dprint(
101                f"Could not find dtypes for {self}.\n"
102                + "Skipping dtype enforcement..."
103            )
104        return df
105
106    return _enforce_dtypes(
107        df,
108        pipe_dtypes,
109        explicit_dtypes=explicit_dtypes,
110        safe_copy=safe_copy,
111        strip_timezone=(self.tzinfo is None),
112        coerce_numeric=self.mixed_numerics,
113        coerce_timezone=enforce,
114        debug=debug,
115    )

Cast the input dataframe to the pipe's registered data types. If the pipe does not exist and dtypes are not set, return the dataframe.

def infer_dtypes( self, persist: bool = False, refresh: bool = False, debug: bool = False) -> Dict[str, Any]:
118def infer_dtypes(
119    self,
120    persist: bool = False,
121    refresh: bool = False,
122    debug: bool = False,
123) -> Dict[str, Any]:
124    """
125    If `dtypes` is not set in `meerschaum.Pipe.parameters`,
126    infer the data types from the underlying table if it exists.
127
128    Parameters
129    ----------
130    persist: bool, default False
131        If `True`, persist the inferred data types to `meerschaum.Pipe.parameters`.
132        NOTE: Use with caution! Generally `dtypes` is meant to be user-configurable only.
133
134    refresh: bool, default False
135        If `True`, retrieve the latest columns-types for the pipe.
 136        See `Pipe.get_columns_types()`.
137
138    Returns
139    -------
140    A dictionary of strings containing the pandas data types for this Pipe.
141    """
142    if not self.exists(debug=debug):
143        return {}
144
145    from meerschaum.utils.dtypes.sql import get_pd_type_from_db_type
146    from meerschaum.utils.dtypes import to_pandas_dtype
147
148    ### NOTE: get_columns_types() may return either the types as
149    ###       PostgreSQL- or Pandas-style.
150    columns_types = self.get_columns_types(refresh=refresh, debug=debug)
151
152    remote_pd_dtypes = {
153        c: (
154            get_pd_type_from_db_type(t, allow_custom_dtypes=True)
155            if str(t).isupper()
156            else to_pandas_dtype(t)
157        )
158        for c, t in columns_types.items()
159    } if columns_types else {}
160    if not persist:
161        return remote_pd_dtypes
162
163    parameters = self.get_parameters(refresh=refresh, debug=debug)
164    dtypes = parameters.get('dtypes', {})
165    dtypes.update({
166        col: typ
167        for col, typ in remote_pd_dtypes.items()
168        if col not in dtypes
169    })
170    self.dtypes = dtypes
171    self.edit(interactive=False, debug=debug)
172    return remote_pd_dtypes

If dtypes is not set in meerschaum.Pipe.parameters, infer the data types from the underlying table if it exists.

Parameters
  • persist (bool, default False): If True, persist the inferred data types to meerschaum.Pipe.parameters. NOTE: Use with caution! Generally dtypes is meant to be user-configurable only.
  • refresh (bool, default False): If True, retrieve the latest columns-types for the pipe. See Pipe.get_columns_types().
Returns
  • A dictionary of strings containing the pandas data types for this Pipe.
def copy_to( self, instance_keys: str, sync: bool = True, begin: Union[datetime.datetime, int, NoneType] = None, end: Union[datetime.datetime, int, NoneType] = None, params: Optional[Dict[str, Any]] = None, chunk_interval: Union[datetime.timedelta, int, NoneType] = None, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
 15def copy_to(
 16    self,
 17    instance_keys: str,
 18    sync: bool = True,
 19    begin: Union[datetime, int, None] = None,
 20    end: Union[datetime, int, None] = None,
 21    params: Optional[Dict[str, Any]] = None,
 22    chunk_interval: Union[timedelta, int, None] = None,
 23    debug: bool = False,
 24    **kwargs: Any
 25) -> SuccessTuple:
 26    """
 27    Copy a pipe to another instance.
 28
 29    Parameters
 30    ----------
 31    instance_keys: str
 32        The instance to which to copy this pipe.
 33
 34    sync: bool, default True
 35        If `True`, sync the source pipe's documents to the new pipe.
 36
 37    begin: Union[datetime, int, None], default None
 38        Beginning datetime value to pass to `Pipe.get_data()`.
 39
 40    end: Union[datetime, int, None], default None
 41        End datetime value to pass to `Pipe.get_data()`.
 42
 43    params: Optional[Dict[str, Any]], default None
 44        Parameters filter to pass to `Pipe.get_data()`.
 45
 46    chunk_interval: Union[timedelta, int, None], default None
 47        The size of chunks to retrieve from `Pipe.get_data()` for syncing.
 48
 49    kwargs: Any
 50        Additional flags to pass to `Pipe.get_data()` and `Pipe.sync()`, e.g. `workers`.
 51
 52    Returns
 53    -------
 54    A SuccessTuple indicating success.
 55    """
 56    if str(instance_keys) == self.instance_keys:
 57        return False, f"Cannot copy {self} to instance '{instance_keys}'."
 58
 59    begin, end = self.parse_date_bounds(begin, end)
 60
 61    new_pipe = mrsm.Pipe(
 62        self.connector_keys,
 63        self.metric_key,
 64        self.location_key,
 65        parameters=self.parameters.copy(),
 66        instance=instance_keys,
 67    )
 68
 69    new_pipe_is_registered = new_pipe.get_id() is not None
 70
 71    metadata_method = new_pipe.edit if new_pipe_is_registered else new_pipe.register
 72    metadata_success, metadata_msg = metadata_method(debug=debug)
 73    if not metadata_success:
 74        return metadata_success, metadata_msg
 75
 76    if not self.exists(debug=debug):
 77        return True, f"{self} does not exist; nothing to sync."
 78
 79    original_as_iterator = kwargs.get('as_iterator', None)
 80    kwargs['as_iterator'] = True
 81
 82    chunk_generator = self.get_data(
 83        begin=begin,
 84        end=end,
 85        params=params,
 86        chunk_interval=chunk_interval,
 87        debug=debug,
 88        **kwargs
 89    )
 90
 91    if original_as_iterator is None:
 92        _ = kwargs.pop('as_iterator', None)
 93    else:
 94        kwargs['as_iterator'] = original_as_iterator
 95
 96    sync_success, sync_msg = new_pipe.sync(
 97        chunk_generator,
 98        begin=begin,
 99        end=end,
100        params=params,
101        debug=debug,
102        **kwargs
103    )
104    msg = (
105        f"Successfully synced {new_pipe}:\n{sync_msg}"
106        if sync_success
107        else f"Failed to sync {new_pipe}:\n{sync_msg}"
108    )
109    return sync_success, msg

Copy a pipe to another instance.

Parameters
  • instance_keys (str): The instance to which to copy this pipe.
  • sync (bool, default True): If True, sync the source pipe's documents to the new pipe.
  • begin (Union[datetime, int, None], default None): Beginning datetime value to pass to Pipe.get_data().
  • end (Union[datetime, int, None], default None): End datetime value to pass to Pipe.get_data().
  • params (Optional[Dict[str, Any]], default None): Parameters filter to pass to Pipe.get_data().
  • chunk_interval (Union[timedelta, int, None], default None): The size of chunks to retrieve from Pipe.get_data() for syncing.
  • kwargs (Any): Additional flags to pass to Pipe.get_data() and Pipe.sync(), e.g. workers.
Returns
  • A SuccessTuple indicating success.
class Plugin:
 30class Plugin:
 31    """Handle packaging of Meerschaum plugins."""
 32
 33    def __init__(
 34        self,
 35        name: str,
 36        version: Optional[str] = None,
 37        user_id: Optional[int] = None,
 38        required: Optional[List[str]] = None,
 39        attributes: Optional[Dict[str, Any]] = None,
 40        archive_path: Optional[pathlib.Path] = None,
 41        venv_path: Optional[pathlib.Path] = None,
 42        repo_connector: Optional['mrsm.connectors.api.APIConnector'] = None,
 43        repo: Union['mrsm.connectors.api.APIConnector', str, None] = None,
 44    ):
 45        from meerschaum._internal.static import STATIC_CONFIG
 46        from meerschaum.config.paths import PLUGINS_ARCHIVES_RESOURCES_PATH, VIRTENV_RESOURCES_PATH
 47        sep = STATIC_CONFIG['plugins']['repo_separator']
 48        _repo = None
 49        if sep in name:
 50            try:
 51                name, _repo = name.split(sep)
 52            except Exception as e:
 53                error(f"Invalid plugin name: '{name}'")
 54        self._repo_in_name = _repo
 55
 56        if attributes is None:
 57            attributes = {}
 58        self.name = name
 59        self.attributes = attributes
 60        self.user_id = user_id
 61        self._version = version
 62        if required:
 63            self._required = required
 64        self.archive_path = (
 65            archive_path if archive_path is not None
 66            else PLUGINS_ARCHIVES_RESOURCES_PATH / f"{self.name}.tar.gz"
 67        )
 68        self.venv_path = (
 69            venv_path if venv_path is not None
 70            else VIRTENV_RESOURCES_PATH / self.name
 71        )
 72        self._repo_connector = repo_connector
 73        self._repo_keys = repo
 74
 75
 76    @property
 77    def repo_connector(self):
 78        """
 79        Return the repository connector for this plugin.
 80        NOTE: This imports the `connectors` module, which imports certain plugin modules.
 81        """
 82        if self._repo_connector is None:
 83            from meerschaum.connectors.parse import parse_repo_keys
 84
 85            repo_keys = self._repo_keys or self._repo_in_name
 86            if self._repo_in_name and self._repo_keys and self._repo_keys != self._repo_in_name:
 87                error(
 88                    f"Received inconsistent repos: '{self._repo_in_name}' and '{self._repo_keys}'."
 89                )
 90            repo_connector = parse_repo_keys(repo_keys)
 91            self._repo_connector = repo_connector
 92        return self._repo_connector
 93
 94
 95    @property
 96    def version(self):
 97        """
 98        Return the plugin's module version (`__version__`) if it's defined.
 99        """
100        if self._version is None:
101            try:
102                self._version = self.module.__version__
103            except Exception as e:
104                self._version = None
105        return self._version
106
107
108    @property
109    def module(self):
110        """
111        Return the Python module of the underlying plugin.
112        """
113        if '_module' not in self.__dict__ or self.__dict__.get('_module', None) is None:
114            if self.__file__ is None:
115                return None
116
117            from meerschaum.plugins import import_plugins
118            self._module = import_plugins(str(self), warn=False)
119
120        return self._module
121
122
123    @property
124    def __file__(self) -> Union[str, None]:
125        """
126        Return the file path (str) of the plugin if it exists, otherwise `None`.
127        """
128        if self.__dict__.get('_module', None) is not None:
129            return self.module.__file__
130
131        from meerschaum.config.paths import PLUGINS_RESOURCES_PATH
132
133        potential_dir = PLUGINS_RESOURCES_PATH / self.name
134        if (
135            potential_dir.exists()
136            and potential_dir.is_dir()
137            and (potential_dir / '__init__.py').exists()
138        ):
139            return str((potential_dir / '__init__.py').as_posix())
140
141        potential_file = PLUGINS_RESOURCES_PATH / (self.name + '.py')
142        if potential_file.exists() and not potential_file.is_dir():
143            return str(potential_file.as_posix())
144
145        return None
146
147
148    @property
149    def requirements_file_path(self) -> Union[pathlib.Path, None]:
150        """
151        If a file named `requirements.txt` exists, return its path.
152        """
153        if self.__file__ is None:
154            return None
155        path = pathlib.Path(self.__file__).parent / 'requirements.txt'
156        if not path.exists():
157            return None
158        return path
159
160
161    def is_installed(self, **kw) -> bool:
162        """
163        Check whether a plugin is correctly installed.
164
165        Returns
166        -------
167        A `bool` indicating whether a plugin exists and is successfully imported.
168        """
169        return self.__file__ is not None
170
171
172    def make_tar(self, debug: bool = False) -> pathlib.Path:
173        """
174        Compress the plugin's source files into a `.tar.gz` archive and return the archive's path.
175
176        Parameters
177        ----------
178        debug: bool, default False
179            Verbosity toggle.
180
181        Returns
182        -------
183        A `pathlib.Path` to the archive file's path.
184
185        """
186        import tarfile, pathlib, subprocess, fnmatch
187        from meerschaum.utils.debug import dprint
188        from meerschaum.utils.packages import attempt_import
189        pathspec = attempt_import('pathspec', debug=debug)
190
191        if not self.__file__:
192            from meerschaum.utils.warnings import error
193            error(f"Could not find file for plugin '{self}'.")
194        if '__init__.py' in self.__file__ or os.path.isdir(self.__file__):
195            path = self.__file__.replace('__init__.py', '')
196            is_dir = True
197        else:
198            path = self.__file__
199            is_dir = False
200
201        old_cwd = os.getcwd()
202        real_parent_path = pathlib.Path(os.path.realpath(path)).parent
203        os.chdir(real_parent_path)
204
205        default_patterns_to_ignore = [
206            '.pyc',
207            '__pycache__/',
208            'eggs/',
209            '__pypackages__/',
210            '.git',
211        ]
212
213        def parse_gitignore() -> 'Set[str]':
214            gitignore_path = pathlib.Path(path) / '.gitignore'
215            if not gitignore_path.exists():
216                return set(default_patterns_to_ignore)
217            with open(gitignore_path, 'r', encoding='utf-8') as f:
218                gitignore_text = f.read()
219            return set(pathspec.PathSpec.from_lines(
220                pathspec.patterns.GitWildMatchPattern,
221                default_patterns_to_ignore + gitignore_text.splitlines()
222            ).match_tree(path))
223
224        patterns_to_ignore = parse_gitignore() if is_dir else set()
225
226        if debug:
227            dprint(f"Patterns to ignore:\n{patterns_to_ignore}")
228
229        with tarfile.open(self.archive_path, 'w:gz') as tarf:
230            if not is_dir:
231                tarf.add(f"{self.name}.py")
232            else:
233                for root, dirs, files in os.walk(self.name):
234                    for f in files:
235                        good_file = True
236                        fp = os.path.join(root, f)
237                        for pattern in patterns_to_ignore:
238                            if pattern in str(fp) or f.startswith('.'):
239                                good_file = False
240                                break
241                        if good_file:
242                            if debug:
243                                dprint(f"Adding '{fp}'...")
244                            tarf.add(fp)
245
246        ### clean up and change back to old directory
247        os.chdir(old_cwd)
248
249        ### change to 775 to avoid permissions issues with the API in a Docker container
250        self.archive_path.chmod(0o775)
251
252        if debug:
253            dprint(f"Created archive '{self.archive_path}'.")
254        return self.archive_path
255
256
257    def install(
258        self,
259        skip_deps: bool = False,
260        force: bool = False,
261        debug: bool = False,
262    ) -> SuccessTuple:
263        """
264        Extract a plugin's tar archive to the plugins directory.
265        
266        This function checks if the plugin is already installed and if the version is equal or
267        greater than the existing installation.
268
269        Parameters
270        ----------
271        skip_deps: bool, default False
272            If `True`, do not install dependencies.
273
274        force: bool, default False
275            If `True`, continue with installation, even if required packages fail to install.
276
277        debug: bool, default False
278            Verbosity toggle.
279
280        Returns
281        -------
282        A `SuccessTuple` of success (bool) and a message (str).
283
284        """
285        if self.full_name in _ongoing_installations:
286            return True, f"Already installing plugin '{self}'."
287        _ongoing_installations.add(self.full_name)
288        from meerschaum.utils.warnings import warn, error
289        if debug:
290            from meerschaum.utils.debug import dprint
291        import tarfile
292        import re
293        import ast
294        from meerschaum.plugins import sync_plugins_symlinks
295        from meerschaum.utils.packages import attempt_import, determine_version, reload_meerschaum
296        from meerschaum.utils.venv import init_venv
297        from meerschaum.utils.misc import safely_extract_tar
298        from meerschaum.config.paths import PLUGINS_TEMP_RESOURCES_PATH, PLUGINS_DIR_PATHS
299        old_cwd = os.getcwd()
300        old_version = ''
301        new_version = ''
302        temp_dir = PLUGINS_TEMP_RESOURCES_PATH / self.name
303        temp_dir.mkdir(exist_ok=True)
304
305        if not self.archive_path.exists():
306            return False, f"Missing archive file for plugin '{self}'."
307        if self.version is not None:
308            old_version = self.version
309            if debug:
310                dprint(f"Found existing version '{old_version}' for plugin '{self}'.")
311
312        if debug:
313            dprint(f"Extracting '{self.archive_path}' to '{temp_dir}'...")
314
315        try:
316            with tarfile.open(self.archive_path, 'r:gz') as tarf:
317                safely_extract_tar(tarf, temp_dir)
318        except Exception as e:
319            warn(e)
320            return False, f"Failed to extract plugin '{self.name}'."
321
322        ### search for version information
323        files = os.listdir(temp_dir)
324        
325        if str(files[0]) == self.name:
326            is_dir = True
327        elif str(files[0]) == self.name + '.py':
328            is_dir = False
329        else:
330            error(f"Unknown format encountered for plugin '{self}'.")
331
332        fpath = temp_dir / files[0]
333        if is_dir:
334            fpath = fpath / '__init__.py'
335
336        init_venv(self.name, debug=debug)
337        with open(fpath, 'r', encoding='utf-8') as f:
338            init_lines = f.readlines()
339        new_version = None
340        for line in init_lines:
341            if '__version__' not in line:
342                continue
343            version_match = re.search(r'__version__(\s?)=', line.lstrip().rstrip())
344            if not version_match:
345                continue
346            new_version = ast.literal_eval(line.split('=')[1].lstrip().rstrip())
347            break
348        if not new_version:
349            warn(
350                f"No `__version__` defined for plugin '{self}'. "
351                + "Assuming new version...",
352                stack = False,
353            )
354
355        packaging_version = attempt_import('packaging.version')
356        try:
357            is_new_version = (not new_version and not old_version) or (
358                packaging_version.parse(old_version) < packaging_version.parse(new_version)
359            )
360            is_same_version = new_version and old_version and (
361                packaging_version.parse(old_version) == packaging_version.parse(new_version)
362            )
363        except Exception:
364            is_new_version, is_same_version = True, False
365
366        ### Determine where to permanently store the new plugin.
367        plugin_installation_dir_path = PLUGINS_DIR_PATHS[0]
368        for path in PLUGINS_DIR_PATHS:
369            if not path.exists():
370                warn(f"Plugins path does not exist: {path}", stack=False)
371                continue
372
373            files_in_plugins_dir = os.listdir(path)
374            if (
375                self.name in files_in_plugins_dir
376                or
377                (self.name + '.py') in files_in_plugins_dir
378            ):
379                plugin_installation_dir_path = path
380                break
381
382        success_msg = (
383            f"Successfully installed plugin '{self}'"
384            + ("\n    (skipped dependencies)" if skip_deps else "")
385            + "."
386        )
387        success, abort = None, None
388
389        if is_same_version and not force:
390            success, msg = True, (
391                f"Plugin '{self}' is up-to-date (version {old_version}).\n" +
392                "    Install again with `-f` or `--force` to reinstall."
393            )
394            abort = True
395        elif is_new_version or force:
396            for src_dir, dirs, files in os.walk(temp_dir):
397                if success is not None:
398                    break
399                dst_dir = str(src_dir).replace(str(temp_dir), str(plugin_installation_dir_path))
400                if not os.path.exists(dst_dir):
401                    os.mkdir(dst_dir)
402                for f in files:
403                    src_file = os.path.join(src_dir, f)
404                    dst_file = os.path.join(dst_dir, f)
405                    if os.path.exists(dst_file):
406                        os.remove(dst_file)
407
408                    if debug:
409                        dprint(f"Moving '{src_file}' to '{dst_dir}'...")
410                    try:
411                        shutil.move(src_file, dst_dir)
412                    except Exception:
413                        success, msg = False, (
414                            f"Failed to install plugin '{self}': " +
415                            f"Could not move file '{src_file}' to '{dst_dir}'"
416                        )
417                        print(msg)
418                        break
419            if success is None:
420                success, msg = True, success_msg
421        else:
422            success, msg = False, (
423                f"Your installed version of plugin '{self}' ({old_version}) is higher than "
424                + f"attempted version {new_version}."
425            )
426
427        shutil.rmtree(temp_dir)
428        os.chdir(old_cwd)
429
430        ### Reload the plugin's module.
431        sync_plugins_symlinks(debug=debug)
432        if '_module' in self.__dict__:
433            del self.__dict__['_module']
434        init_venv(venv=self.name, force=True, debug=debug)
435        reload_meerschaum(debug=debug)
436
437        ### if we've already failed, return here
438        if not success or abort:
439            _ongoing_installations.remove(self.full_name)
440            return success, msg
441
442        ### attempt to install dependencies
443        dependencies_installed = skip_deps or self.install_dependencies(force=force, debug=debug)
444        if not dependencies_installed:
445            _ongoing_installations.remove(self.full_name)
446            return False, f"Failed to install dependencies for plugin '{self}'."
447
448        ### handling success tuple, bool, or other (typically None)
449        setup_tuple = self.setup(debug=debug)
450        if isinstance(setup_tuple, tuple):
451            if not setup_tuple[0]:
452                success, msg = setup_tuple
453        elif isinstance(setup_tuple, bool):
454            if not setup_tuple:
455                success, msg = False, (
456                    f"Failed to run post-install setup for plugin '{self}'." + '\n' +
457                    f"Check `setup()` in '{self.__file__}' for more information " +
458                    "(no error message provided)."
459                )
460            else:
461                success, msg = True, success_msg
462        elif setup_tuple is None:
463            success = True
464            msg = (
465                f"Post-install for plugin '{self}' returned None. " +
466                "Assuming plugin successfully installed."
467            )
468            warn(msg)
469        else:
470            success = False
471            msg = (
472                f"Post-install for plugin '{self}' returned unexpected value " +
473                f"of type '{type(setup_tuple)}': {setup_tuple}"
474            )
475
476        _ongoing_installations.remove(self.full_name)
477        _ = self.module
478        return success, msg
479
480
481    def remove_archive(
482        self,        
483        debug: bool = False
484    ) -> SuccessTuple:
485        """Remove a plugin's archive file."""
486        if not self.archive_path.exists():
487            return True, f"Archive file for plugin '{self}' does not exist."
488        try:
489            self.archive_path.unlink()
490        except Exception as e:
491            return False, f"Failed to remove archive for plugin '{self}':\n{e}"
492        return True, "Success"
493
494
495    def remove_venv(
496        self,        
497        debug: bool = False
498    ) -> SuccessTuple:
499        """Remove a plugin's virtual environment."""
500        if not self.venv_path.exists():
501            return True, f"Virtual environment for plugin '{self}' does not exist."
502        try:
503            shutil.rmtree(self.venv_path)
504        except Exception as e:
505            return False, f"Failed to remove virtual environment for plugin '{self}':\n{e}"
506        return True, "Success"
507
508
509    def uninstall(self, debug: bool = False) -> SuccessTuple:
510        """
511        Remove a plugin, its virtual environment, and archive file.
512        """
513        from meerschaum.utils.packages import reload_meerschaum
514        from meerschaum.plugins import sync_plugins_symlinks
515        from meerschaum.utils.warnings import warn, info
516        warnings_thrown_count: int = 0
517        max_warnings: int = 3
518
519        if not self.is_installed():
520            info(
521                f"Plugin '{self.name}' doesn't seem to be installed.\n    "
522                + "Checking for artifacts...",
523                stack = False,
524            )
525        else:
526            real_path = pathlib.Path(os.path.realpath(self.__file__))
527            try:
528                if real_path.name == '__init__.py':
529                    shutil.rmtree(real_path.parent)
530                else:
531                    real_path.unlink()
532            except Exception as e:
533                warn(f"Could not remove source files for plugin '{self.name}':\n{e}", stack=False)
534                warnings_thrown_count += 1
535            else:
536                info(f"Removed source files for plugin '{self.name}'.")
537
538        if self.venv_path.exists():
539            success, msg = self.remove_venv(debug=debug)
540            if not success:
541                warn(msg, stack=False)
542                warnings_thrown_count += 1
543            else:
544                info(f"Removed virtual environment from plugin '{self.name}'.")
545
546        success = warnings_thrown_count < max_warnings
547        sync_plugins_symlinks(debug=debug)
548        self.deactivate_venv(force=True, debug=debug)
549        reload_meerschaum(debug=debug)
550        return success, (
551            f"Successfully uninstalled plugin '{self}'." if success
552            else f"Failed to uninstall plugin '{self}'."
553        )
554
555
556    def setup(self, *args: str, debug: bool = False, **kw: Any) -> Union[SuccessTuple, bool]:
557        """
558        If exists, run the plugin's `setup()` function.
559
560        Parameters
561        ----------
562        *args: str
563            The positional arguments passed to the `setup()` function.
564            
565        debug: bool, default False
566            Verbosity toggle.
567
568        **kw: Any
569            The keyword arguments passed to the `setup()` function.
570
571        Returns
572        -------
573        A `SuccessTuple` or `bool` indicating success.
574
575        """
576        from meerschaum.utils.debug import dprint
577        import inspect
578        _setup = None
579        for name, fp in inspect.getmembers(self.module):
580            if name == 'setup' and inspect.isfunction(fp):
581                _setup = fp
582                break
583
584        ### assume success if no setup() is found (not necessary)
585        if _setup is None:
586            return True
587
588        sig = inspect.signature(_setup)
589        has_debug, has_kw = ('debug' in sig.parameters), False
590        for k, v in sig.parameters.items():
591            if '**' in str(v):
592                has_kw = True
593                break
594
595        _kw = {}
596        if has_kw:
597            _kw.update(kw)
598        if has_debug:
599            _kw['debug'] = debug
600
601        if debug:
602            dprint(f"Running setup for plugin '{self}'...")
603        try:
604            self.activate_venv(debug=debug)
605            return_tuple = _setup(*args, **_kw)
606            self.deactivate_venv(debug=debug)
607        except Exception as e:
608            return False, str(e)
609
610        if isinstance(return_tuple, tuple):
611            return return_tuple
612        if isinstance(return_tuple, bool):
613            return return_tuple, f"Setup for Plugin '{self.name}' did not return a message."
614        if return_tuple is None:
615            return False, f"Setup for Plugin '{self.name}' returned None."
616        return False, f"Unknown return value from setup for Plugin '{self.name}': {return_tuple}"
617
618
619    def get_dependencies(
620        self,
621        debug: bool = False,
622    ) -> List[str]:
623        """
624        If the Plugin has specified dependencies in a list called `required`, return the list.
625        
626        **NOTE:** Dependencies which start with `'plugin:'` are Meerschaum plugins, not pip packages.
627        Meerschaum plugins may also specify connector keys for a repo after `'@'`.
628
629        Parameters
630        ----------
631        debug: bool, default False
632            Verbosity toggle.
633
634        Returns
635        -------
636        A list of required packages and plugins (str).
637
638        """
639        if '_required' in self.__dict__:
640            return self._required
641
642        ### If the plugin has not yet been imported,
643        ### infer the dependencies from the source text.
644        ### This is not super robust, and it doesn't feel right
645        ### having multiple versions of the logic.
646        ### This is necessary when determining the activation order
647        ### without having to import the module.
648        ### For consistency's sake, the module-less method does not cache the requirements.
649        if self.__dict__.get('_module', None) is None:
650            file_path = self.__file__
651            if file_path is None:
652                return []
653            with open(file_path, 'r', encoding='utf-8') as f:
654                text = f.read()
655
656            if 'required' not in text:
657                return []
658
659            ### This has some limitations:
660            ### It relies on `required` being manually declared.
661            ### We lose the ability to dynamically alter the `required` list,
662            ### which is why we've kept the module-reliant method below.
663            import ast, re
664            ### NOTE: This technically would break 
665            ### if `required` was the very first line of the file.
666            req_start_match = re.search(r'\nrequired(:\s*)?.*=', text)
667            if not req_start_match:
668                return []
669            req_start = req_start_match.start()
670            equals_sign = req_start + text[req_start:].find('=')
671
672            ### Dependencies may have brackets within the strings, so push back the index.
673            first_opening_brace = equals_sign + 1 + text[equals_sign:].find('[')
674            if first_opening_brace == -1:
675                return []
676
677            next_closing_brace = equals_sign + 1 + text[equals_sign:].find(']')
678            if next_closing_brace == -1:
679                return []
680
681            start_ix = first_opening_brace + 1
682            end_ix = next_closing_brace
683
684            num_braces = 0
685            while True:
686                if '[' not in text[start_ix:end_ix]:
687                    break
688                num_braces += 1
689                start_ix = end_ix
690                end_ix += text[end_ix + 1:].find(']') + 1
691
692            req_end = end_ix + 1
693            req_text = (
694                text[(first_opening_brace-1):req_end]
695                .lstrip()
696                .replace('=', '', 1)
697                .lstrip()
698                .rstrip()
699            )
700            try:
701                required = ast.literal_eval(req_text)
702            except Exception as e:
703                warn(
704                    f"Unable to determine requirements for plugin '{self.name}' "
705                    + "without importing the module.\n"
706                    + "    This may be due to dynamically setting the global `required` list.\n"
707                    + f"    {e}"
708                )
709                return []
710            return required
711
712        import inspect
713        self.activate_venv(dependencies=False, debug=debug)
714        required = []
715        for name, val in inspect.getmembers(self.module):
716            if name == 'required':
717                required = val
718                break
719        self._required = required
720        self.deactivate_venv(dependencies=False, debug=debug)
721        return required
722
723
724    def get_required_plugins(self, debug: bool=False) -> List[mrsm.plugins.Plugin]:
725        """
726        Return a list of required Plugin objects.
727        """
728        from meerschaum.utils.warnings import warn
729        from meerschaum.config import get_config
730        from meerschaum._internal.static import STATIC_CONFIG
731        from meerschaum.connectors.parse import is_valid_connector_keys
732        plugins = []
733        _deps = self.get_dependencies(debug=debug)
734        sep = STATIC_CONFIG['plugins']['repo_separator']
735        plugin_names = [
736            _d[len('plugin:'):] for _d in _deps
737            if _d.startswith('plugin:') and len(_d) > len('plugin:')
738        ]
739        default_repo_keys = get_config('meerschaum', 'repository')
740        skipped_repo_keys = set()
741
742        for _plugin_name in plugin_names:
743            if sep in _plugin_name:
744                try:
745                    _plugin_name, _repo_keys = _plugin_name.split(sep)
746                except Exception:
747                    _repo_keys = default_repo_keys
748                    warn(
749                        f"Invalid repo keys for required plugin '{_plugin_name}'.\n    "
750                        + f"Will try to use '{_repo_keys}' instead.",
751                        stack = False,
752                    )
753            else:
754                _repo_keys = default_repo_keys
755
756            if _repo_keys in skipped_repo_keys:
757                continue
758
759            if not is_valid_connector_keys(_repo_keys):
760                warn(
761                    f"Invalid connector '{_repo_keys}'.\n"
762                    f"    Skipping required plugins from repository '{_repo_keys}'",
763                    stack=False,
764                )
765                continue
766
767            plugins.append(Plugin(_plugin_name, repo=_repo_keys))
768
769        return plugins
770
771
772    def get_required_packages(self, debug: bool=False) -> List[str]:
773        """
774        Return the required package names (excluding plugins).
775        """
776        _deps = self.get_dependencies(debug=debug)
777        return [_d for _d in _deps if not _d.startswith('plugin:')]
778
779
780    def activate_venv(
781        self,
782        dependencies: bool = True,
783        init_if_not_exists: bool = True,
784        debug: bool = False,
785        **kw
786    ) -> bool:
787        """
788        Activate the virtual environments for the plugin and its dependencies.
789
790        Parameters
791        ----------
792        dependencies: bool, default True
793            If `True`, activate the virtual environments for required plugins.
794
795        Returns
796        -------
797        A bool indicating success.
798        """
799        from meerschaum.utils.venv import venv_target_path
800        from meerschaum.utils.packages import activate_venv
801        from meerschaum.utils.misc import make_symlink, is_symlink
802        from meerschaum.config._paths import PACKAGE_ROOT_PATH
803
804        if dependencies:
805            for plugin in self.get_required_plugins(debug=debug):
806                plugin.activate_venv(debug=debug, init_if_not_exists=init_if_not_exists, **kw)
807
808        vtp = venv_target_path(self.name, debug=debug, allow_nonexistent=True)
809        venv_meerschaum_path = vtp / 'meerschaum'
810
811        try:
812            success, msg = True, "Success"
813            if is_symlink(venv_meerschaum_path):
814                if pathlib.Path(os.path.realpath(venv_meerschaum_path)) != PACKAGE_ROOT_PATH:
815                    venv_meerschaum_path.unlink()
816                    success, msg = make_symlink(venv_meerschaum_path, PACKAGE_ROOT_PATH)
817        except Exception as e:
818            success, msg = False, str(e)
819        if not success:
820            warn(f"Unable to create symlink {venv_meerschaum_path} to {PACKAGE_ROOT_PATH}:\n{msg}")
821
822        return activate_venv(self.name, init_if_not_exists=init_if_not_exists, debug=debug, **kw)
823
824
825    def deactivate_venv(self, dependencies: bool=True, debug: bool = False, **kw) -> bool:
826        """
827        Deactivate the virtual environments for the plugin and its dependencies.
828
829        Parameters
830        ----------
831        dependencies: bool, default True
832            If `True`, deactivate the virtual environments for required plugins.
833
834        Returns
835        -------
836        A bool indicating success.
837        """
838        from meerschaum.utils.packages import deactivate_venv
839        success = deactivate_venv(self.name, debug=debug, **kw)
840        if dependencies:
841            for plugin in self.get_required_plugins(debug=debug):
842                plugin.deactivate_venv(debug=debug, **kw)
843        return success
844
845
846    def install_dependencies(
847        self,
848        force: bool = False,
849        debug: bool = False,
850    ) -> bool:
851        """
852        If specified, install dependencies.
853        
854        **NOTE:** Dependencies that start with `'plugin:'` will be installed as
855        Meerschaum plugins from the same repository as this Plugin.
856        To install from a different repository, add the repo keys after `'@'`
857        (e.g. `'plugin:foo@api:bar'`).
858
859        Parameters
860        ----------
861        force: bool, default False
862            If `True`, continue with the installation, even if some
863            required packages fail to install.
864
865        debug: bool, default False
866            Verbosity toggle.
867
868        Returns
869        -------
870        A bool indicating success.
871        """
872        from meerschaum.utils.packages import pip_install, venv_contains_package
873        from meerschaum.utils.warnings import warn, info
874        _deps = self.get_dependencies(debug=debug)
875        if not _deps and self.requirements_file_path is None:
876            return True
877
878        plugins = self.get_required_plugins(debug=debug)
879        for _plugin in plugins:
880            if _plugin.name == self.name:
881                warn(f"Plugin '{self.name}' cannot depend on itself! Skipping...", stack=False)
882                continue
883            _success, _msg = _plugin.repo_connector.install_plugin(
884                _plugin.name, debug=debug, force=force
885            )
886            if not _success:
887                warn(
888                    f"Failed to install required plugin '{_plugin}' from '{_plugin.repo_connector}'"
889                    + f" for plugin '{self.name}':\n" + _msg,
890                    stack = False,
891                )
892                if not force:
893                    warn(
894                        "Try installing with the `--force` flag to continue anyway.",
895                        stack = False,
896                    )
897                    return False
898                info(
899                    "Continuing with installation despite the failure "
900                    + "(careful, things might be broken!)...",
901                    icon = False
902                )
903
904
905        ### First step: parse `requirements.txt` if it exists.
906        if self.requirements_file_path is not None:
907            if not pip_install(
908                requirements_file_path=self.requirements_file_path,
909                venv=self.name, debug=debug
910            ):
911                warn(
912                    f"Failed to resolve 'requirements.txt' for plugin '{self.name}'.",
913                    stack = False,
914                )
915                if not force:
916                    warn(
917                        "Try installing with `--force` to continue anyway.",
918                        stack = False,
919                    )
920                    return False
921                info(
922                    "Continuing with installation despite the failure "
923                    + "(careful, things might be broken!)...",
924                    icon = False
925                )
926
927
928        ### Don't reinstall packages that are already included in required plugins.
929        packages = []
930        _packages = self.get_required_packages(debug=debug)
931        accounted_for_packages = set()
932        for package_name in _packages:
933            for plugin in plugins:
934                if venv_contains_package(package_name, plugin.name):
935                    accounted_for_packages.add(package_name)
936                    break
937        packages = [pkg for pkg in _packages if pkg not in accounted_for_packages]
938
939        ### Attempt pip packages installation.
940        if packages:
941            for package in packages:
942                if not pip_install(package, venv=self.name, debug=debug):
943                    warn(
944                        f"Failed to install required package '{package}'"
945                        + f" for plugin '{self.name}'.",
946                        stack = False,
947                    )
948                    if not force:
949                        warn(
950                            "Try installing with `--force` to continue anyway.",
951                            stack = False,
952                        )
953                        return False
954                    info(
955                        "Continuing with installation despite the failure "
956                        + "(careful, things might be broken!)...",
957                        icon = False
958                    )
959        return True
960
961
962    @property
963    def full_name(self) -> str:
964        """
965        Include the repo keys with the plugin's name.
966        """
967        from meerschaum._internal.static import STATIC_CONFIG
968        sep = STATIC_CONFIG['plugins']['repo_separator']
969        return self.name + sep + str(self.repo_connector)
970
971
972    def __str__(self):
973        return self.name
974
975
976    def __repr__(self):
977        return f"Plugin('{self.name}', repo='{self.repo_connector}')"
978
979
980    def __del__(self):
981        pass

Handle packaging of Meerschaum plugins.

Plugin( name: str, version: Optional[str] = None, user_id: Optional[int] = None, required: Optional[List[str]] = None, attributes: Optional[Dict[str, Any]] = None, archive_path: Optional[pathlib.Path] = None, venv_path: Optional[pathlib.Path] = None, repo_connector: Optional[meerschaum.connectors.APIConnector] = None, repo: Union[meerschaum.connectors.APIConnector, str, NoneType] = None)
33    def __init__(
34        self,
35        name: str,
36        version: Optional[str] = None,
37        user_id: Optional[int] = None,
38        required: Optional[List[str]] = None,
39        attributes: Optional[Dict[str, Any]] = None,
40        archive_path: Optional[pathlib.Path] = None,
41        venv_path: Optional[pathlib.Path] = None,
42        repo_connector: Optional['mrsm.connectors.api.APIConnector'] = None,
43        repo: Union['mrsm.connectors.api.APIConnector', str, None] = None,
44    ):
45        from meerschaum._internal.static import STATIC_CONFIG
46        from meerschaum.config.paths import PLUGINS_ARCHIVES_RESOURCES_PATH, VIRTENV_RESOURCES_PATH
47        sep = STATIC_CONFIG['plugins']['repo_separator']
48        _repo = None
49        if sep in name:
50            try:
51                name, _repo = name.split(sep)
52            except Exception as e:
53                error(f"Invalid plugin name: '{name}'")
54        self._repo_in_name = _repo
55
56        if attributes is None:
57            attributes = {}
58        self.name = name
59        self.attributes = attributes
60        self.user_id = user_id
61        self._version = version
62        if required:
63            self._required = required
64        self.archive_path = (
65            archive_path if archive_path is not None
66            else PLUGINS_ARCHIVES_RESOURCES_PATH / f"{self.name}.tar.gz"
67        )
68        self.venv_path = (
69            venv_path if venv_path is not None
70            else VIRTENV_RESOURCES_PATH / self.name
71        )
72        self._repo_connector = repo_connector
73        self._repo_keys = repo
name
attributes
user_id
archive_path
venv_path
repo_connector
76    @property
77    def repo_connector(self):
78        """
79        Return the repository connector for this plugin.
80        NOTE: This imports the `connectors` module, which imports certain plugin modules.
81        """
82        if self._repo_connector is None:
83            from meerschaum.connectors.parse import parse_repo_keys
84
85            repo_keys = self._repo_keys or self._repo_in_name
86            if self._repo_in_name and self._repo_keys and self._repo_keys != self._repo_in_name:
87                error(
88                    f"Received inconsistent repos: '{self._repo_in_name}' and '{self._repo_keys}'."
89                )
90            repo_connector = parse_repo_keys(repo_keys)
91            self._repo_connector = repo_connector
92        return self._repo_connector

Return the repository connector for this plugin. NOTE: This imports the connectors module, which imports certain plugin modules.
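The repository may also be embedded in the plugin's name itself (e.g. `'noaa@api:mrsm'`), using the separator from `STATIC_CONFIG['plugins']['repo_separator']`. A minimal, self-contained sketch of that split, assuming the documented `'@'` separator (the plugin name `'noaa'` and repo keys `'api:mrsm'` are illustrative):

```python
# Sketch of how a plugin name may embed repository keys.
# The '@' separator matches the documented format (e.g. 'plugin:foo@api:bar');
# the real value is read from STATIC_CONFIG['plugins']['repo_separator'].

def split_plugin_name(name: str, sep: str = '@') -> tuple:
    """Return (plugin_name, repo_keys_or_None)."""
    if sep not in name:
        return name, None
    plugin_name, repo_keys = name.split(sep, 1)
    return plugin_name, repo_keys

print(split_plugin_name('noaa'))           # ('noaa', None)
print(split_plugin_name('noaa@api:mrsm'))  # ('noaa', 'api:mrsm')
```

When no repo keys are found in the name or passed explicitly, the property falls back to the configured default repository (`get_config('meerschaum', 'repository')`).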

version
 95    @property
 96    def version(self):
 97        """
 98        Return the plugin's module version (`__version__`) if it's defined.
 99        """
100        if self._version is None:
101            try:
102                self._version = self.module.__version__
103            except Exception as e:
104                self._version = None
105        return self._version

Return the plugin's module version (__version__) if it's defined.
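When the version isn't cached, it is read from the imported module's `__version__` attribute. The `install()` method above performs a similar lookup without importing, by scanning the source text for a `__version__` assignment and evaluating the literal with `ast.literal_eval`. A standalone sketch of that approach (the `find_version` helper is illustrative, not part of the API):

```python
# Scan source lines for `__version__ = ...` and evaluate the literal safely,
# mirroring the version-detection step in install().
import ast
import re

def find_version(source: str):
    """Return the `__version__` string, or None if undefined."""
    for line in source.splitlines():
        if '__version__' not in line:
            continue
        if not re.search(r'__version__\s*=', line.strip()):
            continue
        return ast.literal_eval(line.split('=', 1)[1].strip())
    return None

print(find_version("__version__ = '1.2.3'\n"))  # 1.2.3
```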

module
108    @property
109    def module(self):
110        """
111        Return the Python module of the underlying plugin.
112        """
113        if '_module' not in self.__dict__ or self.__dict__.get('_module', None) is None:
114            if self.__file__ is None:
115                return None
116
117            from meerschaum.plugins import import_plugins
118            self._module = import_plugins(str(self), warn=False)
119
120        return self._module

Return the Python module of the underlying plugin.

requirements_file_path: Optional[pathlib.Path]
148    @property
149    def requirements_file_path(self) -> Union[pathlib.Path, None]:
150        """
151        If a file named `requirements.txt` exists, return its path.
152        """
153        if self.__file__ is None:
154            return None
155        path = pathlib.Path(self.__file__).parent / 'requirements.txt'
156        if not path.exists():
157            return None
158        return path

If a file named requirements.txt exists, return its path.
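The lookup expects `requirements.txt` to sit alongside the plugin's `__init__.py` (or single-file module). A self-contained sketch, assuming a hypothetical `find_requirements` helper that mirrors the property's logic:

```python
# Return the requirements.txt path next to a plugin's module file, if any.
import pathlib

def find_requirements(plugin_file):
    """Return the sibling requirements.txt path, or None if missing."""
    path = pathlib.Path(plugin_file).parent / 'requirements.txt'
    return path if path.exists() else None

print(find_requirements('/nonexistent/plugin/__init__.py'))  # None
```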

def is_installed(self, **kw) -> bool:
161    def is_installed(self, **kw) -> bool:
162        """
163        Check whether a plugin is correctly installed.
164
165        Returns
166        -------
167        A `bool` indicating whether a plugin exists and is successfully imported.
168        """
169        return self.__file__ is not None

Check whether a plugin is correctly installed.

Returns
  • A bool indicating whether a plugin exists and is successfully imported.
def make_tar(self, debug: bool = False) -> pathlib.Path:
172    def make_tar(self, debug: bool = False) -> pathlib.Path:
173        """
174        Compress the plugin's source files into a `.tar.gz` archive and return the archive's path.
175
176        Parameters
177        ----------
178        debug: bool, default False
179            Verbosity toggle.
180
181        Returns
182        -------
183        A `pathlib.Path` to the archive file.
184
185        """
186        import tarfile, pathlib, subprocess, fnmatch
187        from meerschaum.utils.debug import dprint
188        from meerschaum.utils.packages import attempt_import
189        pathspec = attempt_import('pathspec', debug=debug)
190
191        if not self.__file__:
192            from meerschaum.utils.warnings import error
193            error(f"Could not find file for plugin '{self}'.")
194        if '__init__.py' in self.__file__ or os.path.isdir(self.__file__):
195            path = self.__file__.replace('__init__.py', '')
196            is_dir = True
197        else:
198            path = self.__file__
199            is_dir = False
200
201        old_cwd = os.getcwd()
202        real_parent_path = pathlib.Path(os.path.realpath(path)).parent
203        os.chdir(real_parent_path)
204
205        default_patterns_to_ignore = [
206            '.pyc',
207            '__pycache__/',
208            'eggs/',
209            '__pypackages__/',
210            '.git',
211        ]
212
213        def parse_gitignore() -> 'Set[str]':
214            gitignore_path = pathlib.Path(path) / '.gitignore'
215            if not gitignore_path.exists():
216                return set(default_patterns_to_ignore)
217            with open(gitignore_path, 'r', encoding='utf-8') as f:
218                gitignore_text = f.read()
219            return set(pathspec.PathSpec.from_lines(
220                pathspec.patterns.GitWildMatchPattern,
221                default_patterns_to_ignore + gitignore_text.splitlines()
222            ).match_tree(path))
223
224        patterns_to_ignore = parse_gitignore() if is_dir else set()
225
226        if debug:
227            dprint(f"Patterns to ignore:\n{patterns_to_ignore}")
228
229        with tarfile.open(self.archive_path, 'w:gz') as tarf:
230            if not is_dir:
231                tarf.add(f"{self.name}.py")
232            else:
233                for root, dirs, files in os.walk(self.name):
234                    for f in files:
235                        good_file = True
236                        fp = os.path.join(root, f)
237                        for pattern in patterns_to_ignore:
238                            if pattern in str(fp) or f.startswith('.'):
239                                good_file = False
240                                break
241                        if good_file:
242                            if debug:
243                                dprint(f"Adding '{fp}'...")
244                            tarf.add(fp)
245
246        ### clean up and change back to old directory
247        os.chdir(old_cwd)
248
249        ### change to 775 to avoid permissions issues with the API in a Docker container
250        self.archive_path.chmod(0o775)
251
252        if debug:
253            dprint(f"Created archive '{self.archive_path}'.")
254        return self.archive_path

Compress the plugin's source files into a .tar.gz archive and return the archive's path.

Parameters
  • debug (bool, default False): Verbosity toggle.
Returns
  • A pathlib.Path to the archive file.
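The archiving step can be sketched in isolation: walk the plugin directory, skip ignored files, and write a `.tar.gz`. This simplified version hardcodes a few ignore patterns; the real method also honors the plugin's `.gitignore` via the `pathspec` package. The `make_demo_tar` helper and the `example` plugin name are illustrative only:

```python
# Simplified version of make_tar(): archive a plugin directory to .tar.gz,
# skipping ignored patterns and hidden files.
import os
import pathlib
import tarfile
import tempfile

IGNORE_PATTERNS = ('.pyc', '__pycache__', '.git')

def make_demo_tar(plugin_dir, archive_path):
    with tarfile.open(archive_path, 'w:gz') as tarf:
        for root, _dirs, files in os.walk(plugin_dir):
            for fname in files:
                fpath = os.path.join(root, fname)
                if any(pat in fpath for pat in IGNORE_PATTERNS) or fname.startswith('.'):
                    continue
                tarf.add(fpath, arcname=os.path.relpath(fpath, plugin_dir.parent))
    archive_path.chmod(0o775)  # avoid permissions issues in containers
    return archive_path

with tempfile.TemporaryDirectory() as tmp:
    plugin = pathlib.Path(tmp) / 'example'
    plugin.mkdir()
    (plugin / '__init__.py').write_text("__version__ = '0.0.1'\n")
    (plugin / 'skip.pyc').touch()  # should be excluded from the archive
    archive = make_demo_tar(plugin, pathlib.Path(tmp) / 'example.tar.gz')
    with tarfile.open(archive) as tarf:
        print(tarf.getnames())  # ['example/__init__.py']
```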
def install( self, skip_deps: bool = False, force: bool = False, debug: bool = False) -> Tuple[bool, str]:
257    def install(
258        self,
259        skip_deps: bool = False,
260        force: bool = False,
261        debug: bool = False,
262    ) -> SuccessTuple:
263        """
264        Extract a plugin's tar archive to the plugins directory.
265        
266        This function checks if the plugin is already installed and if the version is equal or
267        greater than the existing installation.
268
269        Parameters
270        ----------
271        skip_deps: bool, default False
272            If `True`, do not install dependencies.
273
274        force: bool, default False
275            If `True`, continue with installation, even if required packages fail to install.
276
277        debug: bool, default False
278            Verbosity toggle.
279
280        Returns
281        -------
282        A `SuccessTuple` of success (bool) and a message (str).
283
284        """
285        if self.full_name in _ongoing_installations:
286            return True, f"Already installing plugin '{self}'."
287        _ongoing_installations.add(self.full_name)
288        from meerschaum.utils.warnings import warn, error
289        if debug:
290            from meerschaum.utils.debug import dprint
291        import tarfile
292        import re
293        import ast
294        from meerschaum.plugins import sync_plugins_symlinks
295        from meerschaum.utils.packages import attempt_import, determine_version, reload_meerschaum
296        from meerschaum.utils.venv import init_venv
297        from meerschaum.utils.misc import safely_extract_tar
298        from meerschaum.config.paths import PLUGINS_TEMP_RESOURCES_PATH, PLUGINS_DIR_PATHS
299        old_cwd = os.getcwd()
300        old_version = ''
301        new_version = ''
302        temp_dir = PLUGINS_TEMP_RESOURCES_PATH / self.name
303        temp_dir.mkdir(exist_ok=True)
304
305        if not self.archive_path.exists():
306            return False, f"Missing archive file for plugin '{self}'."
307        if self.version is not None:
308            old_version = self.version
309            if debug:
310                dprint(f"Found existing version '{old_version}' for plugin '{self}'.")
311
312        if debug:
313            dprint(f"Extracting '{self.archive_path}' to '{temp_dir}'...")
314
315        try:
316            with tarfile.open(self.archive_path, 'r:gz') as tarf:
317                safely_extract_tar(tarf, temp_dir)
318        except Exception as e:
319            warn(e)
320            return False, f"Failed to extract plugin '{self.name}'."
321
322        ### search for version information
323        files = os.listdir(temp_dir)
324        
325        if str(files[0]) == self.name:
326            is_dir = True
327        elif str(files[0]) == self.name + '.py':
328            is_dir = False
329        else:
330            error(f"Unknown format encountered for plugin '{self}'.")
331
332        fpath = temp_dir / files[0]
333        if is_dir:
334            fpath = fpath / '__init__.py'
335
336        init_venv(self.name, debug=debug)
337        with open(fpath, 'r', encoding='utf-8') as f:
338            init_lines = f.readlines()
339        new_version = None
340        for line in init_lines:
341            if '__version__' not in line:
342                continue
343            version_match = re.search(r'__version__(\s?)=', line.lstrip().rstrip())
344            if not version_match:
345                continue
346            new_version = ast.literal_eval(line.split('=')[1].lstrip().rstrip())
347            break
348        if not new_version:
349            warn(
350                f"No `__version__` defined for plugin '{self}'. "
351                + "Assuming new version...",
352                stack = False,
353            )
354
355        packaging_version = attempt_import('packaging.version')
356        try:
357            is_new_version = (not new_version and not old_version) or (
358                packaging_version.parse(old_version) < packaging_version.parse(new_version)
359            )
360            is_same_version = new_version and old_version and (
361                packaging_version.parse(old_version) == packaging_version.parse(new_version)
362            )
363        except Exception:
364            is_new_version, is_same_version = True, False
365
366        ### Determine where to permanently store the new plugin.
367        plugin_installation_dir_path = PLUGINS_DIR_PATHS[0]
368        for path in PLUGINS_DIR_PATHS:
369            if not path.exists():
370                warn(f"Plugins path does not exist: {path}", stack=False)
371                continue
372
373            files_in_plugins_dir = os.listdir(path)
374            if (
375                self.name in files_in_plugins_dir
376                or
377                (self.name + '.py') in files_in_plugins_dir
378            ):
379                plugin_installation_dir_path = path
380                break
381
382        success_msg = (
383            f"Successfully installed plugin '{self}'"
384            + ("\n    (skipped dependencies)" if skip_deps else "")
385            + "."
386        )
387        success, abort = None, None
388
389        if is_same_version and not force:
390            success, msg = True, (
391                f"Plugin '{self}' is up-to-date (version {old_version}).\n" +
392                "    Install again with `-f` or `--force` to reinstall."
393            )
394            abort = True
395        elif is_new_version or force:
396            for src_dir, dirs, files in os.walk(temp_dir):
397                if success is not None:
398                    break
399                dst_dir = str(src_dir).replace(str(temp_dir), str(plugin_installation_dir_path))
400                if not os.path.exists(dst_dir):
401                    os.mkdir(dst_dir)
402                for f in files:
403                    src_file = os.path.join(src_dir, f)
404                    dst_file = os.path.join(dst_dir, f)
405                    if os.path.exists(dst_file):
406                        os.remove(dst_file)
407
408                    if debug:
409                        dprint(f"Moving '{src_file}' to '{dst_dir}'...")
410                    try:
411                        shutil.move(src_file, dst_dir)
412                    except Exception:
413                        success, msg = False, (
414                            f"Failed to install plugin '{self}': " +
415                            f"Could not move file '{src_file}' to '{dst_dir}'"
416                        )
417                        print(msg)
418                        break
419            if success is None:
420                success, msg = True, success_msg
421        else:
422            success, msg = False, (
423                f"Your installed version of plugin '{self}' ({old_version}) is higher than "
424                + f"attempted version {new_version}."
425            )
426
427        shutil.rmtree(temp_dir)
428        os.chdir(old_cwd)
429
430        ### Reload the plugin's module.
431        sync_plugins_symlinks(debug=debug)
432        if '_module' in self.__dict__:
433            del self.__dict__['_module']
434        init_venv(venv=self.name, force=True, debug=debug)
435        reload_meerschaum(debug=debug)
436
437        ### if we've already failed, return here
438        if not success or abort:
439            _ongoing_installations.remove(self.full_name)
440            return success, msg
441
442        ### attempt to install dependencies
443        dependencies_installed = skip_deps or self.install_dependencies(force=force, debug=debug)
444        if not dependencies_installed:
445            _ongoing_installations.remove(self.full_name)
446            return False, f"Failed to install dependencies for plugin '{self}'."
447
448        ### handling success tuple, bool, or other (typically None)
449        setup_tuple = self.setup(debug=debug)
450        if isinstance(setup_tuple, tuple):
451            if not setup_tuple[0]:
452                success, msg = setup_tuple
453        elif isinstance(setup_tuple, bool):
454            if not setup_tuple:
455                success, msg = False, (
456                    f"Failed to run post-install setup for plugin '{self}'." + '\n' +
457                    f"Check `setup()` in '{self.__file__}' for more information " +
458                    "(no error message provided)."
459                )
460            else:
461                success, msg = True, success_msg
462        elif setup_tuple is None:
463            success = True
464            msg = (
465                f"Post-install for plugin '{self}' returned None. " +
466                "Assuming plugin successfully installed."
467            )
468            warn(msg)
469        else:
470            success = False
471            msg = (
472                f"Post-install for plugin '{self}' returned unexpected value " +
473                f"of type '{type(setup_tuple)}': {setup_tuple}"
474            )
475
476        _ongoing_installations.remove(self.full_name)
477        _ = self.module
478        return success, msg

Extract a plugin's tar archive to the plugins directory.

This function checks whether the plugin is already installed and whether the incoming version is equal to or greater than the version of the existing installation.

Parameters
  • skip_deps (bool, default False): If True, do not install dependencies.
  • force (bool, default False): If True, continue with installation, even if required packages fail to install.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A SuccessTuple (bool, str) indicating success.
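The version detection inside `install()` can be sketched in isolation. The hypothetical `parse_plugin_version` helper below (not part of Meerschaum) scans source lines for a literal `__version__` assignment, mirroring the regex-plus-`ast.literal_eval` approach used above; only literal right-hand sides are supported.

```python
import ast
import re

def parse_plugin_version(source_lines):
    """Return the plugin's __version__ string, or None if undeclared.

    Hypothetical helper mirroring the scan in Plugin.install():
    only literal assignments (e.g. __version__ = '1.2.3') are parsed.
    """
    for line in source_lines:
        if '__version__' not in line:
            continue
        if not re.search(r'__version__(\s?)=', line.strip()):
            continue
        # Evaluate only the right-hand side as a Python literal.
        return ast.literal_eval(line.split('=', 1)[1].strip())
    return None

lines = [
    "from datetime import datetime",
    "__version__ = '0.3.1'",
    "required = ['requests']",
]
print(parse_plugin_version(lines))  # '0.3.1'
```

As in the original, a plugin without a `__version__` yields `None`, which `install()` treats as "assume new version."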
def remove_archive(self, debug: bool = False) -> Tuple[bool, str]:
481    def remove_archive(
482        self,        
483        debug: bool = False
484    ) -> SuccessTuple:
485        """Remove a plugin's archive file."""
486        if not self.archive_path.exists():
487            return True, f"Archive file for plugin '{self}' does not exist."
488        try:
489            self.archive_path.unlink()
490        except Exception as e:
491            return False, f"Failed to remove archive for plugin '{self}':\n{e}"
492        return True, "Success"

Remove a plugin's archive file.
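`remove_archive()` follows Meerschaum's SuccessTuple convention: every outcome, including a missing file, is reported as a `(bool, str)` pair rather than an exception. A stand-alone sketch of the same pattern with a temporary file (the `remove_file` name is hypothetical):

```python
import pathlib
import tempfile

def remove_file(path: pathlib.Path) -> tuple[bool, str]:
    """Delete a file, returning a (success, message) SuccessTuple.

    Stand-alone sketch of the pattern used by Plugin.remove_archive();
    a missing file counts as success, like the original.
    """
    if not path.exists():
        return True, f"'{path}' does not exist."
    try:
        path.unlink()
    except Exception as e:
        return False, f"Failed to remove '{path}':\n{e}"
    return True, "Success"

with tempfile.TemporaryDirectory() as tmp:
    archive = pathlib.Path(tmp) / 'demo.tar.gz'
    archive.touch()
    print(remove_file(archive))  # (True, 'Success')
    print(remove_file(archive))  # file is gone now, still a success
```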

def remove_venv(self, debug: bool = False) -> Tuple[bool, str]:
495    def remove_venv(
496        self,        
497        debug: bool = False
498    ) -> SuccessTuple:
499        """Remove a plugin's virtual environment."""
500        if not self.venv_path.exists():
501            return True, f"Virtual environment for plugin '{self}' does not exist."
502        try:
503            shutil.rmtree(self.venv_path)
504        except Exception as e:
505            return False, f"Failed to remove virtual environment for plugin '{self}':\n{e}"
506        return True, "Success"

Remove a plugin's virtual environment.

def uninstall(self, debug: bool = False) -> Tuple[bool, str]:
509    def uninstall(self, debug: bool = False) -> SuccessTuple:
510        """
511        Remove a plugin, its virtual environment, and archive file.
512        """
513        from meerschaum.utils.packages import reload_meerschaum
514        from meerschaum.plugins import sync_plugins_symlinks
515        from meerschaum.utils.warnings import warn, info
516        warnings_thrown_count: int = 0
517        max_warnings: int = 3
518
519        if not self.is_installed():
520            info(
521                f"Plugin '{self.name}' doesn't seem to be installed.\n    "
522                + "Checking for artifacts...",
523                stack = False,
524            )
525        else:
526            real_path = pathlib.Path(os.path.realpath(self.__file__))
527            try:
528                if real_path.name == '__init__.py':
529                    shutil.rmtree(real_path.parent)
530                else:
531                    real_path.unlink()
532            except Exception as e:
533                warn(f"Could not remove source files for plugin '{self.name}':\n{e}", stack=False)
534                warnings_thrown_count += 1
535            else:
536                info(f"Removed source files for plugin '{self.name}'.")
537
538        if self.venv_path.exists():
539            success, msg = self.remove_venv(debug=debug)
540            if not success:
541                warn(msg, stack=False)
542                warnings_thrown_count += 1
543            else:
544                info(f"Removed virtual environment from plugin '{self.name}'.")
545
546        success = warnings_thrown_count < max_warnings
547        sync_plugins_symlinks(debug=debug)
548        self.deactivate_venv(force=True, debug=debug)
549        reload_meerschaum(debug=debug)
550        return success, (
551            f"Successfully uninstalled plugin '{self}'." if success
552            else f"Failed to uninstall plugin '{self}'."
553        )

Remove a plugin, its virtual environment, and archive file.

def setup(self, *args: str, debug: bool = False, **kw: Any) -> Union[Tuple[bool, str], bool]:
556    def setup(self, *args: str, debug: bool = False, **kw: Any) -> Union[SuccessTuple, bool]:
557        """
558        If exists, run the plugin's `setup()` function.
559
560        Parameters
561        ----------
562        *args: str
563            The positional arguments passed to the `setup()` function.
564            
565        debug: bool, default False
566            Verbosity toggle.
567
568        **kw: Any
569            The keyword arguments passed to the `setup()` function.
570
571        Returns
572        -------
573        A `SuccessTuple` or `bool` indicating success.
574
575        """
576        from meerschaum.utils.debug import dprint
577        import inspect
578        _setup = None
579        for name, fp in inspect.getmembers(self.module):
580            if name == 'setup' and inspect.isfunction(fp):
581                _setup = fp
582                break
583
584        ### assume success if no setup() is found (not necessary)
585        if _setup is None:
586            return True
587
588        sig = inspect.signature(_setup)
589        has_debug, has_kw = ('debug' in sig.parameters), False
590        for k, v in sig.parameters.items():
591            if '**' in str(v):
592                has_kw = True
593                break
594
595        _kw = {}
596        if has_kw:
597            _kw.update(kw)
598        if has_debug:
599            _kw['debug'] = debug
600
601        if debug:
602            dprint(f"Running setup for plugin '{self}'...")
603        try:
604            self.activate_venv(debug=debug)
605            return_tuple = _setup(*args, **_kw)
606            self.deactivate_venv(debug=debug)
607        except Exception as e:
608            return False, str(e)
609
610        if isinstance(return_tuple, tuple):
611            return return_tuple
612        if isinstance(return_tuple, bool):
613            return return_tuple, f"Setup for Plugin '{self.name}' did not return a message."
614        if return_tuple is None:
615            return False, f"Setup for Plugin '{self.name}' returned None."
616        return False, f"Unknown return value from setup for Plugin '{self.name}': {return_tuple}"

If exists, run the plugin's setup() function.

Parameters
  • *args (str): The positional arguments passed to the setup() function.
  • debug (bool, default False): Verbosity toggle.
  • **kw (Any): The keyword arguments passed to the setup() function.
Returns
  • A SuccessTuple or bool indicating success.
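Note that `setup()` only forwards `debug` and `**kw` when the plugin's function actually declares them. That filtering can be sketched with `inspect` alone; `build_kwargs` and the two sample functions below are hypothetical names, not Meerschaum API.

```python
import inspect

def build_kwargs(func, debug: bool, **kw) -> dict:
    """Keep only the keyword arguments `func` can accept.

    Sketch of the parameter filtering in Plugin.setup(): `debug` is
    forwarded only if declared, and arbitrary keywords only if the
    signature includes a **kwargs catch-all.
    """
    sig = inspect.signature(func)
    has_kw = any(
        p.kind is inspect.Parameter.VAR_KEYWORD
        for p in sig.parameters.values()
    )
    _kw = dict(kw) if has_kw else {}
    if 'debug' in sig.parameters:
        _kw['debug'] = debug
    return _kw

def setup_a(debug: bool = False):
    return debug

def setup_b(**kwargs):
    return kwargs

print(build_kwargs(setup_a, debug=True, extra=1))  # {'debug': True}
print(build_kwargs(setup_b, debug=True, extra=1))  # {'extra': 1}
```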
def get_dependencies(self, debug: bool = False) -> List[str]:
619    def get_dependencies(
620        self,
621        debug: bool = False,
622    ) -> List[str]:
623        """
624        If the Plugin has specified dependencies in a list called `required`, return the list.
625        
626        **NOTE:** Dependencies which start with `'plugin:'` are Meerschaum plugins, not pip packages.
627        Meerschaum plugins may also specify connector keys for a repo after `'@'`.
628
629        Parameters
630        ----------
631        debug: bool, default False
632            Verbosity toggle.
633
634        Returns
635        -------
636        A list of required packages and plugins (str).
637
638        """
639        if '_required' in self.__dict__:
640            return self._required
641
642        ### If the plugin has not yet been imported,
643        ### infer the dependencies from the source text.
644        ### This is not super robust, and it doesn't feel right
645        ### having multiple versions of the logic.
646        ### This is necessary when determining the activation order
647        ### without having to import the module.
648        ### For consistency's sake, the module-less method does not cache the requirements.
649        if self.__dict__.get('_module', None) is None:
650            file_path = self.__file__
651            if file_path is None:
652                return []
653            with open(file_path, 'r', encoding='utf-8') as f:
654                text = f.read()
655
656            if 'required' not in text:
657                return []
658
659            ### This has some limitations:
660            ### It relies on `required` being manually declared.
661            ### We lose the ability to dynamically alter the `required` list,
662            ### which is why we've kept the module-reliant method below.
663            import ast, re
664            ### NOTE: This technically would break 
665            ### if `required` was the very first line of the file.
666            req_start_match = re.search(r'\nrequired(:\s*)?.*=', text)
667            if not req_start_match:
668                return []
669            req_start = req_start_match.start()
670            equals_sign = req_start + text[req_start:].find('=')
671
672            ### Dependencies may have brackets within the strings, so push back the index.
673            first_opening_brace = equals_sign + 1 + text[equals_sign:].find('[')
674            if first_opening_brace == -1:
675                return []
676
677            next_closing_brace = equals_sign + 1 + text[equals_sign:].find(']')
678            if next_closing_brace == -1:
679                return []
680
681            start_ix = first_opening_brace + 1
682            end_ix = next_closing_brace
683
684            num_braces = 0
685            while True:
686                if '[' not in text[start_ix:end_ix]:
687                    break
688                num_braces += 1
689                start_ix = end_ix
690                end_ix += text[end_ix + 1:].find(']') + 1
691
692            req_end = end_ix + 1
693            req_text = (
694                text[(first_opening_brace-1):req_end]
695                .lstrip()
696                .replace('=', '', 1)
697                .lstrip()
698                .rstrip()
699            )
700            try:
701                required = ast.literal_eval(req_text)
702            except Exception as e:
703                warn(
704                    f"Unable to determine requirements for plugin '{self.name}' "
705                    + "without importing the module.\n"
706                    + "    This may be due to dynamically setting the global `required` list.\n"
707                    + f"    {e}"
708                )
709                return []
710            return required
711
712        import inspect
713        self.activate_venv(dependencies=False, debug=debug)
714        required = []
715        for name, val in inspect.getmembers(self.module):
716            if name == 'required':
717                required = val
718                break
719        self._required = required
720        self.deactivate_venv(dependencies=False, debug=debug)
721        return required

If the Plugin has specified dependencies in a list called required, return the list.

NOTE: Dependencies which start with 'plugin:' are Meerschaum plugins, not pip packages. Meerschaum plugins may also specify connector keys for a repo after '@'.

Parameters
  • debug (bool, default False): Verbosity toggle.
Returns
  • A list of required packages and plugins (str).
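The module-less path in `get_dependencies()` parses the `required` list straight out of the source text. For a flat, literal list, the same idea reduces to a regex plus `ast.literal_eval`; this simplified sketch (the `parse_required` name is hypothetical) ignores the nested-bracket handling the real method performs.

```python
import ast
import re

def parse_required(source: str):
    """Extract a flat, literal `required` list from plugin source.

    Simplified sketch of Plugin.get_dependencies(): unlike the real
    method, it assumes the list contains no nested brackets.
    """
    match = re.search(
        r'^required(?:\s*:[^=]+)?\s*=\s*(\[[^\]]*\])',
        source,
        flags=re.MULTILINE,
    )
    if not match:
        return []
    try:
        return ast.literal_eval(match.group(1))
    except (ValueError, SyntaxError):
        # E.g. a dynamically built list; fall back to empty.
        return []

source = """
__version__ = '0.1.0'
required = ['requests', 'plugin:foo@api:bar']
"""
print(parse_required(source))  # ['requests', 'plugin:foo@api:bar']
```

As the inline comments above note, a dynamically constructed `required` list defeats source-text parsing, which is why the module-reliant fallback is kept.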
def get_required_plugins(self, debug: bool = False) -> List[Plugin]:
724    def get_required_plugins(self, debug: bool=False) -> List[mrsm.plugins.Plugin]:
725        """
726        Return a list of required Plugin objects.
727        """
728        from meerschaum.utils.warnings import warn
729        from meerschaum.config import get_config
730        from meerschaum._internal.static import STATIC_CONFIG
731        from meerschaum.connectors.parse import is_valid_connector_keys
732        plugins = []
733        _deps = self.get_dependencies(debug=debug)
734        sep = STATIC_CONFIG['plugins']['repo_separator']
735        plugin_names = [
736            _d[len('plugin:'):] for _d in _deps
737            if _d.startswith('plugin:') and len(_d) > len('plugin:')
738        ]
739        default_repo_keys = get_config('meerschaum', 'repository')
740        skipped_repo_keys = set()
741
742        for _plugin_name in plugin_names:
743            if sep in _plugin_name:
744                try:
745                    _plugin_name, _repo_keys = _plugin_name.split(sep)
746                except Exception:
747                    _repo_keys = default_repo_keys
748                    warn(
749                        f"Invalid repo keys for required plugin '{_plugin_name}'.\n    "
750                        + f"Will try to use '{_repo_keys}' instead.",
751                        stack = False,
752                    )
753            else:
754                _repo_keys = default_repo_keys
755
756            if _repo_keys in skipped_repo_keys:
757                continue
758
759            if not is_valid_connector_keys(_repo_keys):
760                warn(
761                    f"Invalid connector '{_repo_keys}'.\n"
762                    f"    Skipping required plugins from repository '{_repo_keys}'",
763                    stack=False,
764                )
765                continue
766
767            plugins.append(Plugin(_plugin_name, repo=_repo_keys))
768
769        return plugins

Return a list of required Plugin objects.
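Each `plugin:` dependency may carry repository keys after a separator (`@` by default, per the docs above). A hedged stand-alone sketch of that parsing; `split_plugin_dependency` is a hypothetical name, and the `'api:mrsm'` default repository is an assumption standing in for the live `get_config('meerschaum', 'repository')` lookup.

```python
def split_plugin_dependency(
    dep: str,
    sep: str = '@',
    default_repo: str = 'api:mrsm',
):
    """Split 'plugin:name[@repo:keys]' into (name, repo_keys).

    Sketch of the parsing in Plugin.get_required_plugins(); the
    separator and default repository here are assumptions rather
    than values read from the live configuration.
    """
    name = dep[len('plugin:'):]
    if sep in name:
        # Keep everything after the first separator as the repo keys.
        name, repo_keys = name.split(sep, 1)
    else:
        repo_keys = default_repo
    return name, repo_keys

print(split_plugin_dependency('plugin:noaa'))         # ('noaa', 'api:mrsm')
print(split_plugin_dependency('plugin:foo@api:bar'))  # ('foo', 'api:bar')
```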

def get_required_packages(self, debug: bool = False) -> List[str]:
772    def get_required_packages(self, debug: bool=False) -> List[str]:
773        """
774        Return the required package names (excluding plugins).
775        """
776        _deps = self.get_dependencies(debug=debug)
777        return [_d for _d in _deps if not _d.startswith('plugin:')]

Return the required package names (excluding plugins).

def activate_venv(self, dependencies: bool = True, init_if_not_exists: bool = True, debug: bool = False, **kw) -> bool:
780    def activate_venv(
781        self,
782        dependencies: bool = True,
783        init_if_not_exists: bool = True,
784        debug: bool = False,
785        **kw
786    ) -> bool:
787        """
788        Activate the virtual environments for the plugin and its dependencies.
789
790        Parameters
791        ----------
792        dependencies: bool, default True
793            If `True`, activate the virtual environments for required plugins.
794
795        Returns
796        -------
797        A bool indicating success.
798        """
799        from meerschaum.utils.venv import venv_target_path
800        from meerschaum.utils.packages import activate_venv
801        from meerschaum.utils.misc import make_symlink, is_symlink
802        from meerschaum.config._paths import PACKAGE_ROOT_PATH
803
804        if dependencies:
805            for plugin in self.get_required_plugins(debug=debug):
806                plugin.activate_venv(debug=debug, init_if_not_exists=init_if_not_exists, **kw)
807
808        vtp = venv_target_path(self.name, debug=debug, allow_nonexistent=True)
809        venv_meerschaum_path = vtp / 'meerschaum'
810
811        try:
812            success, msg = True, "Success"
813            if is_symlink(venv_meerschaum_path):
814                if pathlib.Path(os.path.realpath(venv_meerschaum_path)) != PACKAGE_ROOT_PATH:
815                    venv_meerschaum_path.unlink()
816                    success, msg = make_symlink(venv_meerschaum_path, PACKAGE_ROOT_PATH)
817        except Exception as e:
818            success, msg = False, str(e)
819        if not success:
820            warn(f"Unable to create symlink {venv_meerschaum_path} to {PACKAGE_ROOT_PATH}:\n{msg}")
821
822        return activate_venv(self.name, init_if_not_exists=init_if_not_exists, debug=debug, **kw)

Activate the virtual environments for the plugin and its dependencies.

Parameters
  • dependencies (bool, default True): If True, activate the virtual environments for required plugins.
Returns
  • A bool indicating success.
def deactivate_venv(self, dependencies: bool = True, debug: bool = False, **kw) -> bool:
825    def deactivate_venv(self, dependencies: bool=True, debug: bool = False, **kw) -> bool:
826        """
827        Deactivate the virtual environments for the plugin and its dependencies.
828
829        Parameters
830        ----------
831        dependencies: bool, default True
832            If `True`, deactivate the virtual environments for required plugins.
833
834        Returns
835        -------
836        A bool indicating success.
837        """
838        from meerschaum.utils.packages import deactivate_venv
839        success = deactivate_venv(self.name, debug=debug, **kw)
840        if dependencies:
841            for plugin in self.get_required_plugins(debug=debug):
842                plugin.deactivate_venv(debug=debug, **kw)
843        return success

Deactivate the virtual environments for the plugin and its dependencies.

Parameters
  • dependencies (bool, default True): If True, deactivate the virtual environments for required plugins.
Returns
  • A bool indicating success.
def install_dependencies(self, force: bool = False, debug: bool = False) -> bool:
846    def install_dependencies(
847        self,
848        force: bool = False,
849        debug: bool = False,
850    ) -> bool:
851        """
852        If specified, install dependencies.
853        
854        **NOTE:** Dependencies that start with `'plugin:'` will be installed as
855        Meerschaum plugins from the same repository as this Plugin.
856        To install from a different repository, add the repo keys after `'@'`
857        (e.g. `'plugin:foo@api:bar'`).
858
859        Parameters
860        ----------
861        force: bool, default False
862            If `True`, continue with the installation, even if some
863            required packages fail to install.
864
865        debug: bool, default False
866            Verbosity toggle.
867
868        Returns
869        -------
870        A bool indicating success.
871        """
872        from meerschaum.utils.packages import pip_install, venv_contains_package
873        from meerschaum.utils.warnings import warn, info
874        _deps = self.get_dependencies(debug=debug)
875        if not _deps and self.requirements_file_path is None:
876            return True
877
878        plugins = self.get_required_plugins(debug=debug)
879        for _plugin in plugins:
880            if _plugin.name == self.name:
881                warn(f"Plugin '{self.name}' cannot depend on itself! Skipping...", stack=False)
882                continue
883            _success, _msg = _plugin.repo_connector.install_plugin(
884                _plugin.name, debug=debug, force=force
885            )
886            if not _success:
887                warn(
888                    f"Failed to install required plugin '{_plugin}' from '{_plugin.repo_connector}'"
889                    + f" for plugin '{self.name}':\n" + _msg,
890                    stack = False,
891                )
892                if not force:
893                    warn(
894                        "Try installing with the `--force` flag to continue anyway.",
895                        stack = False,
896                    )
897                    return False
898                info(
899                    "Continuing with installation despite the failure "
900                    + "(careful, things might be broken!)...",
901                    icon = False
902                )
903
904
905        ### First step: parse `requirements.txt` if it exists.
906        if self.requirements_file_path is not None:
907            if not pip_install(
908                requirements_file_path=self.requirements_file_path,
909                venv=self.name, debug=debug
910            ):
911                warn(
912                    f"Failed to resolve 'requirements.txt' for plugin '{self.name}'.",
913                    stack = False,
914                )
915                if not force:
916                    warn(
917                        "Try installing with `--force` to continue anyway.",
918                        stack = False,
919                    )
920                    return False
921                info(
922                    "Continuing with installation despite the failure "
923                    + "(careful, things might be broken!)...",
924                    icon = False
925                )
926
927
928        ### Don't reinstall packages that are already included in required plugins.
929        packages = []
930        _packages = self.get_required_packages(debug=debug)
931        accounted_for_packages = set()
932        for package_name in _packages:
933            for plugin in plugins:
934                if venv_contains_package(package_name, plugin.name):
935                    accounted_for_packages.add(package_name)
936                    break
937        packages = [pkg for pkg in _packages if pkg not in accounted_for_packages]
938
939        ### Attempt pip packages installation.
940        if packages:
941            for package in packages:
942                if not pip_install(package, venv=self.name, debug=debug):
943                    warn(
944                        f"Failed to install required package '{package}'"
945                        + f" for plugin '{self.name}'.",
946                        stack = False,
947                    )
948                    if not force:
949                        warn(
950                            "Try installing with `--force` to continue anyway.",
951                            stack = False,
952                        )
953                        return False
954                    info(
955                        "Continuing with installation despite the failure "
956                        + "(careful, things might be broken!)...",
957                        icon = False
958                    )
959        return True

If specified, install dependencies.

NOTE: Dependencies that start with 'plugin:' will be installed as Meerschaum plugins from the same repository as this Plugin. To install from a different repository, add the repo keys after '@' (e.g. 'plugin:foo@api:bar').

Parameters
  • force (bool, default False): If True, continue with the installation, even if some required packages fail to install.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A bool indicating success.
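Before pip-installing anything, `install_dependencies()` drops packages already provided by a required plugin's virtual environment. With `venv_contains_package()` replaced by a plain lookup table (an assumption for this sketch, as is the `packages_to_install` name), the filtering reduces to set bookkeeping:

```python
def packages_to_install(required_packages, plugin_venv_contents):
    """Drop packages already present in a required plugin's venv.

    `plugin_venv_contents` stands in for venv_contains_package():
    a mapping of plugin name -> set of installed package names.
    """
    accounted_for = {
        pkg
        for pkg in required_packages
        if any(pkg in contents for contents in plugin_venv_contents.values())
    }
    # Preserve the original ordering of the remaining packages.
    return [pkg for pkg in required_packages if pkg not in accounted_for]

venvs = {'noaa': {'requests', 'pandas'}}
print(packages_to_install(['requests', 'duckdb'], venvs))  # ['duckdb']
```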
full_name: str
962    @property
963    def full_name(self) -> str:
964        """
965        Include the repo keys with the plugin's name.
966        """
967        from meerschaum._internal.static import STATIC_CONFIG
968        sep = STATIC_CONFIG['plugins']['repo_separator']
969        return self.name + sep + str(self.repo_connector)

Include the repo keys with the plugin's name.

SuccessTuple = typing.Tuple[bool, str]
class Venv:
 19class Venv:
 20    """
 21    Manage a virtual environment's activation status.
 22
 23    Examples
 24    --------
 25    >>> from meerschaum.plugins import Plugin
 26    >>> with Venv('mrsm') as venv:
 27    ...     import pandas
 28    >>> with Venv(Plugin('noaa')) as venv:
 29    ...     import requests
 30    >>> venv = Venv('mrsm')
 31    >>> venv.activate()
 32    True
 33    >>> venv.deactivate()
 34    True
 35    >>> 
 36    """
 37
 38    def __init__(
 39        self,
 40        venv: Union[str, 'mrsm.core.Plugin', None] = 'mrsm',
 41        init_if_not_exists: bool = True,
 42        debug: bool = False,
 43    ) -> None:
 44        from meerschaum.utils.venv import activate_venv, deactivate_venv, active_venvs
 45        ### For some weird threading issue,
 46        ### we can't use `isinstance` here.
 47        if '_Plugin' in str(type(venv)):
 48            self._venv = venv.name
 49            self._activate = venv.activate_venv
 50            self._deactivate = venv.deactivate_venv
 51            self._kwargs = {}
 52        else:
 53            self._venv = venv
 54            self._activate = activate_venv
 55            self._deactivate = deactivate_venv
 56            self._kwargs = {'venv': venv}
 57        self._debug = debug
 58        self._init_if_not_exists = init_if_not_exists
 59        ### In case someone calls `deactivate()` before `activate()`.
 60        self._kwargs['previously_active_venvs'] = copy.deepcopy(active_venvs)
 61
 62
 63    def activate(self, debug: bool = False) -> bool:
 64        """
 65        Activate this virtual environment.
 66        If a `meerschaum.plugins.Plugin` was provided, its dependent virtual environments
 67        will also be activated.
 68        """
 69        from meerschaum.utils.venv import active_venvs, init_venv
 70        self._kwargs['previously_active_venvs'] = copy.deepcopy(active_venvs)
 71        try:
 72            return self._activate(
 73                debug=(debug or self._debug),
 74                init_if_not_exists=self._init_if_not_exists,
 75                **self._kwargs
 76            )
 77        except OSError as e:
 78            if self._init_if_not_exists:
 79                if not init_venv(self._venv, force=True):
 80                    raise e
 81        return self._activate(
 82            debug=(debug or self._debug),
 83            init_if_not_exists=self._init_if_not_exists,
 84            **self._kwargs
 85        )
 86
 87
 88    def deactivate(self, debug: bool = False) -> bool:
 89        """
 90        Deactivate this virtual environment.
 91        If a `meerschaum.plugins.Plugin` was provided, its dependent virtual environments
 92        will also be deactivated.
 93        """
 94        return self._deactivate(debug=(debug or self._debug), **self._kwargs)
 95
 96
 97    @property
 98    def target_path(self) -> pathlib.Path:
 99        """
100        Return the target site-packages path for this virtual environment.
101        A `meerschaum.utils.venv.Venv` may have one virtual environment per minor Python version
102        (e.g. Python 3.10 and Python 3.7).
103        """
104        from meerschaum.utils.venv import venv_target_path
105        return venv_target_path(venv=self._venv, allow_nonexistent=True, debug=self._debug)
106
107
108    @property
109    def root_path(self) -> pathlib.Path:
110        """
111        Return the top-level path for this virtual environment.
112        """
113        from meerschaum.config._paths import VIRTENV_RESOURCES_PATH
114        if self._venv is None:
115            return self.target_path.parent
116        return VIRTENV_RESOURCES_PATH / self._venv
117
118
119    def __enter__(self) -> None:
120        self.activate(debug=self._debug)
121
122
123    def __exit__(self, exc_type, exc_value, exc_traceback) -> None:
124        self.deactivate(debug=self._debug)
125
126
127    def __str__(self) -> str:
128        quote = "'" if self._venv is not None else ""
129        return "Venv(" + quote + str(self._venv) + quote + ")"
130
131
132    def __repr__(self) -> str:
133        return self.__str__()

Manage a virtual environment's activation status.

Examples
>>> from meerschaum.plugins import Plugin
>>> with Venv('mrsm') as venv:
...     import pandas
>>> with Venv(Plugin('noaa')) as venv:
...     import requests
>>> venv = Venv('mrsm')
>>> venv.activate()
True
>>> venv.deactivate()
True
>>>
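The context-manager form above is shorthand for calling `activate()` on entry and `deactivate()` on exit. A minimal sketch of that pairing (hypothetical `ToyVenv` class, plain Python, no Meerschaum required):

```python
# Sketch of the activate/deactivate pairing behind `with Venv(...)`.
# The real implementation lives in meerschaum.utils.venv.
class ToyVenv:
    def __init__(self, name):
        self.name = name
        self.active = []          # stand-in for the active-venvs registry

    def activate(self):
        self.active.append(self.name)
        return True

    def deactivate(self):
        if self.name in self.active:
            self.active.remove(self.name)
        return True

    def __enter__(self):
        self.activate()
        return None               # like Venv.__enter__, returns None

    def __exit__(self, exc_type, exc_value, exc_traceback):
        self.deactivate()

venv = ToyVenv('mrsm')
with venv:
    assert venv.active == ['mrsm']
assert venv.active == []
```

Note that `__enter__` returns `None`, so the `as venv` binding in the examples above is `None`; keep a reference to the `Venv` object itself if you need to call its methods inside the block.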
Venv(venv: Union[str, Plugin, None] = 'mrsm', init_if_not_exists: bool = True, debug: bool = False)
38    def __init__(
39        self,
40        venv: Union[str, 'mrsm.core.Plugin', None] = 'mrsm',
41        init_if_not_exists: bool = True,
42        debug: bool = False,
43    ) -> None:
44        from meerschaum.utils.venv import activate_venv, deactivate_venv, active_venvs
45        ### For some weird threading issue,
46        ### we can't use `isinstance` here.
47        if '_Plugin' in str(type(venv)):
48            self._venv = venv.name
49            self._activate = venv.activate_venv
50            self._deactivate = venv.deactivate_venv
51            self._kwargs = {}
52        else:
53            self._venv = venv
54            self._activate = activate_venv
55            self._deactivate = deactivate_venv
56            self._kwargs = {'venv': venv}
57        self._debug = debug
58        self._init_if_not_exists = init_if_not_exists
59        ### In case someone calls `deactivate()` before `activate()`.
60        self._kwargs['previously_active_venvs'] = copy.deepcopy(active_venvs)
def activate(self, debug: bool = False) -> bool:
63    def activate(self, debug: bool = False) -> bool:
64        """
65        Activate this virtual environment.
66        If a `meerschaum.plugins.Plugin` was provided, its dependent virtual environments
67        will also be activated.
68        """
69        from meerschaum.utils.venv import active_venvs, init_venv
70        self._kwargs['previously_active_venvs'] = copy.deepcopy(active_venvs)
71        try:
72            return self._activate(
73                debug=(debug or self._debug),
74                init_if_not_exists=self._init_if_not_exists,
75                **self._kwargs
76            )
77        except OSError as e:
78            if self._init_if_not_exists:
79                if not init_venv(self._venv, force=True):
80                    raise e
81        return self._activate(
82            debug=(debug or self._debug),
83            init_if_not_exists=self._init_if_not_exists,
84            **self._kwargs
85        )

Activate this virtual environment. If a meerschaum.plugins.Plugin was provided, its dependent virtual environments will also be activated.
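Activation retries once on failure: if the first `_activate` call raises `OSError` and `init_if_not_exists` is set, the environment is rebuilt with `init_venv(..., force=True)` before a second attempt (a failed rebuild re-raises the original error). A generic sketch of that retry shape, with hypothetical `activate`/`init` callables standing in for the real helpers:

```python
# Generic "try, rebuild on OSError, retry once" pattern used by Venv.activate().
def activate_with_retry(activate, init, init_if_not_exists=True):
    try:
        return activate()
    except OSError:
        # When allowed, rebuild the venv; a failed rebuild re-raises.
        if init_if_not_exists and not init():
            raise
    # One retry after the (possible) rebuild.
    return activate()

calls = []

def flaky_activate():
    calls.append('activate')
    if len(calls) == 1:           # first attempt fails, second succeeds
        raise OSError("missing venv")
    return True

def rebuild():
    calls.append('init')
    return True

assert activate_with_retry(flaky_activate, rebuild)
assert calls == ['activate', 'init', 'activate']
```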

def deactivate(self, debug: bool = False) -> bool:
88    def deactivate(self, debug: bool = False) -> bool:
89        """
90        Deactivate this virtual environment.
91        If a `meerschaum.plugins.Plugin` was provided, its dependent virtual environments
92        will also be deactivated.
93        """
94        return self._deactivate(debug=(debug or self._debug), **self._kwargs)

Deactivate this virtual environment. If a meerschaum.plugins.Plugin was provided, its dependent virtual environments will also be deactivated.

target_path: pathlib.Path
 97    @property
 98    def target_path(self) -> pathlib.Path:
 99        """
100        Return the target site-packages path for this virtual environment.
101        A `meerschaum.utils.venv.Venv` may have one virtual environment per minor Python version
102        (e.g. Python 3.10 and Python 3.7).
103        """
104        from meerschaum.utils.venv import venv_target_path
105        return venv_target_path(venv=self._venv, allow_nonexistent=True, debug=self._debug)

Return the target site-packages path for this virtual environment. A meerschaum.utils.venv.Venv may have one virtual environment per minor Python version (e.g. Python 3.10 and Python 3.7).

root_path: pathlib.Path
108    @property
109    def root_path(self) -> pathlib.Path:
110        """
111        Return the top-level path for this virtual environment.
112        """
113        from meerschaum.config._paths import VIRTENV_RESOURCES_PATH
114        if self._venv is None:
115            return self.target_path.parent
116        return VIRTENV_RESOURCES_PATH / self._venv

Return the top-level path for this virtual environment.
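Together, `root_path` and `target_path` describe the on-disk layout: the venv root holds one site-packages directory per minor Python version, so the target path always sits beneath the root. A `pathlib` sketch of that relationship (illustrative directory names, not Meerschaum's actual paths, which come from `meerschaum.config._paths` and `venv_target_path()`):

```python
import pathlib
import sys

# Illustrative layout: <resources>/<venv>/lib/python<maj.min>/site-packages.
def toy_root_path(resources: pathlib.Path, venv: str) -> pathlib.Path:
    return resources / venv

def toy_target_path(resources: pathlib.Path, venv: str) -> pathlib.Path:
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return toy_root_path(resources, venv) / 'lib' / f'python{version}' / 'site-packages'

resources = pathlib.Path('/tmp/venvs')
target = toy_target_path(resources, 'mrsm')
assert toy_root_path(resources, 'mrsm') == pathlib.Path('/tmp/venvs/mrsm')
assert target.name == 'site-packages'
# The site-packages path always sits beneath the venv's root path.
assert toy_root_path(resources, 'mrsm') in target.parents
```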

class Job:
  70class Job:
  71    """
  72    Manage a `meerschaum.utils.daemon.Daemon`, locally or remotely via the API.
  73    """
  74
  75    def __init__(
  76        self,
  77        name: str,
  78        sysargs: Union[List[str], str, None] = None,
  79        env: Optional[Dict[str, str]] = None,
  80        executor_keys: Optional[str] = None,
  81        delete_after_completion: bool = False,
  82        refresh_seconds: Union[int, float, None] = None,
  83        _properties: Optional[Dict[str, Any]] = None,
  84        _rotating_log=None,
  85        _stdin_file=None,
  86        _status_hook: Optional[Callable[[], str]] = None,
  87        _result_hook: Optional[Callable[[], SuccessTuple]] = None,
  88        _externally_managed: bool = False,
  89    ):
  90        """
  91        Create a new job to manage a `meerschaum.utils.daemon.Daemon`.
  92
  93        Parameters
  94        ----------
  95        name: str
  96            The name of the job to be created.
  97            This will also be used as the Daemon ID.
  98
  99        sysargs: Union[List[str], str, None], default None
 100            The sysargs of the command to be executed, e.g. 'start api'.
 101
 102        env: Optional[Dict[str, str]], default None
 103            If provided, set these environment variables in the job's process.
 104
 105        executor_keys: Optional[str], default None
 106            If provided, execute the job remotely on an API instance, e.g. 'api:main'.
 107
 108        delete_after_completion: bool, default False
 109            If `True`, delete this job when it has finished executing.
 110
 111        refresh_seconds: Union[int, float, None], default None
 112            The number of seconds to sleep between refreshes.
 113            Defaults to the configured value `system.cli.refresh_seconds`.
 114
 115        _properties: Optional[Dict[str, Any]], default None
 116            If provided, use this to patch the daemon's properties.
 117        """
 118        from meerschaum.utils.daemon import Daemon
 119        for char in BANNED_CHARS:
 120            if char in name:
 121                raise ValueError(f"Invalid name: ({char}) is not allowed.")
 122
 123        if isinstance(sysargs, str):
 124            sysargs = shlex.split(sysargs)
 125
 126        and_key = STATIC_CONFIG['system']['arguments']['and_key']
 127        escaped_and_key = STATIC_CONFIG['system']['arguments']['escaped_and_key']
 128        if sysargs:
 129            sysargs = [
 130                (arg if arg != escaped_and_key else and_key)
 131                for arg in sysargs
 132            ]
 133
 134        ### NOTE: 'local' and 'systemd' executors are being coalesced.
 135        if executor_keys is None:
 136            from meerschaum.jobs import get_executor_keys_from_context
 137            executor_keys = get_executor_keys_from_context()
 138
 139        self.executor_keys = executor_keys
 140        self.name = name
 141        self.refresh_seconds = (
 142            refresh_seconds
 143            if refresh_seconds is not None
 144            else mrsm.get_config('system', 'cli', 'refresh_seconds')
 145        )
 146        try:
 147            self._daemon = (
 148                Daemon(daemon_id=name)
 149                if executor_keys == 'local'
 150                else None
 151            )
 152        except Exception:
 153            self._daemon = None
 154
 155        ### Handle any injected dependencies.
 156        if _rotating_log is not None:
 157            self._rotating_log = _rotating_log
 158            if self._daemon is not None:
 159                self._daemon._rotating_log = _rotating_log
 160
 161        if _stdin_file is not None:
 162            self._stdin_file = _stdin_file
 163            if self._daemon is not None:
 164                self._daemon._stdin_file = _stdin_file
 165                self._daemon._blocking_stdin_file_path = _stdin_file.blocking_file_path
 166
 167        if _status_hook is not None:
 168            self._status_hook = _status_hook
 169
 170        if _result_hook is not None:
 171            self._result_hook = _result_hook
 172
 173        self._externally_managed = _externally_managed
 174        self._properties_patch = _properties or {}
 175        if _externally_managed:
 176            self._properties_patch.update({'externally_managed': _externally_managed})
 177
 178        if env:
 179            self._properties_patch.update({'env': env})
 180
 181        if delete_after_completion:
 182            self._properties_patch.update({'delete_after_completion': delete_after_completion})
 183
 184        daemon_sysargs = (
 185            self._daemon.properties.get('target', {}).get('args', [None])[0]
 186            if self._daemon is not None
 187            else None
 188        )
 189
 190        if daemon_sysargs and sysargs and daemon_sysargs != sysargs:
 191            warn("Given sysargs differ from existing sysargs.")
 192
 193        self._sysargs = [
 194            arg
 195            for arg in (daemon_sysargs or sysargs or [])
 196            if arg not in ('-d', '--daemon')
 197        ]
 198        for restart_flag in RESTART_FLAGS:
 199            if restart_flag in self._sysargs:
 200                self._properties_patch.update({'restart': True})
 201                break
 202
 203    @staticmethod
 204    def from_pid(pid: int, executor_keys: Optional[str] = None) -> Job:
 205        """
 206        Build a `Job` from the PID of a running Meerschaum process.
 207
 208        Parameters
 209        ----------
 210        pid: int
 211            The PID of the process.
 212
 213        executor_keys: Optional[str], default None
 214            The executor keys to assign to the job.
 215        """
 216        from meerschaum.config.paths import DAEMON_RESOURCES_PATH
 217
 218        psutil = mrsm.attempt_import('psutil')
 219        try:
 220            process = psutil.Process(pid)
 221        except psutil.NoSuchProcess as e:
 222            warn(f"Process with PID {pid} does not exist.", stack=False)
 223            raise e
 224
 225        command_args = process.cmdline()
 226        is_daemon = command_args[1] == '-c'
 227
 228        if is_daemon:
 229            daemon_id = command_args[-1].split('daemon_id=')[-1].split(')')[0].replace("'", '')
 230            root_dir = process.environ().get(STATIC_CONFIG['environment']['root'], None)
 231            if root_dir is None:
 232                from meerschaum.config.paths import ROOT_DIR_PATH
 233                root_dir = ROOT_DIR_PATH
 234            else:
 235                root_dir = pathlib.Path(root_dir)
 236            jobs_dir = root_dir / DAEMON_RESOURCES_PATH.name
 237            daemon_dir = jobs_dir / daemon_id
 238            pid_file = daemon_dir / 'process.pid'
 239
 240            if pid_file.exists():
 241                with open(pid_file, 'r', encoding='utf-8') as f:
 242                    daemon_pid = int(f.read())
 243
 244                if pid != daemon_pid:
 245                    raise EnvironmentError(f"Differing PIDs: {pid=}, {daemon_pid=}")
 246            else:
 247                raise EnvironmentError(f"Is job '{daemon_id}' running?")
 248
 249            return Job(daemon_id, executor_keys=executor_keys)
 250
 251        from meerschaum._internal.arguments._parse_arguments import parse_arguments
 252        from meerschaum.utils.daemon import get_new_daemon_name
 253
 254        mrsm_ix = 0
 255        for i, arg in enumerate(command_args):
 256            if 'mrsm' in arg or 'meerschaum' in arg.lower():
 257                mrsm_ix = i
 258                break
 259
 260        sysargs = command_args[mrsm_ix+1:]
 261        kwargs = parse_arguments(sysargs)
 262        name = kwargs.get('name', get_new_daemon_name())
 263        return Job(name, sysargs, executor_keys=executor_keys)
 264
 265    def start(self, debug: bool = False) -> SuccessTuple:
 266        """
 267        Start the job's daemon.
 268        """
 269        if self.executor is not None:
 270            if not self.exists(debug=debug):
 271                return self.executor.create_job(
 272                    self.name,
 273                    self.sysargs,
 274                    properties=self.daemon.properties,
 275                    debug=debug,
 276                )
 277            return self.executor.start_job(self.name, debug=debug)
 278
 279        if self.is_running():
 280            return True, f"{self} is already running."
 281
 282        success, msg = self.daemon.run(
 283            keep_daemon_output=(not self.delete_after_completion),
 284            allow_dirty_run=True,
 285        )
 286        if not success:
 287            return success, msg
 288
 289        return success, f"Started {self}."
 290
 291    def stop(
 292        self,
 293        timeout_seconds: Union[int, float, None] = None,
 294        debug: bool = False,
 295    ) -> SuccessTuple:
 296        """
 297        Stop the job's daemon.
 298        """
 299        if self.executor is not None:
 300            return self.executor.stop_job(self.name, debug=debug)
 301
 302        if self.daemon.status == 'stopped':
 303            if not self.restart:
 304                return True, f"{self} is not running."
 305            elif self.stop_time is not None:
 306                return True, f"{self} will not restart until manually started."
 307
 308        quit_success, quit_msg = self.daemon.quit(timeout=timeout_seconds)
 309        if quit_success:
 310            return quit_success, f"Stopped {self}."
 311
 312        warn(
 313            f"Failed to gracefully quit {self}.",
 314            stack=False,
 315        )
 316        kill_success, kill_msg = self.daemon.kill(timeout=timeout_seconds)
 317        if not kill_success:
 318            return kill_success, kill_msg
 319
 320        return kill_success, f"Killed {self}."
 321
 322    def pause(
 323        self,
 324        timeout_seconds: Union[int, float, None] = None,
 325        debug: bool = False,
 326    ) -> SuccessTuple:
 327        """
 328        Pause the job's daemon.
 329        """
 330        if self.executor is not None:
 331            return self.executor.pause_job(self.name, debug=debug)
 332
 333        pause_success, pause_msg = self.daemon.pause(timeout=timeout_seconds)
 334        if not pause_success:
 335            return pause_success, pause_msg
 336
 337        return pause_success, f"Paused {self}."
 338
 339    def delete(self, debug: bool = False) -> SuccessTuple:
 340        """
 341        Delete the job and its daemon.
 342        """
 343        if self.executor is not None:
 344            return self.executor.delete_job(self.name, debug=debug)
 345
 346        if self.is_running():
 347            stop_success, stop_msg = self.stop()
 348            if not stop_success:
 349                return stop_success, stop_msg
 350
 351        cleanup_success, cleanup_msg = self.daemon.cleanup()
 352        if not cleanup_success:
 353            return cleanup_success, cleanup_msg
 354
 355        _ = self.daemon._properties.pop('result', None)
 356        return cleanup_success, f"Deleted {self}."
 357
 358    def is_running(self) -> bool:
 359        """
 360        Determine whether the job's daemon is running.
 361        """
 362        return self.status == 'running'
 363
 364    def exists(self, debug: bool = False) -> bool:
 365        """
 366        Determine whether the job exists.
 367        """
 368        if self.executor is not None:
 369            return self.executor.get_job_exists(self.name, debug=debug)
 370
 371        return self.daemon.path.exists()
 372
 373    def get_logs(self) -> Union[str, None]:
 374        """
 375        Return the output text of the job's daemon.
 376        """
 377        if self.executor is not None:
 378            return self.executor.get_logs(self.name)
 379
 380        return self.daemon.log_text
 381
 382    def monitor_logs(
 383        self,
 384        callback_function: Callable[[str], None] = _default_stdout_callback,
 385        input_callback_function: Optional[Callable[[], str]] = None,
 386        stop_callback_function: Optional[Callable[[SuccessTuple], None]] = None,
 387        stop_event: Optional[asyncio.Event] = None,
 388        stop_on_exit: bool = False,
 389        strip_timestamps: bool = False,
 390        accept_input: bool = True,
 391        debug: bool = False,
 392        _logs_path: Optional[pathlib.Path] = None,
 393        _log=None,
 394        _stdin_file=None,
 395        _wait_if_stopped: bool = True,
 396    ):
 397        """
 398        Monitor the job's log files and execute a callback on new lines.
 399
 400        Parameters
 401        ----------
 402        callback_function: Callable[[str], None], default partial(print, end='')
 403            The callback to execute as new data comes in.
 404            Defaults to printing the output directly to `stdout`.
 405
 406        input_callback_function: Optional[Callable[[], str]], default None
 407            If provided, execute this callback when the daemon is blocking on stdin.
 408            Defaults to `sys.stdin.readline()`.
 409
 410        stop_callback_function: Optional[Callable[[SuccessTuple], None]], default None
 411            If provided, execute this callback when the daemon stops.
 412            The job's SuccessTuple will be passed to the callback.
 413
 414        stop_event: Optional[asyncio.Event], default None
 415            If provided, stop monitoring when this event is set.
 416            You may instead raise `meerschaum.jobs.StopMonitoringLogs`
 417            from within `callback_function` to stop monitoring.
 418
 419        stop_on_exit: bool, default False
 420            If `True`, stop monitoring when the job stops.
 421
 422        strip_timestamps: bool, default False
 423            If `True`, remove leading timestamps from lines.
 424
 425        accept_input: bool, default True
 426            If `True`, accept input when the daemon blocks on stdin.
 427        """
 428        if self.executor is not None:
 429            self.executor.monitor_logs(
 430                self.name,
 431                callback_function,
 432                input_callback_function=input_callback_function,
 433                stop_callback_function=stop_callback_function,
 434                stop_on_exit=stop_on_exit,
 435                accept_input=accept_input,
 436                strip_timestamps=strip_timestamps,
 437                debug=debug,
 438            )
 439            return
 440
 441        monitor_logs_coroutine = self.monitor_logs_async(
 442            callback_function=callback_function,
 443            input_callback_function=input_callback_function,
 444            stop_callback_function=stop_callback_function,
 445            stop_event=stop_event,
 446            stop_on_exit=stop_on_exit,
 447            strip_timestamps=strip_timestamps,
 448            accept_input=accept_input,
 449            debug=debug,
 450            _logs_path=_logs_path,
 451            _log=_log,
 452            _stdin_file=_stdin_file,
 453            _wait_if_stopped=_wait_if_stopped,
 454        )
 455        return asyncio.run(monitor_logs_coroutine)
 456
 457    async def monitor_logs_async(
 458        self,
 459        callback_function: Callable[[str], None] = _default_stdout_callback,
 460        input_callback_function: Optional[Callable[[], str]] = None,
 461        stop_callback_function: Optional[Callable[[SuccessTuple], None]] = None,
 462        stop_event: Optional[asyncio.Event] = None,
 463        stop_on_exit: bool = False,
 464        strip_timestamps: bool = False,
 465        accept_input: bool = True,
 466        debug: bool = False,
 467        _logs_path: Optional[pathlib.Path] = None,
 468        _log=None,
 469        _stdin_file=None,
 470        _wait_if_stopped: bool = True,
 471    ):
 472        """
 473        Monitor the job's log files and await a callback on new lines.
 474
 475        Parameters
 476        ----------
 477        callback_function: Callable[[str], None], default _default_stdout_callback
 478            The callback to execute as new data comes in.
 479            Defaults to printing the output directly to `stdout`.
 480
 481        input_callback_function: Optional[Callable[[], str]], default None
 482            If provided, execute this callback when the daemon is blocking on stdin.
 483            Defaults to `sys.stdin.readline()`.
 484
 485        stop_callback_function: Optional[Callable[[SuccessTuple], None]], default None
 486            If provided, execute this callback when the daemon stops.
 487            The job's SuccessTuple will be passed to the callback.
 488
 489        stop_event: Optional[asyncio.Event], default None
 490            If provided, stop monitoring when this event is set.
 491            You may instead raise `meerschaum.jobs.StopMonitoringLogs`
 492            from within `callback_function` to stop monitoring.
 493
 494        stop_on_exit: bool, default False
 495            If `True`, stop monitoring when the job stops.
 496
 497        strip_timestamps: bool, default False
 498            If `True`, remove leading timestamps from lines.
 499
 500        accept_input: bool, default True
 501            If `True`, accept input when the daemon blocks on stdin.
 502        """
 503        from meerschaum.utils.prompt import prompt
 504
 505        def default_input_callback_function():
 506            prompt_kwargs = self.get_prompt_kwargs(debug=debug)
 507            if prompt_kwargs:
 508                answer = prompt(**prompt_kwargs)
 509                return answer + '\n'
 510            return sys.stdin.readline()
 511
 512        if input_callback_function is None:
 513            input_callback_function = default_input_callback_function
 514
 515        if self.executor is not None:
 516            await self.executor.monitor_logs_async(
 517                self.name,
 518                callback_function,
 519                input_callback_function=input_callback_function,
 520                stop_callback_function=stop_callback_function,
 521                stop_on_exit=stop_on_exit,
 522                strip_timestamps=strip_timestamps,
 523                accept_input=accept_input,
 524                debug=debug,
 525            )
 526            return
 527
 528        from meerschaum.utils.formatting._jobs import strip_timestamp_from_line
 529
 530        events = {
 531            'user': stop_event,
 532            'stopped': asyncio.Event(),
 533            'stop_token': asyncio.Event(),
 534            'stop_exception': asyncio.Event(),
 535            'stopped_timeout': asyncio.Event(),
 536        }
 537        combined_event = asyncio.Event()
 538        emitted_text = False
 539        stdin_file = _stdin_file if _stdin_file is not None else self.daemon.stdin_file
 540
 541        async def check_job_status():
 542            if not stop_on_exit:
 543                return
 544
 545            nonlocal emitted_text
 546
 547            sleep_time = 0.1
 548            while sleep_time < 0.2:
 549                if self.status == 'stopped':
 550                    if not emitted_text and _wait_if_stopped:
 551                        await asyncio.sleep(sleep_time)
 552                        sleep_time = round(sleep_time * 1.1, 3)
 553                        continue
 554
 555                    if stop_callback_function is not None:
 556                        try:
 557                            if asyncio.iscoroutinefunction(stop_callback_function):
 558                                await stop_callback_function(self.result)
 559                            else:
 560                                stop_callback_function(self.result)
 561                        except asyncio.exceptions.CancelledError:
 562                            break
 563                        except Exception:
 564                            warn(traceback.format_exc())
 565
 566                    if stop_on_exit:
 567                        events['stopped'].set()
 568
 569                    break
 570                await asyncio.sleep(0.1)
 571
 572            events['stopped_timeout'].set()
 573
 574        async def check_blocking_on_input():
 575            while True:
 576                if not emitted_text or not self.is_blocking_on_stdin():
 577                    try:
 578                        await asyncio.sleep(self.refresh_seconds)
 579                    except asyncio.exceptions.CancelledError:
 580                        break
 581                    continue
 582
 583                if not self.is_running():
 584                    break
 585
 586                await emit_latest_lines()
 587
 588                try:
 589                    print('', end='', flush=True)
 590                    if asyncio.iscoroutinefunction(input_callback_function):
 591                        data = await input_callback_function()
 592                    else:
 593                        loop = asyncio.get_running_loop()
 594                        data = await loop.run_in_executor(None, input_callback_function)
 595                except KeyboardInterrupt:
 596                    break
 597                #  if not data.endswith('\n'):
 598                    #  data += '\n'
 599
 600                stdin_file.write(data)
 601                await asyncio.sleep(self.refresh_seconds)
 602
 603        async def combine_events():
 604            event_tasks = [
 605                asyncio.create_task(event.wait())
 606                for event in events.values()
 607                if event is not None
 608            ]
 609            if not event_tasks:
 610                return
 611
 612            try:
 613                done, pending = await asyncio.wait(
 614                    event_tasks,
 615                    return_when=asyncio.FIRST_COMPLETED,
 616                )
 617                for task in pending:
 618                    task.cancel()
 619            except asyncio.exceptions.CancelledError:
 620                pass
 621            finally:
 622                combined_event.set()
 623
 624        check_job_status_task = asyncio.create_task(check_job_status())
 625        check_blocking_on_input_task = asyncio.create_task(check_blocking_on_input())
 626        combine_events_task = asyncio.create_task(combine_events())
 627
 628        log = _log if _log is not None else self.daemon.rotating_log
 629        lines_to_show = (
 630            self.daemon.properties.get(
 631                'logs', {}
 632            ).get(
 633                'lines_to_show', get_config('jobs', 'logs', 'lines_to_show')
 634            )
 635        )
 636
 637        async def emit_latest_lines():
 638            nonlocal emitted_text
 639            nonlocal stop_event
 640            lines = log.readlines()
 641            for line in lines[(-1 * lines_to_show):]:
 642                if stop_event is not None and stop_event.is_set():
 643                    return
 644
 645                line_stripped_extra = strip_timestamp_from_line(line.strip())
 646                line_stripped = strip_timestamp_from_line(line)
 647
 648                if line_stripped_extra == STOP_TOKEN:
 649                    events['stop_token'].set()
 650                    return
 651
 652                if line_stripped_extra == CLEAR_TOKEN:
 653                    clear_screen(debug=debug)
 654                    continue
 655
 656                if line_stripped_extra == FLUSH_TOKEN.strip():
 657                    line_stripped = ''
 658                    line = ''
 659
 660                if strip_timestamps:
 661                    line = line_stripped
 662
 663                try:
 664                    if asyncio.iscoroutinefunction(callback_function):
 665                        await callback_function(line)
 666                    else:
 667                        callback_function(line)
 668                    emitted_text = True
 669                except StopMonitoringLogs:
 670                    events['stop_exception'].set()
 671                    return
 672                except Exception:
 673                    warn(f"Error in logs callback:\n{traceback.format_exc()}")
 674
 675        await emit_latest_lines()
 676
 677        tasks = (
 678            [check_job_status_task]
 679            + ([check_blocking_on_input_task] if accept_input else [])
 680            + [combine_events_task]
 681        )
 682        try:
 683            _ = asyncio.gather(*tasks, return_exceptions=True)
 684        except asyncio.exceptions.CancelledError:
 685            raise
 686        except Exception:
 687            warn(f"Failed to run async checks:\n{traceback.format_exc()}")
 688
 689        watchfiles = mrsm.attempt_import('watchfiles', lazy=False)
 690        dir_path_to_monitor = (
 691            _logs_path
 692            or (log.file_path.parent if log else None)
 693            or LOGS_RESOURCES_PATH
 694        )
 695        async for changes in watchfiles.awatch(
 696            dir_path_to_monitor,
 697            stop_event=combined_event,
 698        ):
 699            for change in changes:
 700                file_path_str = change[1]
 701                file_path = pathlib.Path(file_path_str)
 702                latest_subfile_path = log.get_latest_subfile_path()
 703                if latest_subfile_path != file_path:
 704                    continue
 705
 706                await emit_latest_lines()
 707
 708        await emit_latest_lines()
 709
 710    def is_blocking_on_stdin(self, debug: bool = False) -> bool:
 711        """
 712        Return whether a job's daemon is blocking on stdin.
 713        """
 714        if self.executor is not None:
 715            return self.executor.get_job_is_blocking_on_stdin(self.name, debug=debug)
 716
 717        return self.is_running() and self.daemon.blocking_stdin_file_path.exists()
 718
 719    def get_prompt_kwargs(self, debug: bool = False) -> Dict[str, Any]:
 720        """
 721        Return the kwargs to the blocking `prompt()`, if available.
 722        """
 723        if self.executor is not None:
 724            return self.executor.get_job_prompt_kwargs(self.name, debug=debug)
 725
 726        if not self.daemon.prompt_kwargs_file_path.exists():
 727            return {}
 728
 729        try:
 730            with open(self.daemon.prompt_kwargs_file_path, 'r', encoding='utf-8') as f:
 731                prompt_kwargs = json.load(f)
 732
 733            return prompt_kwargs
 734
 735        except Exception:
 736            import traceback
 737            traceback.print_exc()
 738            return {}
 739
 740    def write_stdin(self, data):
 741        """
 742        Write to a job's daemon's `stdin`.
 743        """
 744        self.daemon.stdin_file.write(data)
 745
 746    @property
 747    def executor(self) -> Union[Executor, None]:
 748        """
 749        If the job is remote, return the connector to the remote API instance.
 750        """
 751        return (
 752            mrsm.get_connector(self.executor_keys)
 753            if self.executor_keys != 'local'
 754            else None
 755        )
 756
 757    @property
 758    def status(self) -> str:
 759        """
 760        Return the running status of the job's daemon.
 761        """
 762        if '_status_hook' in self.__dict__:
 763            return self._status_hook()
 764
 765        if self.executor is not None:
 766            return self.executor.get_job_status(self.name)
 767
 768        return self.daemon.status
 769
 770    @property
 771    def pid(self) -> Union[int, None]:
 772        """
 773        Return the PID of the job's daemon.
 774        """
 775        if self.executor is not None:
 776            return self.executor.get_job_metadata(self.name).get('daemon', {}).get('pid', None)
 777
 778        return self.daemon.pid
 779
 780    @property
 781    def restart(self) -> bool:
 782        """
 783        Return whether to restart a stopped job.
 784        """
 785        if self.executor is not None:
 786            return self.executor.get_job_metadata(self.name).get('restart', False)
 787
 788        return self.daemon.properties.get('restart', False)
 789
 790    @property
 791    def result(self) -> SuccessTuple:
 792        """
 793        Return the `SuccessTuple` when the job has terminated.
 794        """
 795        if self.is_running():
 796            return True, f"{self} is running."
 797
 798        if '_result_hook' in self.__dict__:
 799            return self._result_hook()
 800
 801        if self.executor is not None:
 802            return (
 803                self.executor.get_job_metadata(self.name)
 804                .get('result', (False, "No result available."))
 805            )
 806
 807        _result = self.daemon.properties.get('result', None)
 808        if _result is None:
 809            from meerschaum.utils.daemon.Daemon import _results
 810            return _results.get(self.daemon.daemon_id, (False, "No result available."))
 811
 812        return tuple(_result)
 813
 814    @property
 815    def sysargs(self) -> List[str]:
 816        """
 817        Return the sysargs to use for the Daemon.
 818        """
 819        if self._sysargs:
 820            return self._sysargs
 821
 822        if self.executor is not None:
 823            return self.executor.get_job_metadata(self.name).get('sysargs', [])
 824
 825        target_args = self.daemon.target_args
 826        if target_args is None:
 827            return []
 828        self._sysargs = target_args[0] if len(target_args) > 0 else []
 829        return self._sysargs
 830
 831    def get_daemon_properties(self) -> Dict[str, Any]:
 832        """
 833        Return the `properties` dictionary for the job's daemon.
 834        """
 835        remote_properties = (
 836            {}
 837            if self.executor is None
 838            else self.executor.get_job_properties(self.name)
 839        )
 840        return {
 841            **remote_properties,
 842            **self._properties_patch
 843        }
 844
 845    @property
 846    def daemon(self) -> 'Daemon':
 847        """
 848        Return the daemon which this job manages.
 849        """
 850        from meerschaum.utils.daemon import Daemon
 851        if self._daemon is not None and self.executor is None and self._sysargs:
 852            return self._daemon
 853
 854        self._daemon = Daemon(
 855            target=entry,
 856            target_args=[self._sysargs],
 857            target_kw={},
 858            daemon_id=self.name,
 859            label=shlex.join(self._sysargs),
 860            properties=self.get_daemon_properties(),
 861        )
 862        if '_rotating_log' in self.__dict__:
 863            self._daemon._rotating_log = self._rotating_log
 864
 865        if '_stdin_file' in self.__dict__:
 866            self._daemon._stdin_file = self._stdin_file
 867            self._daemon._blocking_stdin_file_path = self._stdin_file.blocking_file_path
 868
 869        return self._daemon
 870
 871    @property
 872    def began(self) -> Union[datetime, None]:
 873        """
 874        The datetime when the job began running.
 875        """
 876        if self.executor is not None:
 877            began_str = self.executor.get_job_began(self.name)
 878            if began_str is None:
 879                return None
 880            return (
 881                datetime.fromisoformat(began_str)
 882                .astimezone(timezone.utc)
 883                .replace(tzinfo=None)
 884            )
 885
 886        began_str = self.daemon.properties.get('process', {}).get('began', None)
 887        if began_str is None:
 888            return None
 889
 890        return datetime.fromisoformat(began_str)
 891
 892    @property
 893    def ended(self) -> Union[datetime, None]:
 894        """
 895        The datetime when the job stopped running.
 896        """
 897        if self.executor is not None:
 898            ended_str = self.executor.get_job_ended(self.name)
 899            if ended_str is None:
 900                return None
 901            return (
 902                datetime.fromisoformat(ended_str)
 903                .astimezone(timezone.utc)
 904                .replace(tzinfo=None)
 905            )
 906
 907        ended_str = self.daemon.properties.get('process', {}).get('ended', None)
 908        if ended_str is None:
 909            return None
 910
 911        return datetime.fromisoformat(ended_str)
 912
 913    @property
 914    def paused(self) -> Union[datetime, None]:
 915        """
 916        The datetime when the job was suspended while running.
 917        """
 918        if self.executor is not None:
 919            paused_str = self.executor.get_job_paused(self.name)
 920            if paused_str is None:
 921                return None
 922            return (
 923                datetime.fromisoformat(paused_str)
 924                .astimezone(timezone.utc)
 925                .replace(tzinfo=None)
 926            )
 927
 928        paused_str = self.daemon.properties.get('process', {}).get('paused', None)
 929        if paused_str is None:
 930            return None
 931
 932        return datetime.fromisoformat(paused_str)
 933
 934    @property
 935    def stop_time(self) -> Union[datetime, None]:
 936        """
 937        Return the timestamp when the job was manually stopped.
 938        """
 939        if self.executor is not None:
 940            return self.executor.get_job_stop_time(self.name)
 941
 942        if not self.daemon.stop_path.exists():
 943            return None
 944
 945        stop_data = self.daemon._read_stop_file()
 946        if not stop_data:
 947            return None
 948
 949        stop_time_str = stop_data.get('stop_time', None)
 950        if not stop_time_str:
 951            warn(f"Could not read stop time for {self}.")
 952            return None
 953
 954        return datetime.fromisoformat(stop_time_str)
 955
 956    @property
 957    def hidden(self) -> bool:
 958        """
 959        Return whether this job should be hidden from display.
 960        """
 961        return (
 962            self.name.startswith('_')
 963            or self.name.startswith('.')
 964            or self._is_externally_managed
 965        )
 966
 967    def check_restart(self) -> SuccessTuple:
 968        """
 969        If `restart` is `True` and the daemon is not running,
 970        restart the job.
 971        Do not restart if the job was manually stopped.
 972        """
 973        if self.is_running():
 974            return True, f"{self} is running."
 975
 976        if not self.restart:
 977            return True, f"{self} does not need to be restarted."
 978
 979        if self.stop_time is not None:
 980            return True, f"{self} was manually stopped."
 981
 982        return self.start()
 983
 984    @property
 985    def label(self) -> str:
 986        """
 987        Return the job's Daemon label (joined sysargs).
 988        """
 989        from meerschaum._internal.arguments import compress_pipeline_sysargs
 990        sysargs = compress_pipeline_sysargs(self.sysargs)
 991        return shlex.join(sysargs).replace(' + ', '\n+ ').replace(' : ', '\n: ').lstrip().rstrip()
 992
 993    @property
 994    def _externally_managed_file(self) -> pathlib.Path:
 995        """
 996        Return the path to the externally managed file.
 997        """
 998        return self.daemon.path / '.externally-managed'
 999
1000    def _set_externally_managed(self):
1001        """
1002        Set this job as externally managed.
1003        """
1004        self._externally_managed = True
1005        try:
1006            self._externally_managed_file.parent.mkdir(exist_ok=True, parents=True)
1007            self._externally_managed_file.touch()
1008        except Exception as e:
1009            warn(e)
1010
1011    @property
1012    def _is_externally_managed(self) -> bool:
1013        """
1014        Return whether this job is externally managed.
1015        """
1016        return self.executor_keys in (None, 'local') and (
1017            self._externally_managed or self._externally_managed_file.exists()
1018        )
1019
1020    @property
1021    def env(self) -> Dict[str, str]:
1022        """
1023        Return the environment variables to set for the job's process.
1024        """
1025        if '_env' in self.__dict__:
1026            return self.__dict__['_env']
1027
1028        _env = self.daemon.properties.get('env', {})
1029        default_env = {
1030            'PYTHONUNBUFFERED': '1',
1031            'LINES': str(get_config('jobs', 'terminal', 'lines')),
1032            'COLUMNS': str(get_config('jobs', 'terminal', 'columns')),
1033            STATIC_CONFIG['environment']['noninteractive']: 'true',
1034        }
1035        self._env = {**default_env, **_env}
1036        return self._env
1037
1038    @property
1039    def delete_after_completion(self) -> bool:
1040        """
1041        Return whether this job is configured to delete itself after completion.
1042        """
1043        if '_delete_after_completion' in self.__dict__:
1044            return self.__dict__.get('_delete_after_completion', False)
1045
1046        self._delete_after_completion = self.daemon.properties.get('delete_after_completion', False)
1047        return self._delete_after_completion
1048
1049    def __str__(self) -> str:
1050        sysargs = self.sysargs
1051        sysargs_str = shlex.join(sysargs) if sysargs else ''
1052        job_str = f'Job("{self.name}"'
1053        if sysargs_str:
1054            job_str += f', "{sysargs_str}"'
1055
1056        job_str += ')'
1057        return job_str
1058
1059    def __repr__(self) -> str:
1060        return str(self)
1061
1062    def __hash__(self) -> int:
1063        return hash(self.name)

Manage a meerschaum.utils.daemon.Daemon, locally or remotely via the API.

Job( name: str, sysargs: Union[List[str], str, NoneType] = None, env: Optional[Dict[str, str]] = None, executor_keys: Optional[str] = None, delete_after_completion: bool = False, refresh_seconds: Union[int, float, NoneType] = None, _properties: Optional[Dict[str, Any]] = None, _rotating_log=None, _stdin_file=None, _status_hook: Optional[Callable[[], str]] = None, _result_hook: Optional[Callable[[], Tuple[bool, str]]] = None, _externally_managed: bool = False)
 75    def __init__(
 76        self,
 77        name: str,
 78        sysargs: Union[List[str], str, None] = None,
 79        env: Optional[Dict[str, str]] = None,
 80        executor_keys: Optional[str] = None,
 81        delete_after_completion: bool = False,
 82        refresh_seconds: Union[int, float, None] = None,
 83        _properties: Optional[Dict[str, Any]] = None,
 84        _rotating_log=None,
 85        _stdin_file=None,
 86        _status_hook: Optional[Callable[[], str]] = None,
 87        _result_hook: Optional[Callable[[], SuccessTuple]] = None,
 88        _externally_managed: bool = False,
 89    ):
 90        """
 91        Create a new job to manage a `meerschaum.utils.daemon.Daemon`.
 92
 93        Parameters
 94        ----------
 95        name: str
 96            The name of the job to be created.
 97            This will also be used as the Daemon ID.
 98
 99        sysargs: Union[List[str], str, None], default None
100            The sysargs of the command to be executed, e.g. 'start api'.
101
102        env: Optional[Dict[str, str]], default None
103            If provided, set these environment variables in the job's process.
104
105        executor_keys: Optional[str], default None
106            If provided, execute the job remotely on an API instance, e.g. 'api:main'.
107
108        delete_after_completion: bool, default False
109            If `True`, delete this job when it has finished executing.
110
111        refresh_seconds: Union[int, float, None], default None
112            The number of seconds to sleep between refreshes.
113            Defaults to the configured value `system.cli.refresh_seconds`.
114
115        _properties: Optional[Dict[str, Any]], default None
116            If provided, use this to patch the daemon's properties.
117        """
118        from meerschaum.utils.daemon import Daemon
119        for char in BANNED_CHARS:
120            if char in name:
121                raise ValueError(f"Invalid name: ({char}) is not allowed.")
122
123        if isinstance(sysargs, str):
124            sysargs = shlex.split(sysargs)
125
126        and_key = STATIC_CONFIG['system']['arguments']['and_key']
127        escaped_and_key = STATIC_CONFIG['system']['arguments']['escaped_and_key']
128        if sysargs:
129            sysargs = [
130                (arg if arg != escaped_and_key else and_key)
131                for arg in sysargs
132            ]
133
134        ### NOTE: 'local' and 'systemd' executors are being coalesced.
135        if executor_keys is None:
136            from meerschaum.jobs import get_executor_keys_from_context
137            executor_keys = get_executor_keys_from_context()
138
139        self.executor_keys = executor_keys
140        self.name = name
141        self.refresh_seconds = (
142            refresh_seconds
143            if refresh_seconds is not None
144            else mrsm.get_config('system', 'cli', 'refresh_seconds')
145        )
146        try:
147            self._daemon = (
148                Daemon(daemon_id=name)
149                if executor_keys == 'local'
150                else None
151            )
152        except Exception:
153            self._daemon = None
154
155        ### Handle any injected dependencies.
156        if _rotating_log is not None:
157            self._rotating_log = _rotating_log
158            if self._daemon is not None:
159                self._daemon._rotating_log = _rotating_log
160
161        if _stdin_file is not None:
162            self._stdin_file = _stdin_file
163            if self._daemon is not None:
164                self._daemon._stdin_file = _stdin_file
165                self._daemon._blocking_stdin_file_path = _stdin_file.blocking_file_path
166
167        if _status_hook is not None:
168            self._status_hook = _status_hook
169
170        if _result_hook is not None:
171            self._result_hook = _result_hook
172
173        self._externally_managed = _externally_managed
174        self._properties_patch = _properties or {}
175        if _externally_managed:
176            self._properties_patch.update({'externally_managed': _externally_managed})
177
178        if env:
179            self._properties_patch.update({'env': env})
180
181        if delete_after_completion:
182            self._properties_patch.update({'delete_after_completion': delete_after_completion})
183
184        daemon_sysargs = (
185            self._daemon.properties.get('target', {}).get('args', [None])[0]
186            if self._daemon is not None
187            else None
188        )
189
190        if daemon_sysargs and sysargs and daemon_sysargs != sysargs:
191            warn("Given sysargs differ from existing sysargs.")
192
193        self._sysargs = [
194            arg
195            for arg in (daemon_sysargs or sysargs or [])
196            if arg not in ('-d', '--daemon')
197        ]
198        for restart_flag in RESTART_FLAGS:
199            if restart_flag in self._sysargs:
200                self._properties_patch.update({'restart': True})
201                break

Create a new job to manage a meerschaum.utils.daemon.Daemon.

Parameters
  • name (str): The name of the job to be created. This will also be used as the Daemon ID.
  • sysargs (Union[List[str], str, None], default None): The sysargs of the command to be executed, e.g. 'start api'.
  • env (Optional[Dict[str, str]], default None): If provided, set these environment variables in the job's process.
  • executor_keys (Optional[str], default None): If provided, execute the job remotely on an API instance, e.g. 'api:main'.
  • delete_after_completion (bool, default False): If True, delete this job when it has finished executing.
  • refresh_seconds (Union[int, float, None], default None): The number of seconds to sleep between refreshes. Defaults to the configured value system.cli.refresh_seconds.
  • _properties (Optional[Dict[str, Any]], default None): If provided, use this to patch the daemon's properties.
executor_keys
name
refresh_seconds
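As the constructor source above shows, `sysargs` may be passed as a single string or a list: a string is tokenized with `shlex.split()`, and the daemon flags `-d` / `--daemon` are filtered out before use. A standalone sketch of that normalization (the helper name is illustrative, not part of the API):

```python
import shlex
from typing import List, Union

def normalize_sysargs(sysargs: Union[List[str], str, None]) -> List[str]:
    """Tokenize a sysargs string and drop daemon flags, as `Job.__init__` does."""
    if isinstance(sysargs, str):
        sysargs = shlex.split(sysargs)
    return [arg for arg in (sysargs or []) if arg not in ('-d', '--daemon')]

print(normalize_sysargs("sync pipes --loop -d"))
# ['sync', 'pipes', '--loop']
```

Either form may therefore be passed to `Job(name, sysargs)`; the stored sysargs never contain the daemon flags themselves.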
@staticmethod
def from_pid( pid: int, executor_keys: Optional[str] = None) -> Job:
203    @staticmethod
204    def from_pid(pid: int, executor_keys: Optional[str] = None) -> Job:
205        """
206        Build a `Job` from the PID of a running Meerschaum process.
207
208        Parameters
209        ----------
210        pid: int
211            The PID of the process.
212
213        executor_keys: Optional[str], default None
214            The executor keys to assign to the job.
215        """
216        from meerschaum.config.paths import DAEMON_RESOURCES_PATH
217
218        psutil = mrsm.attempt_import('psutil')
219        try:
220            process = psutil.Process(pid)
221        except psutil.NoSuchProcess as e:
222            warn(f"Process with PID {pid} does not exist.", stack=False)
223            raise e
224
225        command_args = process.cmdline()
226        is_daemon = command_args[1] == '-c'
227
228        if is_daemon:
229            daemon_id = command_args[-1].split('daemon_id=')[-1].split(')')[0].replace("'", '')
230            root_dir = process.environ().get(STATIC_CONFIG['environment']['root'], None)
231            if root_dir is None:
232                from meerschaum.config.paths import ROOT_DIR_PATH
233                root_dir = ROOT_DIR_PATH
234            else:
235                root_dir = pathlib.Path(root_dir)
236            jobs_dir = root_dir / DAEMON_RESOURCES_PATH.name
237            daemon_dir = jobs_dir / daemon_id
238            pid_file = daemon_dir / 'process.pid'
239
240            if pid_file.exists():
241                with open(pid_file, 'r', encoding='utf-8') as f:
242                    daemon_pid = int(f.read())
243
244                if pid != daemon_pid:
245                    raise EnvironmentError(f"Differing PIDs: {pid=}, {daemon_pid=}")
246            else:
247                raise EnvironmentError(f"Is job '{daemon_id}' running?")
248
249            return Job(daemon_id, executor_keys=executor_keys)
250
251        from meerschaum._internal.arguments._parse_arguments import parse_arguments
252        from meerschaum.utils.daemon import get_new_daemon_name
253
254        mrsm_ix = 0
255        for i, arg in enumerate(command_args):
256            if 'mrsm' in arg or 'meerschaum' in arg.lower():
257                mrsm_ix = i
258                break
259
260        sysargs = command_args[mrsm_ix+1:]
261        kwargs = parse_arguments(sysargs)
262        name = kwargs.get('name', get_new_daemon_name())
263        return Job(name, sysargs, executor_keys=executor_keys)

Build a Job from the PID of a running Meerschaum process.

Parameters
  • pid (int): The PID of the process.
  • executor_keys (Optional[str], default None): The executor keys to assign to the job.
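When the target process was launched as a daemon (`python -c ...`), `from_pid` recovers the daemon ID by parsing the `daemon_id='...'` fragment out of the final command-line argument (line 229 in the listing above). The string surgery can be reproduced in isolation (the example command string is illustrative, not the exact text the Daemon class generates):

```python
# Illustrative final cmdline argument of a Meerschaum daemon process:
command_arg = "from meerschaum.utils.daemon import Daemon; Daemon(daemon_id='sync-loop')._run_exit()"

# Same parse as in `from_pid`: take the text after `daemon_id=`,
# cut at the closing parenthesis, and strip the quotes.
daemon_id = command_arg.split("daemon_id=")[-1].split(')')[0].replace("'", '')
print(daemon_id)
# sync-loop
```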
def start(self, debug: bool = False) -> Tuple[bool, str]:
265    def start(self, debug: bool = False) -> SuccessTuple:
266        """
267        Start the job's daemon.
268        """
269        if self.executor is not None:
270            if not self.exists(debug=debug):
271                return self.executor.create_job(
272                    self.name,
273                    self.sysargs,
274                    properties=self.daemon.properties,
275                    debug=debug,
276                )
277            return self.executor.start_job(self.name, debug=debug)
278
279        if self.is_running():
280            return True, f"{self} is already running."
281
282        success, msg = self.daemon.run(
283            keep_daemon_output=(not self.delete_after_completion),
284            allow_dirty_run=True,
285        )
286        if not success:
287            return success, msg
288
289        return success, f"Started {self}."

Start the job's daemon.
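`start()` returns a `SuccessTuple` (a `(bool, str)` pair) and follows a fixed branch order: remote jobs are created or started via the executor, a running local job short-circuits, and otherwise the daemon is run. A toy distillation of that branch order, with illustrative labels (not API names):

```python
def start_decision(is_remote: bool, exists: bool, is_running: bool) -> str:
    """Mirror the branch order in `Job.start()` above."""
    if is_remote:
        # Remote jobs are created on first start, then started thereafter.
        return 'create_remote_job' if not exists else 'start_remote_job'
    if is_running:
        return 'noop_already_running'
    return 'run_daemon'

print(start_decision(is_remote=False, exists=True, is_running=False))
# run_daemon
```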

def stop( self, timeout_seconds: Union[int, float, NoneType] = None, debug: bool = False) -> Tuple[bool, str]:
291    def stop(
292        self,
293        timeout_seconds: Union[int, float, None] = None,
294        debug: bool = False,
295    ) -> SuccessTuple:
296        """
297        Stop the job's daemon.
298        """
299        if self.executor is not None:
300            return self.executor.stop_job(self.name, debug=debug)
301
302        if self.daemon.status == 'stopped':
303            if not self.restart:
304                return True, f"{self} is not running."
305            elif self.stop_time is not None:
306                return True, f"{self} will not restart until manually started."
307
308        quit_success, quit_msg = self.daemon.quit(timeout=timeout_seconds)
309        if quit_success:
310            return quit_success, f"Stopped {self}."
311
312        warn(
313            f"Failed to gracefully quit {self}.",
314            stack=False,
315        )
316        kill_success, kill_msg = self.daemon.kill(timeout=timeout_seconds)
317        if not kill_success:
318            return kill_success, kill_msg
319
320        return kill_success, f"Killed {self}."

Stop the job's daemon.
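For local jobs, `stop()` escalates: it first attempts a graceful `quit()` and only falls back to `kill()` if that fails. A minimal sketch of the escalation (omitting the remote and already-stopped short-circuits shown in the listing above):

```python
from typing import Callable, Tuple

SuccessTuple = Tuple[bool, str]

def stop_with_fallback(
    quit_fn: Callable[[], SuccessTuple],
    kill_fn: Callable[[], SuccessTuple],
) -> SuccessTuple:
    """Attempt a graceful quit first, falling back to a kill, as `Job.stop()` does."""
    quit_success, _quit_msg = quit_fn()
    if quit_success:
        return True, "Stopped."
    kill_success, kill_msg = kill_fn()
    if not kill_success:
        return kill_success, kill_msg
    return True, "Killed."

# A daemon that ignores the graceful quit but dies when killed:
print(stop_with_fallback(lambda: (False, "timeout"), lambda: (True, "killed")))
# (True, 'Killed.')
```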

def pause( self, timeout_seconds: Union[int, float, NoneType] = None, debug: bool = False) -> Tuple[bool, str]:
322    def pause(
323        self,
324        timeout_seconds: Union[int, float, None] = None,
325        debug: bool = False,
326    ) -> SuccessTuple:
327        """
328        Pause the job's daemon.
329        """
330        if self.executor is not None:
331            return self.executor.pause_job(self.name, debug=debug)
332
333        pause_success, pause_msg = self.daemon.pause(timeout=timeout_seconds)
334        if not pause_success:
335            return pause_success, pause_msg
336
337        return pause_success, f"Paused {self}."

Pause the job's daemon.

def delete(self, debug: bool = False) -> Tuple[bool, str]:
339    def delete(self, debug: bool = False) -> SuccessTuple:
340        """
341        Delete the job and its daemon.
342        """
343        if self.executor is not None:
344            return self.executor.delete_job(self.name, debug=debug)
345
346        if self.is_running():
347            stop_success, stop_msg = self.stop()
348            if not stop_success:
349                return stop_success, stop_msg
350
351        cleanup_success, cleanup_msg = self.daemon.cleanup()
352        if not cleanup_success:
353            return cleanup_success, cleanup_msg
354
355        _ = self.daemon._properties.pop('result', None)
356        return cleanup_success, f"Deleted {self}."

Delete the job and its daemon.

def is_running(self) -> bool:
358    def is_running(self) -> bool:
359        """
360        Determine whether the job's daemon is running.
361        """
362        return self.status == 'running'

Determine whether the job's daemon is running.

def exists(self, debug: bool = False) -> bool:
364    def exists(self, debug: bool = False) -> bool:
365        """
366        Determine whether the job exists.
367        """
368        if self.executor is not None:
369            return self.executor.get_job_exists(self.name, debug=debug)
370
371        return self.daemon.path.exists()

Determine whether the job exists.

def get_logs(self) -> Optional[str]:
373    def get_logs(self) -> Union[str, None]:
374        """
375        Return the output text of the job's daemon.
376        """
377        if self.executor is not None:
378            return self.executor.get_logs(self.name)
379
380        return self.daemon.log_text

Return the output text of the job's daemon.
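`get_logs()` returns the accumulated output as a single string (or `None` when no output is available), so tailing the last few lines is plain string work. A small illustrative helper (not part of the API):

```python
from typing import Optional

def tail_log(log_text: Optional[str], num_lines: int = 5) -> str:
    """Return the last `num_lines` lines of a job's log text."""
    if not log_text:
        return ''
    return '\n'.join(log_text.splitlines()[-num_lines:])

sample = '\n'.join(f"line {i}" for i in range(10))
print(tail_log(sample, 3))
# line 7
# line 8
# line 9
```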

def monitor_logs( self, callback_function: Callable[[str], NoneType] = <function _default_stdout_callback>, input_callback_function: Optional[Callable[[], str]] = None, stop_callback_function: Optional[Callable[[Tuple[bool, str]], NoneType]] = None, stop_event: Optional[asyncio.locks.Event] = None, stop_on_exit: bool = False, strip_timestamps: bool = False, accept_input: bool = True, debug: bool = False, _logs_path: Optional[pathlib.Path] = None, _log=None, _stdin_file=None, _wait_if_stopped: bool = True):
382    def monitor_logs(
383        self,
384        callback_function: Callable[[str], None] = _default_stdout_callback,
385        input_callback_function: Optional[Callable[[], str]] = None,
386        stop_callback_function: Optional[Callable[[SuccessTuple], None]] = None,
387        stop_event: Optional[asyncio.Event] = None,
388        stop_on_exit: bool = False,
389        strip_timestamps: bool = False,
390        accept_input: bool = True,
391        debug: bool = False,
392        _logs_path: Optional[pathlib.Path] = None,
393        _log=None,
394        _stdin_file=None,
395        _wait_if_stopped: bool = True,
396    ):
397        """
398        Monitor the job's log files and execute a callback on new lines.
399
400        Parameters
401        ----------
402        callback_function: Callable[[str], None], default partial(print, end='')
403            The callback to execute as new data comes in.
404            Defaults to printing the output directly to `stdout`.
405
406        input_callback_function: Optional[Callable[[], str]], default None
407            If provided, execute this callback when the daemon is blocking on stdin.
408            Defaults to `sys.stdin.readline()`.
409
 410        stop_callback_function: Optional[Callable[[SuccessTuple], None]], default None
411            If provided, execute this callback when the daemon stops.
412            The job's SuccessTuple will be passed to the callback.
413
414        stop_event: Optional[asyncio.Event], default None
415            If provided, stop monitoring when this event is set.
416            You may instead raise `meerschaum.jobs.StopMonitoringLogs`
417            from within `callback_function` to stop monitoring.
418
419        stop_on_exit: bool, default False
420            If `True`, stop monitoring when the job stops.
421
422        strip_timestamps: bool, default False
423            If `True`, remove leading timestamps from lines.
424
425        accept_input: bool, default True
426            If `True`, accept input when the daemon blocks on stdin.
427        """
428        if self.executor is not None:
429            self.executor.monitor_logs(
430                self.name,
431                callback_function,
432                input_callback_function=input_callback_function,
433                stop_callback_function=stop_callback_function,
434                stop_on_exit=stop_on_exit,
435                accept_input=accept_input,
436                strip_timestamps=strip_timestamps,
437                debug=debug,
438            )
439            return
440
441        monitor_logs_coroutine = self.monitor_logs_async(
442            callback_function=callback_function,
443            input_callback_function=input_callback_function,
444            stop_callback_function=stop_callback_function,
445            stop_event=stop_event,
446            stop_on_exit=stop_on_exit,
447            strip_timestamps=strip_timestamps,
448            accept_input=accept_input,
449            debug=debug,
450            _logs_path=_logs_path,
451            _log=_log,
452            _stdin_file=_stdin_file,
453            _wait_if_stopped=_wait_if_stopped,
454        )
455        return asyncio.run(monitor_logs_coroutine)

Monitor the job's log files and execute a callback on new lines.

Parameters
  • callback_function (Callable[[str], None], default partial(print, end='')): The callback to execute as new data comes in. Defaults to printing the output directly to stdout.
  • input_callback_function (Optional[Callable[[], str]], default None): If provided, execute this callback when the daemon is blocking on stdin. Defaults to sys.stdin.readline().
  • stop_callback_function (Optional[Callable[[SuccessTuple], None]], default None): If provided, execute this callback when the daemon stops. The job's SuccessTuple will be passed to the callback.
  • stop_event (Optional[asyncio.Event], default None): If provided, stop monitoring when this event is set. You may instead raise meerschaum.jobs.StopMonitoringLogs from within callback_function to stop monitoring.
  • stop_on_exit (bool, default False): If True, stop monitoring when the job stops.
  • strip_timestamps (bool, default False): If True, remove leading timestamps from lines.
  • accept_input (bool, default True): If True, accept input when the daemon blocks on stdin.
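Per the notes above, a callback may raise `meerschaum.jobs.StopMonitoringLogs` to end monitoring without a stop event. The control flow can be sketched generically (using a stand-in exception class and a plain loop in place of the real async file watcher):

```python
class StopMonitoringLogs(Exception):
    """Stand-in for `meerschaum.jobs.StopMonitoringLogs`."""

def monitor(lines, callback) -> None:
    """Feed lines to `callback` until it raises StopMonitoringLogs."""
    for line in lines:
        try:
            callback(line)
        except StopMonitoringLogs:
            break

seen = []
def collect_until_done(line: str) -> None:
    if 'done' in line:
        raise StopMonitoringLogs
    seen.append(line)

monitor(['starting\n', 'syncing\n', 'done\n', 'never seen\n'], collect_until_done)
print(seen)
# ['starting\n', 'syncing\n']
```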
async def monitor_logs_async( self, callback_function: Callable[[str], NoneType] = <function _default_stdout_callback>, input_callback_function: Optional[Callable[[], str]] = None, stop_callback_function: Optional[Callable[[Tuple[bool, str]], NoneType]] = None, stop_event: Optional[asyncio.locks.Event] = None, stop_on_exit: bool = False, strip_timestamps: bool = False, accept_input: bool = True, debug: bool = False, _logs_path: Optional[pathlib.Path] = None, _log=None, _stdin_file=None, _wait_if_stopped: bool = True):
457    async def monitor_logs_async(
458        self,
459        callback_function: Callable[[str], None] = _default_stdout_callback,
460        input_callback_function: Optional[Callable[[], str]] = None,
461        stop_callback_function: Optional[Callable[[SuccessTuple], None]] = None,
462        stop_event: Optional[asyncio.Event] = None,
463        stop_on_exit: bool = False,
464        strip_timestamps: bool = False,
465        accept_input: bool = True,
466        debug: bool = False,
467        _logs_path: Optional[pathlib.Path] = None,
468        _log=None,
469        _stdin_file=None,
470        _wait_if_stopped: bool = True,
471    ):
472        """
473        Monitor the job's log files and await a callback on new lines.
474
475        Parameters
476        ----------
477        callback_function: Callable[[str], None], default _default_stdout_callback
478            The callback to execute as new data comes in.
479            Defaults to printing the output directly to `stdout`.
480
481        input_callback_function: Optional[Callable[[], str]], default None
482            If provided, execute this callback when the daemon is blocking on stdin.
483            Defaults to `sys.stdin.readline()`.
484
 485        stop_callback_function: Optional[Callable[[SuccessTuple], None]], default None
486            If provided, execute this callback when the daemon stops.
487            The job's SuccessTuple will be passed to the callback.
488
489        stop_event: Optional[asyncio.Event], default None
490            If provided, stop monitoring when this event is set.
491            You may instead raise `meerschaum.jobs.StopMonitoringLogs`
492            from within `callback_function` to stop monitoring.
493
494        stop_on_exit: bool, default False
495            If `True`, stop monitoring when the job stops.
496
497        strip_timestamps: bool, default False
498            If `True`, remove leading timestamps from lines.
499
500        accept_input: bool, default True
501            If `True`, accept input when the daemon blocks on stdin.
502        """
503        from meerschaum.utils.prompt import prompt
504
505        def default_input_callback_function():
506            prompt_kwargs = self.get_prompt_kwargs(debug=debug)
507            if prompt_kwargs:
508                answer = prompt(**prompt_kwargs)
509                return answer + '\n'
510            return sys.stdin.readline()
511
512        if input_callback_function is None:
513            input_callback_function = default_input_callback_function
514
515        if self.executor is not None:
516            await self.executor.monitor_logs_async(
517                self.name,
518                callback_function,
519                input_callback_function=input_callback_function,
520                stop_callback_function=stop_callback_function,
521                stop_on_exit=stop_on_exit,
522                strip_timestamps=strip_timestamps,
523                accept_input=accept_input,
524                debug=debug,
525            )
526            return
527
528        from meerschaum.utils.formatting._jobs import strip_timestamp_from_line
529
530        events = {
531            'user': stop_event,
532            'stopped': asyncio.Event(),
533            'stop_token': asyncio.Event(),
534            'stop_exception': asyncio.Event(),
535            'stopped_timeout': asyncio.Event(),
536        }
537        combined_event = asyncio.Event()
538        emitted_text = False
539        stdin_file = _stdin_file if _stdin_file is not None else self.daemon.stdin_file
540
541        async def check_job_status():
542            if not stop_on_exit:
543                return
544
545            nonlocal emitted_text
546
547            sleep_time = 0.1
548            while sleep_time < 0.2:
549                if self.status == 'stopped':
550                    if not emitted_text and _wait_if_stopped:
551                        await asyncio.sleep(sleep_time)
552                        sleep_time = round(sleep_time * 1.1, 3)
553                        continue
554
555                    if stop_callback_function is not None:
556                        try:
557                            if asyncio.iscoroutinefunction(stop_callback_function):
558                                await stop_callback_function(self.result)
559                            else:
560                                stop_callback_function(self.result)
561                        except asyncio.exceptions.CancelledError:
562                            break
563                        except Exception:
564                            warn(traceback.format_exc())
565
566                    if stop_on_exit:
567                        events['stopped'].set()
568
569                    break
570                await asyncio.sleep(0.1)
571
572            events['stopped_timeout'].set()
573
574        async def check_blocking_on_input():
575            while True:
576                if not emitted_text or not self.is_blocking_on_stdin():
577                    try:
578                        await asyncio.sleep(self.refresh_seconds)
579                    except asyncio.exceptions.CancelledError:
580                        break
581                    continue
582
583                if not self.is_running():
584                    break
585
586                await emit_latest_lines()
587
588                try:
589                    print('', end='', flush=True)
590                    if asyncio.iscoroutinefunction(input_callback_function):
591                        data = await input_callback_function()
592                    else:
593                        loop = asyncio.get_running_loop()
594                        data = await loop.run_in_executor(None, input_callback_function)
595                except KeyboardInterrupt:
596                    break
597                #  if not data.endswith('\n'):
598                    #  data += '\n'
599
600                stdin_file.write(data)
601                await asyncio.sleep(self.refresh_seconds)
602
603        async def combine_events():
604            event_tasks = [
605                asyncio.create_task(event.wait())
606                for event in events.values()
607                if event is not None
608            ]
609            if not event_tasks:
610                return
611
612            try:
613                done, pending = await asyncio.wait(
614                    event_tasks,
615                    return_when=asyncio.FIRST_COMPLETED,
616                )
617                for task in pending:
618                    task.cancel()
619            except asyncio.exceptions.CancelledError:
620                pass
621            finally:
622                combined_event.set()
623
624        check_job_status_task = asyncio.create_task(check_job_status())
625        check_blocking_on_input_task = asyncio.create_task(check_blocking_on_input())
626        combine_events_task = asyncio.create_task(combine_events())
627
628        log = _log if _log is not None else self.daemon.rotating_log
629        lines_to_show = (
630            self.daemon.properties.get(
631                'logs', {}
632            ).get(
633                'lines_to_show', get_config('jobs', 'logs', 'lines_to_show')
634            )
635        )
636
637        async def emit_latest_lines():
638            nonlocal emitted_text
639            nonlocal stop_event
640            lines = log.readlines()
641            for line in lines[(-1 * lines_to_show):]:
642                if stop_event is not None and stop_event.is_set():
643                    return
644
645                line_stripped_extra = strip_timestamp_from_line(line.strip())
646                line_stripped = strip_timestamp_from_line(line)
647
648                if line_stripped_extra == STOP_TOKEN:
649                    events['stop_token'].set()
650                    return
651
652                if line_stripped_extra == CLEAR_TOKEN:
653                    clear_screen(debug=debug)
654                    continue
655
656                if line_stripped_extra == FLUSH_TOKEN.strip():
657                    line_stripped = ''
658                    line = ''
659
660                if strip_timestamps:
661                    line = line_stripped
662
663                try:
664                    if asyncio.iscoroutinefunction(callback_function):
665                        await callback_function(line)
666                    else:
667                        callback_function(line)
668                    emitted_text = True
669                except StopMonitoringLogs:
670                    events['stop_exception'].set()
671                    return
672                except Exception:
673                    warn(f"Error in logs callback:\n{traceback.format_exc()}")
674
675        await emit_latest_lines()
676
677        tasks = (
678            [check_job_status_task]
679            + ([check_blocking_on_input_task] if accept_input else [])
680            + [combine_events_task]
681        )
682        try:
683            _ = asyncio.gather(*tasks, return_exceptions=True)
684        except asyncio.exceptions.CancelledError:
685            raise
686        except Exception:
687            warn(f"Failed to run async checks:\n{traceback.format_exc()}")
688
689        watchfiles = mrsm.attempt_import('watchfiles', lazy=False)
690        dir_path_to_monitor = (
691            _logs_path
692            or (log.file_path.parent if log else None)
693            or LOGS_RESOURCES_PATH
694        )
695        async for changes in watchfiles.awatch(
696            dir_path_to_monitor,
697            stop_event=combined_event,
698        ):
699            for change in changes:
700                file_path_str = change[1]
701                file_path = pathlib.Path(file_path_str)
702                latest_subfile_path = log.get_latest_subfile_path()
703                if latest_subfile_path != file_path:
704                    continue
705
706                await emit_latest_lines()
707
708        await emit_latest_lines()

Monitor the job's log files and await a callback on new lines.

Parameters
  • callback_function (Callable[[str], None], default _default_stdout_callback): The callback to execute as new data comes in. Defaults to printing the output directly to stdout.
  • input_callback_function (Optional[Callable[[], str]], default None): If provided, execute this callback when the daemon is blocking on stdin. Defaults to sys.stdin.readline().
  • stop_callback_function (Optional[Callable[[SuccessTuple], None]], default None): If provided, execute this callback when the daemon stops. The job's SuccessTuple will be passed to the callback.
  • stop_event (Optional[asyncio.Event], default None): If provided, stop monitoring when this event is set. You may instead raise meerschaum.jobs.StopMonitoringLogs from within callback_function to stop monitoring.
  • stop_on_exit (bool, default False): If True, stop monitoring when the job stops.
  • strip_timestamps (bool, default False): If True, remove leading timestamps from lines.
  • accept_input (bool, default True): If True, accept input when the daemon blocks on stdin.
def is_blocking_on_stdin(self, debug: bool = False) -> bool:
710    def is_blocking_on_stdin(self, debug: bool = False) -> bool:
711        """
712        Return whether a job's daemon is blocking on stdin.
713        """
714        if self.executor is not None:
715            return self.executor.get_job_is_blocking_on_stdin(self.name, debug=debug)
716
717        return self.is_running() and self.daemon.blocking_stdin_file_path.exists()

Return whether a job's daemon is blocking on stdin.

def get_prompt_kwargs(self, debug: bool = False) -> Dict[str, Any]:
719    def get_prompt_kwargs(self, debug: bool = False) -> Dict[str, Any]:
720        """
721        Return the kwargs to the blocking `prompt()`, if available.
722        """
723        if self.executor is not None:
724            return self.executor.get_job_prompt_kwargs(self.name, debug=debug)
725
726        if not self.daemon.prompt_kwargs_file_path.exists():
727            return {}
728
729        try:
730            with open(self.daemon.prompt_kwargs_file_path, 'r', encoding='utf-8') as f:
731                prompt_kwargs = json.load(f)
732
733            return prompt_kwargs
734        
735        except Exception:
736            import traceback
737            traceback.print_exc()
738            return {}

Return the kwargs to the blocking prompt(), if available.

def write_stdin(self, data):
740    def write_stdin(self, data):
741        """
742        Write to a job's daemon's `stdin`.
743        """
744        self.daemon.stdin_file.write(data)

Write to a job's daemon's stdin.
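Together, `is_blocking_on_stdin()`, `get_prompt_kwargs()`, and `write_stdin()` support a check-then-respond flow. A minimal sketch of that flow, using a hypothetical `FakeJob` stand-in rather than a real `meerschaum.jobs.Job`:

```python
import io

class FakeJob:
    """Hypothetical stand-in for a Job, for illustration only."""
    def __init__(self):
        self.stdin = io.StringIO()
        self._blocking = True

    def is_blocking_on_stdin(self):
        return self._blocking

    def get_prompt_kwargs(self):
        return {'question': 'Continue?'} if self._blocking else {}

    def write_stdin(self, data):
        self.stdin.write(data)
        self._blocking = False

def answer_if_blocking(job, answer):
    """If the job is waiting on stdin, send a newline-terminated answer."""
    if not job.is_blocking_on_stdin():
        return False
    kwargs = job.get_prompt_kwargs()  # e.g. the question text to feed prompt()
    job.write_stdin(answer + '\n')
    return True

job = FakeJob()
responded = answer_if_blocking(job, 'y')
# responded is True, and job.stdin now contains 'y\n'
```

Note that real jobs signal blocking via a file on disk (or via the executor for remote jobs), so the check is a snapshot, not a lock.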

executor: Optional[meerschaum.jobs.Executor]
746    @property
747    def executor(self) -> Union[Executor, None]:
748        """
749        If the job is remote, return the connector to the remote API instance.
750        """
751        return (
752            mrsm.get_connector(self.executor_keys)
753            if self.executor_keys != 'local'
754            else None
755        )

If the job is remote, return the connector to the remote API instance.

status: str
757    @property
758    def status(self) -> str:
759        """
760        Return the running status of the job's daemon.
761        """
762        if '_status_hook' in self.__dict__:
763            return self._status_hook()
764
765        if self.executor is not None:
766            return self.executor.get_job_status(self.name)
767
768        return self.daemon.status

Return the running status of the job's daemon.

pid: Optional[int]
770    @property
771    def pid(self) -> Union[int, None]:
772        """
773        Return the PID of the job's daemon.
774        """
775        if self.executor is not None:
776            return self.executor.get_job_metadata(self.name).get('daemon', {}).get('pid', None)
777
778        return self.daemon.pid

Return the PID of the job's daemon.

restart: bool
780    @property
781    def restart(self) -> bool:
782        """
783        Return whether to restart a stopped job.
784        """
785        if self.executor is not None:
786            return self.executor.get_job_metadata(self.name).get('restart', False)
787
788        return self.daemon.properties.get('restart', False)

Return whether to restart a stopped job.

result: Tuple[bool, str]
790    @property
791    def result(self) -> SuccessTuple:
792        """
793        Return the `SuccessTuple` when the job has terminated.
794        """
795        if self.is_running():
796            return True, f"{self} is running."
797
798        if '_result_hook' in self.__dict__:
799            return self._result_hook()
800
801        if self.executor is not None:
802            return (
803                self.executor.get_job_metadata(self.name)
804                .get('result', (False, "No result available."))
805            )
806
807        _result = self.daemon.properties.get('result', None)
808        if _result is None:
809            from meerschaum.utils.daemon.Daemon import _results
810            return _results.get(self.daemon.daemon_id, (False, "No result available."))
811
812        return tuple(_result)

Return the SuccessTuple when the job has terminated.
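A `SuccessTuple` is a plain `(bool, str)` pair, so handling a job's result is a simple unpack (the messages below are illustrative values, not real job output):

```python
def describe_result(result):
    """Unpack a SuccessTuple into a human-readable line."""
    success, msg = result
    status = 'succeeded' if success else 'failed'
    return f"Job {status}: {msg}"

print(describe_result((True, 'Synced 3 rows.')))
# Job succeeded: Synced 3 rows.
print(describe_result((False, 'No result available.')))
# Job failed: No result available.
```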

sysargs: List[str]
814    @property
815    def sysargs(self) -> List[str]:
816        """
817        Return the sysargs to use for the Daemon.
818        """
819        if self._sysargs:
820            return self._sysargs
821
822        if self.executor is not None:
823            return self.executor.get_job_metadata(self.name).get('sysargs', [])
824
825        target_args = self.daemon.target_args
826        if target_args is None:
827            return []
828        self._sysargs = target_args[0] if len(target_args) > 0 else []
829        return self._sysargs

Return the sysargs to use for the Daemon.

def get_daemon_properties(self) -> Dict[str, Any]:
831    def get_daemon_properties(self) -> Dict[str, Any]:
832        """
833        Return the `properties` dictionary for the job's daemon.
834        """
835        remote_properties = (
836            {}
837            if self.executor is None
838            else self.executor.get_job_properties(self.name)
839        )
840        return {
841            **remote_properties,
842            **self._properties_patch
843        }

Return the properties dictionary for the job's daemon.

daemon: "'Daemon'"
845    @property
846    def daemon(self) -> 'Daemon':
847        """
848        Return the daemon which this job manages.
849        """
850        from meerschaum.utils.daemon import Daemon
851        if self._daemon is not None and self.executor is None and self._sysargs:
852            return self._daemon
853
854        self._daemon = Daemon(
855            target=entry,
856            target_args=[self._sysargs],
857            target_kw={},
858            daemon_id=self.name,
859            label=shlex.join(self._sysargs),
860            properties=self.get_daemon_properties(),
861        )
862        if '_rotating_log' in self.__dict__:
863            self._daemon._rotating_log = self._rotating_log
864
865        if '_stdin_file' in self.__dict__:
866            self._daemon._stdin_file = self._stdin_file
867            self._daemon._blocking_stdin_file_path = self._stdin_file.blocking_file_path
868
869        return self._daemon

Return the daemon which this job manages.

began: Optional[datetime.datetime]
871    @property
872    def began(self) -> Union[datetime, None]:
873        """
874        The datetime when the job began running.
875        """
876        if self.executor is not None:
877            began_str = self.executor.get_job_began(self.name)
878            if began_str is None:
879                return None
880            return (
881                datetime.fromisoformat(began_str)
882                .astimezone(timezone.utc)
883                .replace(tzinfo=None)
884            )
885
886        began_str = self.daemon.properties.get('process', {}).get('began', None)
887        if began_str is None:
888            return None
889
890        return datetime.fromisoformat(began_str)

The datetime when the job began running.
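The remote branch above normalizes an ISO-8601 timestamp to naive UTC via `astimezone(timezone.utc).replace(tzinfo=None)`; the same transformation can be checked with the standard library alone:

```python
from datetime import datetime, timezone

def to_naive_utc(began_str):
    """Parse an ISO-8601 timestamp and reduce it to naive UTC."""
    return (
        datetime.fromisoformat(began_str)
        .astimezone(timezone.utc)
        .replace(tzinfo=None)
    )

# A timestamp with a -05:00 offset becomes its UTC wall-clock time.
print(to_naive_utc('2024-01-01T00:00:00-05:00'))
# 2024-01-01 05:00:00
```

One caveat: if the input string carries no offset, `astimezone()` assumes the local timezone, so aware timestamps are the safe input here.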

ended: Optional[datetime.datetime]
892    @property
893    def ended(self) -> Union[datetime, None]:
894        """
895        The datetime when the job stopped running.
896        """
897        if self.executor is not None:
898            ended_str = self.executor.get_job_ended(self.name)
899            if ended_str is None:
900                return None
901            return (
902                datetime.fromisoformat(ended_str)
903                .astimezone(timezone.utc)
904                .replace(tzinfo=None)
905            )
906
907        ended_str = self.daemon.properties.get('process', {}).get('ended', None)
908        if ended_str is None:
909            return None
910
911        return datetime.fromisoformat(ended_str)

The datetime when the job stopped running.

paused: Optional[datetime.datetime]
913    @property
914    def paused(self) -> Union[datetime, None]:
915        """
916        The datetime when the job was suspended while running.
917        """
918        if self.executor is not None:
919            paused_str = self.executor.get_job_paused(self.name)
920            if paused_str is None:
921                return None
922            return (
923                datetime.fromisoformat(paused_str)
924                .astimezone(timezone.utc)
925                .replace(tzinfo=None)
926            )
927
928        paused_str = self.daemon.properties.get('process', {}).get('paused', None)
929        if paused_str is None:
930            return None
931
932        return datetime.fromisoformat(paused_str)

The datetime when the job was suspended while running.

stop_time: Optional[datetime.datetime]
934    @property
935    def stop_time(self) -> Union[datetime, None]:
936        """
937        Return the timestamp when the job was manually stopped.
938        """
939        if self.executor is not None:
940            return self.executor.get_job_stop_time(self.name)
941
942        if not self.daemon.stop_path.exists():
943            return None
944
945        stop_data = self.daemon._read_stop_file()
946        if not stop_data:
947            return None
948
949        stop_time_str = stop_data.get('stop_time', None)
950        if not stop_time_str:
951            warn(f"Could not read stop time for {self}.")
952            return None
953
954        return datetime.fromisoformat(stop_time_str)

Return the timestamp when the job was manually stopped.

hidden: bool
956    @property
957    def hidden(self) -> bool:
958        """
959        Return a bool indicating whether this job should be displayed.
960        """
961        return (
962            self.name.startswith('_')
963            or self.name.startswith('.')
964            or self._is_externally_managed
965        )

Return a bool indicating whether this job should be displayed.

def check_restart(self) -> Tuple[bool, str]:
967    def check_restart(self) -> SuccessTuple:
968        """
969        If `restart` is `True` and the daemon is not running,
970        restart the job.
971        Do not restart if the job was manually stopped.
972        """
973        if self.is_running():
974            return True, f"{self} is running."
975
976        if not self.restart:
977            return True, f"{self} does not need to be restarted."
978
979        if self.stop_time is not None:
980            return True, f"{self} was manually stopped."
981
982        return self.start()

If restart is True and the daemon is not running, restart the job. Do not restart if the job was manually stopped.
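The restart decision above reduces to a small pure function. A sketch of the same decision table (`should_restart` is an illustrative name, not a Meerschaum API):

```python
def should_restart(is_running, restart, stop_time):
    """Mirror Job.check_restart's logic: only restart a job that opted in
    via restart=True, is not currently running, and was not manually stopped."""
    if is_running:
        return False          # nothing to do; job is healthy
    if not restart:
        return False          # job never requested restarts
    if stop_time is not None:
        return False          # a manual stop takes precedence
    return True

assert should_restart(False, True, None)                  # crashed: restart
assert not should_restart(True, True, None)               # still running
assert not should_restart(False, False, None)             # restart not requested
assert not should_restart(False, True, '2024-01-01')      # manually stopped
```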

label: str
984    @property
985    def label(self) -> str:
986        """
987        Return the job's Daemon label (joined sysargs).
988        """
989        from meerschaum._internal.arguments import compress_pipeline_sysargs
990        sysargs = compress_pipeline_sysargs(self.sysargs)
991        return shlex.join(sysargs).replace(' + ', '\n+ ').replace(' : ', '\n: ').lstrip().rstrip()

Return the job's Daemon label (joined sysargs).

env: Dict[str, str]
1020    @property
1021    def env(self) -> Dict[str, str]:
1022        """
1023        Return the environment variables to set for the job's process.
1024        """
1025        if '_env' in self.__dict__:
1026            return self.__dict__['_env']
1027
1028        _env = self.daemon.properties.get('env', {})
1029        default_env = {
1030            'PYTHONUNBUFFERED': '1',
1031            'LINES': str(get_config('jobs', 'terminal', 'lines')),
1032            'COLUMNS': str(get_config('jobs', 'terminal', 'columns')),
1033            STATIC_CONFIG['environment']['noninteractive']: 'true',
1034        }
1035        self._env = {**default_env, **_env}
1036        return self._env

Return the environment variables to set for the job's process.

delete_after_completion: bool
1038    @property
1039    def delete_after_completion(self) -> bool:
1040        """
1041        Return whether this job is configured to delete itself after completion.
1042        """
1043        if '_delete_after_completion' in self.__dict__:
1044            return self.__dict__.get('_delete_after_completion', False)
1045
1046        self._delete_after_completion = self.daemon.properties.get('delete_after_completion', False)
1047        return self._delete_after_completion

Return whether this job is configured to delete itself after completion.

def pprint( *args, detect_password: bool = True, nopretty: bool = False, **kw) -> None:
 10def pprint(
 11    *args,
 12    detect_password: bool = True,
 13    nopretty: bool = False,
 14    **kw
 15) -> None:
 16    """Pretty print an object according to the configured ANSI and UNICODE settings.
 17    If detect_password is True (default), search and replace passwords with '*' characters.
 18    Does not mutate objects.
 19    """
 20    import copy
 21    import json
 22    from meerschaum.utils.packages import attempt_import, import_rich
 23    from meerschaum.utils.formatting import ANSI, get_console, print_tuple
 24    from meerschaum.utils.warnings import error
 25    from meerschaum.utils.misc import replace_password, dict_from_od, filter_keywords
 26    from collections import OrderedDict
 27
 28    if (
 29        len(args) == 1
 30        and
 31        isinstance(args[0], tuple)
 32        and
 33        len(args[0]) == 2
 34        and
 35        isinstance(args[0][0], bool)
 36        and
 37        isinstance(args[0][1], str)
 38    ):
 39        return print_tuple(args[0], **filter_keywords(print_tuple, **kw))
 40
 41    modify = True
 42    rich_pprint = None
 43    if ANSI and not nopretty:
 44        rich = import_rich()
 45        if rich is not None:
 46            rich_pretty = attempt_import('rich.pretty')
 47        if rich_pretty is not None:
 48            def _rich_pprint(*args, **kw):
 49                _console = get_console()
 50                _kw = filter_keywords(_console.print, **kw)
 51                _console.print(*args, **_kw)
 52            rich_pprint = _rich_pprint
 53    elif not nopretty:
 54        pprintpp = attempt_import('pprintpp', warn=False)
 55        try:
 56            _pprint = pprintpp.pprint
 57        except Exception:
 58            import pprint as _pprint_module
 59            _pprint = _pprint_module.pprint
 60
 61    func = (
 62        _pprint if rich_pprint is None else rich_pprint
 63    ) if not nopretty else print
 64
 65    try:
 66        args_copy = copy.deepcopy(args)
 67    except Exception:
 68        args_copy = args
 69        modify = False
 70
 71    _args = []
 72    for a in args:
 73        c = a
 74        ### convert OrderedDict into dict
 75        if isinstance(a, OrderedDict) or issubclass(type(a), OrderedDict):
 76            c = dict_from_od(copy.deepcopy(c))
 77        _args.append(c)
 78    args = _args
 79
 80    _args = list(args)
 81    if detect_password and modify:
 82        _args = []
 83        for a in args:
 84            c = a
 85            if isinstance(c, dict):
 86                c = replace_password(copy.deepcopy(c))
 87            if nopretty:
 88                try:
 89                    c = json.dumps(c)
 90                    is_json = True
 91                except Exception:
 92                    is_json = False
 93                if not is_json:
 94                    try:
 95                        c = str(c)
 96                    except Exception:
 97                        pass
 98            _args.append(c)
 99
100    ### filter out unsupported keywords
101    func_kw = filter_keywords(func, **kw) if not nopretty else {}
102    error_msg = None
103    try:
104        func(*_args, **func_kw)
105    except Exception as e:
106        error_msg = e
107    if error_msg is not None:
108        error(error_msg)

Pretty print an object according to the configured ANSI and UNICODE settings. If detect_password is True (default), search and replace passwords with '*' characters. Does not mutate objects.
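The first branch of `pprint` routes a lone `(bool, str)` argument to `print_tuple`. The detection predicate can be sketched in isolation (`is_success_tuple` is an illustrative helper, not part of the package):

```python
def is_success_tuple(args):
    """Mirror pprint's check for a single SuccessTuple positional argument."""
    return (
        len(args) == 1
        and isinstance(args[0], tuple)
        and len(args[0]) == 2
        and isinstance(args[0][0], bool)
        and isinstance(args[0][1], str)
    )

assert is_success_tuple(((True, 'Success!'),))        # one (bool, str) pair
assert not is_success_tuple(((True, 'ok'), 'extra'))  # more than one argument
assert not is_success_tuple(({'a': 1},))              # not a tuple at all
```

This is why `pprint(mrsm.Pipe(...).sync())` prints a formatted success message rather than a raw tuple.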

def attempt_import( *names: str, lazy: bool = True, warn: bool = True, install: bool = True, venv: Optional[str] = 'mrsm', precheck: bool = True, split: bool = True, check_update: bool = False, check_pypi: bool = False, check_is_installed: bool = True, allow_outside_venv: bool = True, color: bool = True, debug: bool = False) -> Any:
1237def attempt_import(
1238    *names: str,
1239    lazy: bool = True,
1240    warn: bool = True,
1241    install: bool = True,
1242    venv: Optional[str] = 'mrsm',
1243    precheck: bool = True,
1244    split: bool = True,
1245    check_update: bool = False,
1246    check_pypi: bool = False,
1247    check_is_installed: bool = True,
1248    allow_outside_venv: bool = True,
1249    color: bool = True,
1250    debug: bool = False
1251) -> Any:
1252    """
1253    Raise a warning if packages are not installed; otherwise import and return modules.
1254    If `lazy` is `True`, return lazy-imported modules.
1255    
1256    Returns tuple of modules if multiple names are provided, else returns one module.
1257    
1258    Parameters
1259    ----------
1260    names: List[str]
1261        The packages to be imported.
1262
1263    lazy: bool, default True
1264        If `True`, lazily load packages.
1265
1266    warn: bool, default True
1267        If `True`, raise a warning if a package cannot be imported.
1268
1269    install: bool, default True
1270        If `True`, attempt to install a missing package into the designated virtual environment.
1271        If `check_update` is True, install updates if available.
1272
1273    venv: Optional[str], default 'mrsm'
1274        The virtual environment in which to search for packages and to install packages into.
1275
1276    precheck: bool, default True
1277        If `True`, attempt to find module before importing (necessary for checking if modules exist
1278        and retaining lazy imports), otherwise assume lazy is `False`.
1279
1280    split: bool, default True
1281        If `True`, split packages' names on `'.'`.
1282
1283    check_update: bool, default False
1284        If `True` and `install` is `True`, install updates if the required minimum version
1285        does not match.
1286
1287    check_pypi: bool, default False
1288        If `True` and `check_update` is `True`, check PyPI when determining whether
1289        an update is required.
1290
1291    check_is_installed: bool, default True
1292        If `True`, check if the package is contained in the virtual environment.
1293
1294    allow_outside_venv: bool, default True
1295        If `True`, search outside of the specified virtual environment
1296        if the package cannot be found.
1297        Setting to `False` will reinstall the package into a virtual environment, even if it
1298        is installed outside.
1299
1300    color: bool, default True
1301        If `False`, do not print ANSI colors.
1302
1303    Returns
1304    -------
1305    The specified modules. If they're not available and `install` is `True`, it will first
1306    download them into a virtual environment and return the modules.
1307
1308    Examples
1309    --------
1310    >>> pandas, sqlalchemy = attempt_import('pandas', 'sqlalchemy')
1311    >>> pandas = attempt_import('pandas')
1312
1313    """
1314
1315    import importlib.util
1316
1317    ### to prevent recursion, check if parent Meerschaum package is being imported
1318    if names == ('meerschaum',):
1319        return _import_module('meerschaum')
1320
1321    if venv == 'mrsm' and _import_hook_venv is not None:
1322        if debug:
1323            print(f"Import hook for virtual environment '{_import_hook_venv}' is active.")
1324        venv = _import_hook_venv
1325
1326    _warnings = _import_module('meerschaum.utils.warnings')
1327    warn_function = _warnings.warn
1328
1329    def do_import(_name: str, **kw) -> Union['ModuleType', None]:
1330        with Venv(venv=venv, debug=debug):
1331            ### determine the import method (lazy vs normal)
1332            from meerschaum.utils.misc import filter_keywords
1333            import_method = (
1334                _import_module if not lazy
1335                else lazy_import
1336            )
1337            try:
1338                mod = import_method(_name, **(filter_keywords(import_method, **kw)))
1339            except Exception as e:
1340                if warn:
1341                    import traceback
1342                    traceback.print_exception(type(e), e, e.__traceback__)
1343                    warn_function(
1344                        f"Failed to import module '{_name}'.\nException:\n{e}",
1345                        ImportWarning,
1346                        stacklevel = (5 if lazy else 4),
1347                        color = False,
1348                    )
1349                mod = None
1350        return mod
1351
1352    modules = []
1353    for name in names:
1354        ### Check if package is a declared dependency.
1355        root_name = name.split('.')[0] if split else name
1356        install_name = _import_to_install_name(root_name)
1357
1358        if install_name is None:
1359            install_name = root_name
1360            if warn and root_name != 'plugins':
1361                warn_function(
1362                    f"Package '{root_name}' is not declared in meerschaum.utils.packages.",
1363                    ImportWarning,
1364                    stacklevel = 3,
1365                    color = False
1366                )
1367
1368        ### Determine if the package exists.
1369        if precheck is False:
1370            found_module = (
1371                do_import(
1372                    name, debug=debug, warn=False, venv=venv, color=color,
1373                    check_update=False, check_pypi=False, split=split,
1374                ) is not None
1375            )
1376        else:
1377            if check_is_installed:
1378                with _locks['_is_installed_first_check']:
1379                    if not _is_installed_first_check.get(name, False):
1380                        package_is_installed = is_installed(
1381                            name,
1382                            venv = venv,
1383                            split = split,
1384                            allow_outside_venv = allow_outside_venv,
1385                            debug = debug,
1386                        )
1387                        _is_installed_first_check[name] = package_is_installed
1388                    else:
1389                        package_is_installed = _is_installed_first_check[name]
1390            else:
1391                package_is_installed = _is_installed_first_check.get(
1392                    name,
1393                    venv_contains_package(name, venv=venv, split=split, debug=debug)
1394                )
1395            found_module = package_is_installed
1396
1397        if not found_module:
1398            if install:
1399                if not pip_install(
1400                    install_name,
1401                    venv = venv,
1402                    split = False,
1403                    check_update = check_update,
1404                    color = color,
1405                    debug = debug
1406                ) and warn:
1407                    warn_function(
1408                        f"Failed to install '{install_name}'.",
1409                        ImportWarning,
1410                        stacklevel = 3,
1411                        color = False,
1412                    )
1413            elif warn:
1414                ### Raise a warning if we can't find the package and install = False.
1415                warn_function(
1416                    (f"\n\nMissing package '{name}' from virtual environment '{venv}'; "
1417                     + "some features will not work correctly."
1418                     + "\n\nSet install=True when calling attempt_import.\n"),
1419                    ImportWarning,
1420                    stacklevel = 3,
1421                    color = False,
1422                )
1423
1424        ### Do the import. Will be lazy if lazy=True.
1425        m = do_import(
1426            name, debug=debug, warn=warn, venv=venv, color=color,
1427            check_update=check_update, check_pypi=check_pypi, install=install, split=split,
1428        )
1429        modules.append(m)
1430
1431    modules = tuple(modules)
1432    if len(modules) == 1:
1433        return modules[0]
1434    return modules

Raise a warning if packages are not installed; otherwise import and return modules. If lazy is True, return lazy-imported modules.

Returns tuple of modules if multiple names are provided, else returns one module.

Parameters
  • names (List[str]): The packages to be imported.
  • lazy (bool, default True): If True, lazily load packages.
  • warn (bool, default True): If True, raise a warning if a package cannot be imported.
  • install (bool, default True): If True, attempt to install a missing package into the designated virtual environment. If check_update is True, install updates if available.
  • venv (Optional[str], default 'mrsm'): The virtual environment in which to search for packages and to install packages into.
  • precheck (bool, default True): If True, attempt to find module before importing (necessary for checking if modules exist and retaining lazy imports), otherwise assume lazy is False.
  • split (bool, default True): If True, split packages' names on '.'.
  • check_update (bool, default False): If True and install is True, install updates if the required minimum version does not match.
  • check_pypi (bool, default False): If True and check_update is True, check PyPI when determining whether an update is required.
  • check_is_installed (bool, default True): If True, check if the package is contained in the virtual environment.
  • allow_outside_venv (bool, default True): If True, search outside of the specified virtual environment if the package cannot be found. Setting to False will reinstall the package into a virtual environment, even if it is installed outside.
  • color (bool, default True): If False, do not print ANSI colors.
Returns
  • The specified modules. If they're not available and install is True, it will first download them into a virtual environment and return the modules.
Examples
>>> pandas, sqlalchemy = attempt_import('pandas', 'sqlalchemy')
>>> pandas = attempt_import('pandas')
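The core import-or-warn behavior described above can be approximated with the standard library. The sketch below is illustrative only (`try_import` is a hypothetical name, not Meerschaum's implementation, and it omits virtual environments and installation): it returns the module if importable, else warns and returns `None`.

```python
import importlib
import warnings

def try_import(name: str, warn: bool = True):
    """Return the named module, or None if it cannot be imported."""
    try:
        return importlib.import_module(name)
    except ImportError:
        if warn:
            warnings.warn(f"Missing package '{name}'.", ImportWarning, stacklevel=2)
        return None

json_mod = try_import('json')                      # stdlib, always importable
missing = try_import('not_a_module', warn=False)   # returns None
```

Unlike this sketch, `attempt_import` also searches and installs into the `mrsm` virtual environment before falling back to a warning.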
class Connector:
 22class Connector(metaclass=abc.ABCMeta):
 23    """
 24    The base connector class to hold connection attributes.
 25    """
 26
 27    IS_INSTANCE: bool = False
 28
 29    def __init__(
 30        self,
 31        type: Optional[str] = None,
 32        label: Optional[str] = None,
 33        **kw: Any
 34    ):
 35        """
 36        Set the given keyword arguments as attributes.
 37
 38        Parameters
 39        ----------
 40        type: str
 41            The `type` of the connector (e.g. `sql`, `api`, `plugin`).
 42
 43        label: str
 44            The `label` for the connector.
 45
 46
 47        Examples
 48        --------
 49        Run `mrsm edit config` to edit connectors in the YAML file:
 50
 51        ```yaml
 52        meerschaum:
 53            connections:
 54                {type}:
 55                    {label}:
 56                        ### attributes go here
 57        ```
 58
 59        """
 60        self._original_dict = copy.deepcopy(self.__dict__)
 61        self._set_attributes(type=type, label=label, **kw)
 62
 63        ### NOTE: Override `REQUIRED_ATTRIBUTES` if `uri` is set.
 64        self.verify_attributes(
 65            ['uri']
 66            if 'uri' in self.__dict__
 67            else getattr(self, 'REQUIRED_ATTRIBUTES', None)
 68        )
 69
 70    def _reset_attributes(self):
 71        self.__dict__ = self._original_dict
 72
 73    def _set_attributes(
 74        self,
 75        *args,
 76        inherit_default: bool = True,
 77        **kw: Any
 78    ):
 79        from meerschaum._internal.static import STATIC_CONFIG
 80        from meerschaum.utils.warnings import error
 81
 82        self._attributes = {}
 83
 84        default_label = STATIC_CONFIG['connectors']['default_label']
 85
 86        ### NOTE: Support the legacy method of explicitly passing the type.
 87        label = kw.get('label', None)
 88        if label is None:
 89            if len(args) == 2:
 90                label = args[1]
 91            elif len(args) == 0:
 92                label = None
 93            else:
 94                label = args[0]
 95
 96        if label == 'default':
 97            error(
 98                f"Label cannot be 'default'. Did you mean '{default_label}'?",
 99                InvalidAttributesError,
100            )
101        self.__dict__['label'] = label
102
103        from meerschaum.config import get_config
104        conn_configs = copy.deepcopy(get_config('meerschaum', 'connectors'))
105        connector_config = copy.deepcopy(get_config('system', 'connectors'))
106
107        ### inherit attributes from 'default' if exists
108        if inherit_default:
109            inherit_from = 'default'
110            if self.type in conn_configs and inherit_from in conn_configs[self.type]:
111                _inherit_dict = copy.deepcopy(conn_configs[self.type][inherit_from])
112                self._attributes.update(_inherit_dict)
113
114        ### load user config into self._attributes
115        if self.type in conn_configs and self.label in conn_configs[self.type]:
116            self._attributes.update(conn_configs[self.type][self.label] or {})
117
118        ### load system config into self._sys_config
119        ### (deep copy so future Connectors don't inherit changes)
120        if self.type in connector_config:
121            self._sys_config = copy.deepcopy(connector_config[self.type])
122
123        ### add additional arguments or override configuration
124        self._attributes.update(kw)
125
126        ### finally, update __dict__ with _attributes.
127        self.__dict__.update(self._attributes)
128
129    def verify_attributes(
130        self,
131        required_attributes: Optional[List[str]] = None,
132        debug: bool = False,
133    ) -> None:
134        """
135        Ensure that the required attributes have been set.
136        
137        The Connector base class checks the minimum requirements.
138        Child classes may enforce additional requirements.
139
140        Parameters
141        ----------
142        required_attributes: Optional[List[str]], default None
143            Attributes to be verified. If `None`, default to `['type', 'label']`.
144
145        debug: bool, default False
146            Verbosity toggle.
147
148        Returns
149        -------
150        `None`.
151
152        Raises
153        ------
154        An error if any of the required attributes are missing.
155        """
156        from meerschaum.utils.warnings import error
157        from meerschaum.utils.misc import items_str
158        if required_attributes is None:
159            required_attributes = ['type', 'label']
160
161        missing_attributes = set()
162        for a in required_attributes:
163            if a not in self.__dict__:
164                missing_attributes.add(a)
165        if len(missing_attributes) > 0:
166            error(
167                (
168                    f"Missing {items_str(list(missing_attributes))} "
169                    + f"for connector '{self.type}:{self.label}'."
170                ),
171                InvalidAttributesError,
172                silent=True,
173                stack=False
174            )
175
176
177    def __str__(self):
178        """
179        When cast to a string, return type:label.
180        """
181        return f"{self.type}:{self.label}"
182
183    def __repr__(self):
184        """
185        Represent the connector as type:label.
186        """
187        return str(self)
188
189    @property
190    def meta(self) -> Dict[str, Any]:
191        """
192        Return the keys needed to reconstruct this Connector.
193        """
194        _meta = {
195            key: value
196            for key, value in self.__dict__.items()
197            if not str(key).startswith('_')
198        }
199        _meta.update({
200            'type': self.type,
201            'label': self.label,
202        })
203        return _meta
204
205
206    @property
207    def type(self) -> str:
208        """
209        Return the type for this connector.
210        """
211        _type = self.__dict__.get('type', None)
212        if _type is None:
213            import re
214            is_executor = self.__class__.__name__.lower().endswith('executor')
215            suffix_regex = (
216                r'connector$'
217                if not is_executor
218                else r'executor$'
219            )
220            _type = re.sub(suffix_regex, '', self.__class__.__name__.lower())
221            if not _type or _type.lower() == 'instance':
222                raise ValueError("No type could be determined for this connector.")
223            self.__dict__['type'] = _type
224        return _type
225
226
227    @property
228    def label(self) -> str:
229        """
230        Return the label for this connector.
231        """
232        _label = self.__dict__.get('label', None)
233        if _label is None:
234            from meerschaum._internal.static import STATIC_CONFIG
235            _label = STATIC_CONFIG['connectors']['default_label']
236            self.__dict__['label'] = _label
237        return _label

The base connector class to hold connection attributes.

Connector(type: Optional[str] = None, label: Optional[str] = None, **kw: Any)
29    def __init__(
30        self,
31        type: Optional[str] = None,
32        label: Optional[str] = None,
33        **kw: Any
34    ):
35        """
36        Set the given keyword arguments as attributes.
37
38        Parameters
39        ----------
40        type: str
41            The `type` of the connector (e.g. `sql`, `api`, `plugin`).
42
43        label: str
44            The `label` for the connector.
45
46
47        Examples
48        --------
 49        Run `mrsm edit config` to edit connectors in the YAML file:
50
51        ```yaml
52        meerschaum:
53            connections:
54                {type}:
55                    {label}:
56                        ### attributes go here
57        ```
58
59        """
60        self._original_dict = copy.deepcopy(self.__dict__)
61        self._set_attributes(type=type, label=label, **kw)
62
63        ### NOTE: Override `REQUIRED_ATTRIBUTES` if `uri` is set.
64        self.verify_attributes(
65            ['uri']
66            if 'uri' in self.__dict__
67            else getattr(self, 'REQUIRED_ATTRIBUTES', None)
68        )

Set the given keyword arguments as attributes.

Parameters
  • type (str): The type of the connector (e.g. sql, api, plugin).
  • label (str): The label for the connector.
Examples

Run mrsm edit config to edit connectors in the YAML file:

meerschaum:
    connections:
        {type}:
            {label}:
                ### attributes go here
IS_INSTANCE: bool = False
def verify_attributes( self, required_attributes: Optional[List[str]] = None, debug: bool = False) -> None:
129    def verify_attributes(
130        self,
131        required_attributes: Optional[List[str]] = None,
132        debug: bool = False,
133    ) -> None:
134        """
135        Ensure that the required attributes have been set.
136        
137        The Connector base class checks the minimum requirements.
138        Child classes may enforce additional requirements.
139
140        Parameters
141        ----------
142        required_attributes: Optional[List[str]], default None
143            Attributes to be verified. If `None`, default to `['type', 'label']`.
144
145        debug: bool, default False
146            Verbosity toggle.
147
148        Returns
149        -------
150        `None`.
151
152        Raises
153        ------
154        An error if any of the required attributes are missing.
155        """
156        from meerschaum.utils.warnings import error
157        from meerschaum.utils.misc import items_str
158        if required_attributes is None:
159            required_attributes = ['type', 'label']
160
161        missing_attributes = set()
162        for a in required_attributes:
163            if a not in self.__dict__:
164                missing_attributes.add(a)
165        if len(missing_attributes) > 0:
166            error(
167                (
168                    f"Missing {items_str(list(missing_attributes))} "
169                    + f"for connector '{self.type}:{self.label}'."
170                ),
171                InvalidAttributesError,
172                silent=True,
173                stack=False
174            )

Ensure that the required attributes have been set.

The Connector base class checks the minimum requirements. Child classes may enforce additional requirements.

Parameters
  • required_attributes (Optional[List[str]], default None): Attributes to be verified. If None, default to ['type', 'label'].
  • debug (bool, default False): Verbosity toggle.
Returns
  • None.
Raises
  • An error if any of the required attributes are missing.
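The verification pattern above is straightforward to reproduce in plain Python. The following sketch uses a hypothetical `MiniConnector` class (not the real `Connector`): keyword arguments become instance attributes, then the required attributes are checked and an error is raised for any that are missing.

```python
class MiniConnector:
    """Toy stand-in for Connector's required-attribute check."""
    REQUIRED_ATTRIBUTES = ['username', 'password']

    def __init__(self, **kw):
        # Mirror Connector: keyword arguments become attributes...
        self.__dict__.update(kw)
        # ...then the required attributes are verified.
        self.verify_attributes(self.REQUIRED_ATTRIBUTES)

    def verify_attributes(self, required_attributes):
        missing = [a for a in required_attributes if a not in self.__dict__]
        if missing:
            raise AttributeError(f"Missing {missing} for this connector.")

conn = MiniConnector(username='foo', password='bar')  # passes verification
```

The real `verify_attributes` reports all missing attributes at once (via a set) rather than failing on the first, which makes configuration errors easier to diagnose.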
meta: Dict[str, Any]
189    @property
190    def meta(self) -> Dict[str, Any]:
191        """
192        Return the keys needed to reconstruct this Connector.
193        """
194        _meta = {
195            key: value
196            for key, value in self.__dict__.items()
197            if not str(key).startswith('_')
198        }
199        _meta.update({
200            'type': self.type,
201            'label': self.label,
202        })
203        return _meta

Return the keys needed to reconstruct this Connector.

type: str
206    @property
207    def type(self) -> str:
208        """
209        Return the type for this connector.
210        """
211        _type = self.__dict__.get('type', None)
212        if _type is None:
213            import re
214            is_executor = self.__class__.__name__.lower().endswith('executor')
215            suffix_regex = (
216                r'connector$'
217                if not is_executor
218                else r'executor$'
219            )
220            _type = re.sub(suffix_regex, '', self.__class__.__name__.lower())
221            if not _type or _type.lower() == 'instance':
222                raise ValueError("No type could be determined for this connector.")
223            self.__dict__['type'] = _type
224        return _type

Return the type for this connector.
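The suffix stripping shown in the source above is a plain `re.sub` on the lowercased class name. For a hypothetical custom class named `FooConnector`, the derived type works out as follows:

```python
import re

class_name = 'FooConnector'  # hypothetical custom connector class
is_executor = class_name.lower().endswith('executor')
suffix_regex = r'executor$' if is_executor else r'connector$'
derived_type = re.sub(suffix_regex, '', class_name.lower())
print(derived_type)  # foo
```

This is why `@make_connector`-decorated classes conventionally end in `Connector`: the remainder of the class name becomes the connector's `type`.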

label: str
227    @property
228    def label(self) -> str:
229        """
230        Return the label for this connector.
231        """
232        _label = self.__dict__.get('label', None)
233        if _label is None:
234            from meerschaum._internal.static import STATIC_CONFIG
235            _label = STATIC_CONFIG['connectors']['default_label']
236            self.__dict__['label'] = _label
237        return _label

Return the label for this connector.

class InstanceConnector(meerschaum.Connector):
18class InstanceConnector(Connector):
19    """
20    Instance connectors define the interface for managing pipes and provide methods
21    for management of users, plugins, tokens, and other metadata built atop pipes.
22    """
23
24    IS_INSTANCE: bool = True
25    IS_THREAD_SAFE: bool = False
26
27    from ._users import (
28        get_users_pipe,
29        register_user,
30        get_user_id,
31        get_username,
32        get_users,
33        edit_user,
34        delete_user,
35        get_user_password_hash,
36        get_user_type,
37        get_user_attributes,
38    )
39
40    from ._plugins import (
41        get_plugins_pipe,
42        register_plugin,
43        get_plugin_user_id,
44        delete_plugin,
45        get_plugin_id,
46        get_plugin_version,
47        get_plugins,
49        get_plugin_username,
50        get_plugin_attributes,
51    )
52
53    from ._tokens import (
54        get_tokens_pipe,
55        register_token,
56        edit_token,
57        invalidate_token,
58        delete_token,
59        get_token,
60        get_tokens,
61        get_token_model,
62        get_token_secret_hash,
63        token_exists,
64        get_token_scopes,
65    )
66
67    from ._pipes import (
68        register_pipe,
69        get_pipe_attributes,
70        get_pipe_id,
71        edit_pipe,
72        delete_pipe,
73        fetch_pipes_keys,
74        pipe_exists,
75        drop_pipe,
76        drop_pipe_indices,
77        sync_pipe,
78        create_pipe_indices,
79        clear_pipe,
80        get_pipe_data,
81        get_sync_time,
82        get_pipe_columns_types,
83        get_pipe_columns_indices,
84    )

Instance connectors define the interface for managing pipes and provide methods for management of users, plugins, tokens, and other metadata built atop pipes.

IS_INSTANCE: bool = True
IS_THREAD_SAFE: bool = False
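The class body above attaches functions defined in sibling modules (`._users`, `._plugins`, etc.) as methods: in Python, any function bound as a class attribute becomes a method of that class. A self-contained sketch of the idiom (hypothetical names):

```python
def _greet(self) -> str:
    """A free function; binding it on a class turns it into a method."""
    return f"hello from {self.name}"

class Host:
    # Equivalent to `from ._greetings import _greet as greet`
    # inside a class body.
    greet = _greet

    def __init__(self, name: str):
        self.name = name

h = Host('foo')
print(h.greet())  # hello from foo
```

Splitting the interface across modules this way keeps each concern (users, plugins, tokens, pipes) in its own file while presenting a single class to subclasses.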
def get_users_pipe(self) -> Pipe:
18def get_users_pipe(self) -> 'mrsm.Pipe':
19    """
20    Return the pipe used for users registration.
21    """
22    if '_users_pipe' in self.__dict__:
23        return self._users_pipe
24
25    cache_connector = self.__dict__.get('_cache_connector', None)
26    self._users_pipe = mrsm.Pipe(
27        'mrsm', 'users',
28        instance=self,
29        target='mrsm_users',
30        temporary=True,
31        cache=True,
32        cache_connector_keys=cache_connector,
33        static=True,
34        null_indices=False,
35        columns={
36            'primary': 'user_id',
37        },
38        dtypes={
39            'user_id': 'uuid',
40            'username': 'string',
41            'password_hash': 'string',
42            'email': 'string',
43            'user_type': 'string',
44            'attributes': 'json',
45        },
46        indices={
47            'unique': 'username',
48        },
49    )
50    return self._users_pipe

Return the pipe used for users registration.

def register_user( self, user: meerschaum.core.User._User.User, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
53def register_user(
54    self,
55    user: User,
56    debug: bool = False,
57    **kwargs: Any
58) -> mrsm.SuccessTuple:
59    """
60    Register a new user to the users pipe.
61    """
62    users_pipe = self.get_users_pipe()
63    user.user_id = uuid.uuid4()
64    sync_success, sync_msg = users_pipe.sync(
65        [{
66            'user_id': user.user_id,
67            'username': user.username,
68            'email': user.email,
69            'password_hash': user.password_hash,
70            'user_type': user.type,
71            'attributes': user.attributes,
72        }],
73        check_existing=False,
74        debug=debug,
75    )
76    if not sync_success:
77        return False, f"Failed to register user '{user.username}':\n{sync_msg}"
78
79    return True, "Success"

Register a new user to the users pipe.

def get_user_id( self, user: meerschaum.core.User._User.User, debug: bool = False) -> Optional[uuid.UUID]:
82def get_user_id(self, user: User, debug: bool = False) -> Union[uuid.UUID, None]:
83    """
84    Return a user's ID from the username.
85    """
86    users_pipe = self.get_users_pipe()
87    result_df = users_pipe.get_data(['user_id'], params={'username': user.username}, limit=1)
88    if result_df is None or len(result_df) == 0:
89        return None
90    return result_df['user_id'][0]

Return a user's ID from the username.

def get_username(self, user_id: Any, debug: bool = False) -> Any:
93def get_username(self, user_id: Any, debug: bool = False) -> Any:
94    """
95    Return the username from the given ID.
96    """
97    users_pipe = self.get_users_pipe()
98    return users_pipe.get_value('username', {'user_id': user_id}, debug=debug)

Return the username from the given ID.

def get_users(self, debug: bool = False, **kw: Any) -> List[str]:
101def get_users(
102    self,
103    debug: bool = False,
104    **kw: Any
105) -> List[str]:
106    """
107    Get the registered usernames.
108    """
109    users_pipe = self.get_users_pipe()
110    df = users_pipe.get_data()
111    if df is None:
112        return []
113
114    return list(df['username'])

Get the registered usernames.

def edit_user( self, user: meerschaum.core.User._User.User, debug: bool = False) -> Tuple[bool, str]:
117def edit_user(self, user: User, debug: bool = False) -> mrsm.SuccessTuple:
118    """
119    Edit the attributes for an existing user.
120    """
121    users_pipe = self.get_users_pipe()
122    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)
123
124    doc = {'user_id': user_id}
125    if user.email != '':
126        doc['email'] = user.email
127    if user.password_hash != '':
128        doc['password_hash'] = user.password_hash
129    if user.type != '':
130        doc['user_type'] = user.type
131    if user.attributes:
132        doc['attributes'] = user.attributes
133
134    sync_success, sync_msg = users_pipe.sync([doc], debug=debug)
135    if not sync_success:
136        return False, f"Failed to edit user '{user.username}':\n{sync_msg}"
137
138    return True, "Success"

Edit the attributes for an existing user.

def delete_user( self, user: meerschaum.core.User._User.User, debug: bool = False) -> Tuple[bool, str]:
141def delete_user(self, user: User, debug: bool = False) -> mrsm.SuccessTuple:
142    """
143    Delete a user from the users table.
144    """
145    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)
146    users_pipe = self.get_users_pipe()
147    clear_success, clear_msg = users_pipe.clear(params={'user_id': user_id}, debug=debug)
148    if not clear_success:
149        return False, f"Failed to delete user '{user}':\n{clear_msg}"
150    return True, "Success"

Delete a user from the users table.

def get_user_password_hash( self, user: meerschaum.core.User._User.User, debug: bool = False) -> Optional[str]:
153def get_user_password_hash(self, user: User, debug: bool = False) -> Union[str, None]:
154    """
155    Get a user's password hash from the users table.
156    """
157    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)
158    users_pipe = self.get_users_pipe()
159    result_df = users_pipe.get_data(['password_hash'], params={'user_id': user_id}, debug=debug)
160    if result_df is None or len(result_df) == 0:
161        return None
162
163    return result_df['password_hash'][0]

Get a user's password hash from the users table.

def get_user_type( self, user: meerschaum.core.User._User.User, debug: bool = False) -> Optional[str]:
166def get_user_type(self, user: User, debug: bool = False) -> Union[str, None]:
167    """
168    Get a user's type from the users table.
169    """
170    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)
171    users_pipe = self.get_users_pipe()
172    result_df = users_pipe.get_data(['user_type'], params={'user_id': user_id}, debug=debug)
173    if result_df is None or len(result_df) == 0:
174        return None
175
176    return result_df['user_type'][0]

Get a user's type from the users table.

def get_user_attributes( self, user: meerschaum.core.User._User.User, debug: bool = False) -> Optional[Dict[str, Any]]:
179def get_user_attributes(self, user: User, debug: bool = False) -> Union[Dict[str, Any], None]:
180    """
181    Get a user's attributes from the users table.
182    """
183    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)
184    users_pipe = self.get_users_pipe()
185    result_df = users_pipe.get_data(['attributes'], params={'user_id': user_id}, debug=debug)
186    if result_df is None or len(result_df) == 0:
187        return None
188
189    return result_df['attributes'][0]

Get a user's attributes from the users table.
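The user methods above all follow the same shape: resolve the `user_id`, then read or write a single document through the users pipe. A dict-backed sketch of that flow (`MiniUserStore` is illustrative only; the real implementation syncs documents through `mrsm.Pipe`):

```python
import uuid

class MiniUserStore:
    """Toy stand-in for the pipe-backed users table."""
    def __init__(self):
        self._docs = {}  # user_id -> document

    def register_user(self, username: str, password_hash: str):
        user_id = uuid.uuid4()  # mirror register_user: assign a fresh UUID
        self._docs[user_id] = {
            'user_id': user_id,
            'username': username,
            'password_hash': password_hash,
        }
        return True, "Success"

    def get_user_id(self, username: str):
        for user_id, doc in self._docs.items():
            if doc['username'] == username:
                return user_id
        return None

    def get_user_password_hash(self, username: str):
        user_id = self.get_user_id(username)
        if user_id is None:
            return None
        return self._docs[user_id]['password_hash']

store = MiniUserStore()
store.register_user('foo', 'hash123')
print(store.get_user_password_hash('foo'))  # hash123
```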

def get_plugins_pipe(self) -> Pipe:
16def get_plugins_pipe(self) -> 'mrsm.Pipe':
17    """
18    Return the internal pipe for syncing plugins metadata.
19    """
20    if '_plugins_pipe' in self.__dict__:
21        return self._plugins_pipe
22
23    cache_connector = self.__dict__.get('_cache_connector', None)
24    users_pipe = self.get_users_pipe()
25    user_id_dtype = users_pipe.dtypes.get('user_id', 'uuid')
26
27    self._plugins_pipe = mrsm.Pipe(
28        'mrsm', 'plugins',
29        instance=self,
30        target='mrsm_plugins',
31        temporary=True,
32        cache=True,
33        cache_connector_keys=cache_connector,
34        static=True,
35        null_indices=False,
36        columns={
37            'primary': 'plugin_name',
38            'user_id': 'user_id',
39        },
40        dtypes={
41            'plugin_name': 'string',
42            'user_id': user_id_dtype,
43            'attributes': 'json',
44            'version': 'string',
45        },
46    )
47    return self._plugins_pipe

Return the internal pipe for syncing plugins metadata.

def register_plugin( self, plugin: Plugin, debug: bool = False) -> Tuple[bool, str]:
50def register_plugin(self, plugin: Plugin, debug: bool = False) -> mrsm.SuccessTuple:
51    """
52    Register a new plugin to the plugins table.
53    """
54    plugins_pipe = self.get_plugins_pipe()
55    users_pipe = self.get_users_pipe()
56    user_id = self.get_plugin_user_id(plugin)
57    if user_id is not None:
58        username = self.get_username(user_id, debug=debug)
59        return False, f"{plugin} is already registered to '{username}'."
60
61    doc = {
62        'plugin_name': plugin.name,
63        'version': plugin.version,
64        'attributes': plugin.attributes,
65        'user_id': plugin.user_id,
66    }
67
68    sync_success, sync_msg = plugins_pipe.sync(
69        [doc],
70        check_existing=False,
71        debug=debug,
72    )
73    if not sync_success:
74        return False, f"Failed to register {plugin}:\n{sync_msg}"
75
76    return True, "Success"

Register a new plugin to the plugins table.

def get_plugin_user_id( self, plugin: Plugin, debug: bool = False) -> Optional[uuid.UUID]:
79def get_plugin_user_id(self, plugin: Plugin, debug: bool = False) -> Union[uuid.UUID, None]:
80    """
81    Return the user ID for plugin's owner.
82    """
83    plugins_pipe = self.get_plugins_pipe() 
84    return plugins_pipe.get_value('user_id', {'plugin_name': plugin.name}, debug=debug)

Return the user ID for plugin's owner.

def delete_plugin( self, plugin: Plugin, debug: bool = False) -> Tuple[bool, str]:
105def delete_plugin(self, plugin: Plugin, debug: bool = False) -> mrsm.SuccessTuple:
106    """
107    Delete a plugin's registration.
108    """
109    plugin_id = self.get_plugin_id(plugin, debug=debug)
110    if plugin_id is None:
111        return False, f"{plugin} is not registered."
112    
113    plugins_pipe = self.get_plugins_pipe()
114    clear_success, clear_msg = plugins_pipe.clear(params={'plugin_name': plugin.name}, debug=debug)
115    if not clear_success:
116        return False, f"Failed to delete {plugin}:\n{clear_msg}"
117    return True, "Success"

Delete a plugin's registration.

def get_plugin_id( self, plugin: Plugin, debug: bool = False) -> Optional[str]:
 97def get_plugin_id(self, plugin: Plugin, debug: bool = False) -> Union[str, None]:
 98    """
 99    Return a plugin's ID.
100    """
101    user_id = self.get_plugin_user_id(plugin, debug=debug)
102    return plugin.name if user_id is not None else None

Return a plugin's ID.

def get_plugin_version( self, plugin: Plugin, debug: bool = False) -> Optional[str]:
120def get_plugin_version(self, plugin: Plugin, debug: bool = False) -> Union[str, None]:
121    """
122    Return the version for a plugin.
123    """
124    plugins_pipe = self.get_plugins_pipe() 
125    return plugins_pipe.get_value('version', {'plugin_name': plugin.name}, debug=debug)

Return the version for a plugin.

def get_plugins( self, user_id: Optional[int] = None, search_term: Optional[str] = None, debug: bool = False, **kw: Any) -> List[str]:
136def get_plugins(
137    self,
138    user_id: Optional[int] = None,
139    search_term: Optional[str] = None,
140    debug: bool = False,
141    **kw: Any
142) -> List[str]:
143    """
144    Return a list of plugin names.
145    """
146    plugins_pipe = self.get_plugins_pipe()
147    params = {}
148    if user_id:
149        params['user_id'] = user_id
150
151    df = plugins_pipe.get_data(['plugin_name'], params=params, debug=debug)
152    if df is None:
153        return []
154
155    docs = df.to_dict(orient='records')
156    return [
157        plugin_name
158        for doc in docs
159        if (plugin_name := doc['plugin_name']).startswith(search_term or '')
160    ]

Return a list of plugin names.
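The comprehension in `get_plugins` filters plugin names by an optional prefix using an assignment expression. With sample documents (hypothetical plugin names), it reduces to:

```python
docs = [
    {'plugin_name': 'noaa'},
    {'plugin_name': 'noaa_extras'},
    {'plugin_name': 'color'},
]
search_term = 'noaa'
matches = [
    plugin_name
    for doc in docs
    if (plugin_name := doc['plugin_name']).startswith(search_term or '')
]
print(matches)  # ['noaa', 'noaa_extras']
```

When `search_term` is `None`, `search_term or ''` makes every name match, so all registered plugins are returned.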

def get_plugin_username( self, plugin: Plugin, debug: bool = False) -> Optional[str]:
87def get_plugin_username(self, plugin: Plugin, debug: bool = False) -> Union[str, None]:
88    """
89    Return the username for plugin's owner.
90    """
91    user_id = self.get_plugin_user_id(plugin, debug=debug)
92    if user_id is None:
93        return None
94    return self.get_username(user_id, debug=debug)

Return the username for plugin's owner.

def get_plugin_attributes( self, plugin: Plugin, debug: bool = False) -> Dict[str, Any]:
128def get_plugin_attributes(self, plugin: Plugin, debug: bool = False) -> Dict[str, Any]:
129    """
130    Return the attributes for a plugin.
131    """
132    plugins_pipe = self.get_plugins_pipe() 
133    return plugins_pipe.get_value('attributes', {'plugin_name': plugin.name}, debug=debug) or {}

Return the attributes for a plugin.

def get_tokens_pipe(self) -> Pipe:
22def get_tokens_pipe(self) -> mrsm.Pipe:
23    """
24    Return the internal pipe for tokens management.
25    """
26    if '_tokens_pipe' in self.__dict__:
27        return self._tokens_pipe
28
29    users_pipe = self.get_users_pipe()
30    user_id_dtype = (
31        users_pipe._attributes.get('parameters', {}).get('dtypes', {}).get('user_id', 'uuid')
32    )
33
34    cache_connector = self.__dict__.get('_cache_connector', None)
35
36    self._tokens_pipe = mrsm.Pipe(
37        'mrsm', 'tokens',
38        instance=self,
39        target='mrsm_tokens',
40        temporary=True,
41        cache=True,
42        cache_connector_keys=cache_connector,
43        static=True,
44        autotime=True,
45        null_indices=False,
46        columns={
47            'datetime': 'creation',
48            'primary': 'id',
49        },
50        indices={
51            'unique': 'label',
52            'user_id': 'user_id',
53        },
54        dtypes={
55            'id': 'uuid',
56            'creation': 'datetime',
57            'expiration': 'datetime',
58            'is_valid': 'bool',
59            'label': 'string',
60            'user_id': user_id_dtype,
61            'scopes': 'json',
62            'secret_hash': 'string',
63        },
64    )
65    return self._tokens_pipe

Return the internal pipe for tokens management.

def register_token( self, token: meerschaum.core.Token._Token.Token, debug: bool = False) -> Tuple[bool, str]:
68def register_token(
69    self,
70    token: Token,
71    debug: bool = False,
72) -> mrsm.SuccessTuple:
73    """
74    Register the new token to the tokens table.
75    """
76    token_id, token_secret = token.generate_credentials()
77    tokens_pipe = self.get_tokens_pipe()
78    user_id = self.get_user_id(token.user) if token.user is not None else None
79    if user_id is None:
80        return False, "Cannot register a token without a user."
81
82    doc = {
83        'id': token_id,
84        'user_id': user_id,
85        'creation': datetime.now(timezone.utc),
86        'expiration': token.expiration,
87        'label': token.label,
88        'is_valid': token.is_valid,
89        'scopes': list(token.scopes) if token.scopes else [],
90        'secret_hash': hash_password(
91            str(token_secret),
92            rounds=STATIC_CONFIG['tokens']['hash_rounds']
93        ),
94    }
95    sync_success, sync_msg = tokens_pipe.sync([doc], check_existing=False, debug=debug)
96    if not sync_success:
97        return False, f"Failed to register token:\n{sync_msg}"
98    return True, "Success"

Register the new token to the tokens table.
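The registration flow above can be sketched without Meerschaum internals: generate an ID and a one-time secret, then persist only a hash of the secret. `build_token_doc` is a hypothetical helper, and `hashlib.sha256` stands in for `hash_password` with the configured `hash_rounds`; production code should use a dedicated password hasher.

```python
import hashlib
import secrets
import uuid
from datetime import datetime, timezone

def build_token_doc(label: str, user_id: uuid.UUID) -> dict:
    """Build a token document, persisting only a hash of the secret."""
    token_id = uuid.uuid4()
    token_secret = secrets.token_urlsafe(32)  # shown to the user exactly once
    return {
        'id': token_id,
        'user_id': user_id,
        'creation': datetime.now(timezone.utc),
        'label': label,
        'is_valid': True,
        # Stand-in for hash_password(...) with the configured rounds.
        'secret_hash': hashlib.sha256(token_secret.encode()).hexdigest(),
    }

doc = build_token_doc('ci-token', uuid.uuid4())
print(sorted(doc))  # ['creation', 'id', 'is_valid', 'label', 'secret_hash', 'user_id']
```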

def edit_token( self, token: meerschaum.core.Token._Token.Token, debug: bool = False) -> Tuple[bool, str]:
101def edit_token(self, token: Token, debug: bool = False) -> mrsm.SuccessTuple:
102    """
103    Persist the token's in-memory state to the tokens pipe.
104    """
105    if not token.id:
106        return False, "Token ID is not set."
107
108    if not token.exists(debug=debug):
109        return False, f"Token {token.id} does not exist."
110
111    if not token.creation:
112        token_model = self.get_token_model(token.id)
113        token.creation = token_model.creation
114
115    tokens_pipe = self.get_tokens_pipe()
116    doc = {
117        'id': token.id,
118        'creation': token.creation,
119        'expiration': token.expiration,
120        'label': token.label,
121        'is_valid': token.is_valid,
122        'scopes': list(token.scopes) if token.scopes else [],
123    }
124    sync_success, sync_msg = tokens_pipe.sync([doc], debug=debug)
125    if not sync_success:
126        return False, f"Failed to edit token '{token.id}':\n{sync_msg}"
127
128    return True, "Success"

Persist the token's in-memory state to the tokens pipe.

def invalidate_token( self, token: meerschaum.core.Token._Token.Token, debug: bool = False) -> Tuple[bool, str]:
131def invalidate_token(self, token: Token, debug: bool = False) -> mrsm.SuccessTuple:
132    """
133    Set `is_valid` to `False` for the given token.
134    """
135    if not token.id:
136        return False, "Token ID is not set."
137
138    if not token.exists(debug=debug):
139        return False, f"Token {token.id} does not exist."
140
141    if not token.creation:
142        token_model = self.get_token_model(token.id)
143        token.creation = token_model.creation
144
145    token.is_valid = False
146    tokens_pipe = self.get_tokens_pipe()
147    doc = {
148        'id': token.id,
149        'creation': token.creation,
150        'is_valid': False,
151    }
152    sync_success, sync_msg = tokens_pipe.sync([doc], debug=debug)
153    if not sync_success:
154        return False, f"Failed to invalidate token '{token.id}':\n{sync_msg}"
155
156    return True, "Success"

Set is_valid to False for the given token.

def delete_token( self, token: meerschaum.core.Token._Token.Token, debug: bool = False) -> Tuple[bool, str]:
159def delete_token(self, token: Token, debug: bool = False) -> mrsm.SuccessTuple:
160    """
161    Delete the given token from the tokens table.
162    """
163    if not token.id:
164        return False, "Token ID is not set."
165
166    if not token.exists(debug=debug):
167        return False, f"Token {token.id} does not exist."
168
169    if not token.creation:
170        token_model = self.get_token_model(token.id)
171        token.creation = token_model.creation
172
173    token.is_valid = False
174    tokens_pipe = self.get_tokens_pipe()
175    clear_success, clear_msg = tokens_pipe.clear(params={'id': token.id}, debug=debug)
176    if not clear_success:
177        return False, f"Failed to delete token '{token.id}':\n{clear_msg}"
178
179    return True, "Success"

Delete the given token from the tokens table.

def get_token( self, token_id: Union[uuid.UUID, str], debug: bool = False) -> Optional[meerschaum.core.Token._Token.Token]:
235def get_token(self, token_id: Union[uuid.UUID, str], debug: bool = False) -> Union[Token, None]:
236    """
237    Return the `Token` from its ID.
238    """
239    from meerschaum.utils.misc import is_uuid
240    if isinstance(token_id, str):
241        if is_uuid(token_id):
242            token_id = uuid.UUID(token_id)
243        else:
244            raise ValueError("Invalid token ID.")
245    token_model = self.get_token_model(token_id)
246    if token_model is None:
247        return None
248    return Token(**dict(token_model))

Return the Token from its ID.
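The ID coercion above (a UUID-formatted string becomes a `uuid.UUID`, anything else raises `ValueError`) can be sketched standalone; `coerce_token_id` is a hypothetical helper, not part of the Meerschaum API:

```python
import uuid

def coerce_token_id(token_id):
    """Accept a uuid.UUID or a UUID-formatted string; reject anything else."""
    if isinstance(token_id, uuid.UUID):
        return token_id
    try:
        return uuid.UUID(token_id)
    except (AttributeError, TypeError, ValueError):
        raise ValueError("Invalid token ID.")

print(type(coerce_token_id('12345678-1234-5678-1234-567812345678')).__name__)  # UUID
```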

def get_tokens( self, user: Optional[meerschaum.core.User._User.User] = None, labels: Optional[List[str]] = None, ids: Optional[List[uuid.UUID]] = None, debug: bool = False) -> List[meerschaum.core.Token._Token.Token]:
182def get_tokens(
183    self,
184    user: Optional[User] = None,
185    labels: Optional[List[str]] = None,
186    ids: Optional[List[uuid.UUID]] = None,
187    debug: bool = False,
188) -> List[Token]:
189    """
190    Return a list of `Token` objects.
191    """
192    tokens_pipe = self.get_tokens_pipe()
193    user_id = (
194        self.get_user_id(user, debug=debug)
195        if user is not None
196        else None
197    )
198    user_type = self.get_user_type(user, debug=debug) if user is not None else None
199    params = (
200        {
201            'user_id': (
202                user_id
203                if user_type != 'admin'
204                else [user_id, None]
205            )
206        }
207        if user_id is not None
208        else {}
209    )
210    if labels:
211        params['label'] = labels
212    if ids:
213        params['id'] = ids
214        
215    if debug:
216        dprint(f"Getting tokens with {user_id=}, {params=}")
217
218    tokens_df = tokens_pipe.get_data(params=params, debug=debug)
219    if tokens_df is None:
220        return []
221
222    if debug:
223        dprint(f"Retrieved tokens dataframe:\n{tokens_df}")
224
225    tokens_docs = tokens_df.to_dict(orient='records')
226    return [
227        Token(
228            instance=self,
229            **token_doc
230        )
231        for token_doc in reversed(tokens_docs)
232    ]

Return a list of Token objects.

def get_token_model( self, token_id: Union[uuid.UUID, meerschaum.core.Token._Token.Token], debug: bool = False) -> Optional[TokenModel]:
251def get_token_model(self, token_id: Union[uuid.UUID, Token], debug: bool = False) -> 'Union[TokenModel, None]':
252    """
253    Return a token's model from the instance.
254    """
255    from meerschaum.models import TokenModel
256    if isinstance(token_id, Token):
257        token_id = token_id.id
258    if not token_id:
259        raise ValueError("Invalid token ID.")
260    tokens_pipe = self.get_tokens_pipe()
261    doc = tokens_pipe.get_doc(
262        params={'id': token_id},
263        debug=debug,
264    )
265    if doc is None:
266        return None
267    return TokenModel(**doc)

Return a token's model from the instance.

def get_token_secret_hash( self, token_id: Union[uuid.UUID, meerschaum.core.Token._Token.Token], debug: bool = False) -> Optional[str]:
270def get_token_secret_hash(self, token_id: Union[uuid.UUID, Token], debug: bool = False) -> Union[str, None]:
271    """
272    Return the secret hash for a given token.
273    """
274    if isinstance(token_id, Token):
275        token_id = token_id.id
276    if not token_id:
277        raise ValueError("Invalid token ID.")
278    tokens_pipe = self.get_tokens_pipe()
279    return tokens_pipe.get_value('secret_hash', params={'id': token_id}, debug=debug)

Return the secret hash for a given token.

def token_exists( self, token_id: Union[uuid.UUID, meerschaum.core.Token._Token.Token], debug: bool = False) -> bool:
308def token_exists(self, token_id: Union[uuid.UUID, Token], debug: bool = False) -> bool:
309    """
310    Return `True` if a token exists in the tokens pipe.
311    """
312    if isinstance(token_id, Token):
313        token_id = token_id.id
314    if not token_id:
315        raise ValueError("Invalid token ID.")
316
317    tokens_pipe = self.get_tokens_pipe()
318    return tokens_pipe.get_value('creation', params={'id': token_id}, debug=debug) is not None

Return True if a token exists in the tokens pipe.

def get_token_scopes( self, token_id: Union[uuid.UUID, meerschaum.core.Token._Token.Token], debug: bool = False) -> List[str]:
295def get_token_scopes(self, token_id: Union[uuid.UUID, Token], debug: bool = False) -> List[str]:
296    """
297    Return the scopes for a token.
298    """
299    if isinstance(token_id, Token):
300        token_id = token_id.id
301    if not token_id:
302        raise ValueError("Invalid token ID.")
303
304    tokens_pipe = self.get_tokens_pipe()
305    return tokens_pipe.get_value('scopes', params={'id': token_id}, debug=debug) or []

Return the scopes for a token.

@abc.abstractmethod
def register_pipe( self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
17@abc.abstractmethod
18def register_pipe(
19    self,
20    pipe: mrsm.Pipe,
21    debug: bool = False,
22    **kwargs: Any
23) -> mrsm.SuccessTuple:
24    """
25    Insert the pipe's attributes into the internal `pipes` table.
26
27    Parameters
28    ----------
29    pipe: mrsm.Pipe
30        The pipe to be registered.
31
32    Returns
33    -------
34    A `SuccessTuple` of the result.
35    """

Insert the pipe's attributes into the internal pipes table.

Parameters
  • pipe (mrsm.Pipe): The pipe to be registered.
Returns
  • A SuccessTuple of the result.
@abc.abstractmethod
def get_pipe_attributes( self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> Dict[str, Any]:
37@abc.abstractmethod
38def get_pipe_attributes(
39    self,
40    pipe: mrsm.Pipe,
41    debug: bool = False,
42    **kwargs: Any
43) -> Dict[str, Any]:
44    """
45    Return the pipe's document from the internal `pipes` table.
46
47    Parameters
48    ----------
49    pipe: mrsm.Pipe
50        The pipe whose attributes should be retrieved.
51
52    Returns
53    -------
54    The document that matches the keys of the pipe.
55    """

Return the pipe's document from the internal pipes table.

Parameters
  • pipe (mrsm.Pipe): The pipe whose attributes should be retrieved.
Returns
  • The document that matches the keys of the pipe.
@abc.abstractmethod
def get_pipe_id( self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> Union[str, int, NoneType]:
57@abc.abstractmethod
58def get_pipe_id(
59    self,
60    pipe: mrsm.Pipe,
61    debug: bool = False,
62    **kwargs: Any
63) -> Union[str, int, None]:
64    """
65    Return the `id` for the pipe if it exists.
66
67    Parameters
68    ----------
69    pipe: mrsm.Pipe
70        The pipe whose `id` to fetch.
71
72    Returns
73    -------
74    The `id` for the pipe's document or `None`.
75    """

Return the id for the pipe if it exists.

Parameters
  • pipe (mrsm.Pipe): The pipe whose id to fetch.
Returns
  • The id for the pipe's document or None.
def edit_pipe( self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
77def edit_pipe(
78    self,
79    pipe: mrsm.Pipe,
80    debug: bool = False,
81    **kwargs: Any
82) -> mrsm.SuccessTuple:
83    """
84    Edit the attributes of the pipe.
85
86    Parameters
87    ----------
88    pipe: mrsm.Pipe
89        The pipe whose in-memory parameters must be persisted.
90
91    Returns
92    -------
93    A `SuccessTuple` indicating success.
94    """
95    raise NotImplementedError

Edit the attributes of the pipe.

Parameters
  • pipe (mrsm.Pipe): The pipe whose in-memory parameters must be persisted.
Returns
  • A SuccessTuple indicating success.
def delete_pipe( self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
 97def delete_pipe(
 98    self,
 99    pipe: mrsm.Pipe,
100    debug: bool = False,
101    **kwargs: Any
102) -> mrsm.SuccessTuple:
103    """
104    Delete a pipe's registration from the `pipes` collection.
105
106    Parameters
107    ----------
108    pipe: mrsm.Pipe
109        The pipe to be deleted.
110
111    Returns
112    -------
113    A `SuccessTuple` indicating success.
114    """
115    raise NotImplementedError

Delete a pipe's registration from the pipes collection.

Parameters
  • pipe (mrsm.Pipe): The pipe to be deleted.
Returns
  • A SuccessTuple indicating success.
@abc.abstractmethod
def fetch_pipes_keys( self, connector_keys: Optional[List[str]] = None, metric_keys: Optional[List[str]] = None, location_keys: Optional[List[str]] = None, tags: Optional[List[str]] = None, debug: bool = False, **kwargs: Any) -> List[Tuple[str, str, str]]:
117@abc.abstractmethod
118def fetch_pipes_keys(
119    self,
120    connector_keys: Optional[List[str]] = None,
121    metric_keys: Optional[List[str]] = None,
122    location_keys: Optional[List[str]] = None,
123    tags: Optional[List[str]] = None,
124    debug: bool = False,
125    **kwargs: Any
126) -> List[Tuple[str, str, str]]:
127    """
128    Return a list of tuples for the registered pipes' keys according to the provided filters.
129
130    Parameters
131    ----------
132    connector_keys: list[str] | None, default None
133        The keys passed via `-c`.
134
135    metric_keys: list[str] | None, default None
136        The keys passed via `-m`.
137
138    location_keys: list[str] | None, default None
139        The keys passed via `-l`.
140
141    tags: List[str] | None, default None
142        Tags passed via `--tags` which are stored under `parameters:tags`.
143
144    Returns
145    -------
146    A list of connector, metric, and location keys in tuples.
147    You may return the string "None" for location keys in place of nulls.
148
149    Examples
150    --------
151    >>> import meerschaum as mrsm
152    >>> conn = mrsm.get_connector('example:demo')
153    >>> 
154    >>> pipe_a = mrsm.Pipe('a', 'demo', tags=['foo'], instance=conn)
155    >>> pipe_b = mrsm.Pipe('b', 'demo', tags=['bar'], instance=conn)
156    >>> pipe_a.register()
157    >>> pipe_b.register()
158    >>> 
159    >>> conn.fetch_pipes_keys(['a', 'b'])
160    [('a', 'demo', 'None'), ('b', 'demo', 'None')]
161    >>> conn.fetch_pipes_keys(metric_keys=['demo'])
162    [('a', 'demo', 'None'), ('b', 'demo', 'None')]
163    >>> conn.fetch_pipes_keys(tags=['foo'])
164    [('a', 'demo', 'None')]
165    >>> conn.fetch_pipes_keys(location_keys=[None])
166    [('a', 'demo', 'None'), ('b', 'demo', 'None')]
167    """

Return a list of tuples for the registered pipes' keys according to the provided filters.

Parameters
  • connector_keys (list[str] | None, default None): The keys passed via -c.
  • metric_keys (list[str] | None, default None): The keys passed via -m.
  • location_keys (list[str] | None, default None): The keys passed via -l.
  • tags (List[str] | None, default None): Tags passed via --tags which are stored under parameters:tags.
Returns
  • A list of connector, metric, and location keys in tuples.
  • You may return the string "None" for location keys in place of nulls.
Examples
>>> import meerschaum as mrsm
>>> conn = mrsm.get_connector('example:demo')
>>> 
>>> pipe_a = mrsm.Pipe('a', 'demo', tags=['foo'], instance=conn)
>>> pipe_b = mrsm.Pipe('b', 'demo', tags=['bar'], instance=conn)
>>> pipe_a.register()
>>> pipe_b.register()
>>> 
>>> conn.fetch_pipes_keys(['a', 'b'])
[('a', 'demo', 'None'), ('b', 'demo', 'None')]
>>> conn.fetch_pipes_keys(metric_keys=['demo'])
[('a', 'demo', 'None'), ('b', 'demo', 'None')]
>>> conn.fetch_pipes_keys(tags=['foo'])
[('a', 'demo', 'None')]
>>> conn.fetch_pipes_keys(location_keys=[None])
[('a', 'demo', 'None'), ('b', 'demo', 'None')]
@abc.abstractmethod
def pipe_exists( self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> bool:
169@abc.abstractmethod
170def pipe_exists(
171    self,
172    pipe: mrsm.Pipe,
173    debug: bool = False,
174    **kwargs: Any
175) -> bool:
176    """
177    Check whether a pipe's target table exists.
178
179    Parameters
180    ----------
181    pipe: mrsm.Pipe
182        The pipe to check whether its table exists.
183
184    Returns
185    -------
186    A `bool` indicating the table exists.
187    """

Check whether a pipe's target table exists.

Parameters
  • pipe (mrsm.Pipe): The pipe to check whether its table exists.
Returns
  • A bool indicating the table exists.
@abc.abstractmethod
def drop_pipe( self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
189@abc.abstractmethod
190def drop_pipe(
191    self,
192    pipe: mrsm.Pipe,
193    debug: bool = False,
194    **kwargs: Any
195) -> mrsm.SuccessTuple:
196    """
197    Drop a pipe's collection if it exists.
198
199    Parameters
200    ----------
201    pipe: mrsm.Pipe
202        The pipe to be dropped.
203
204    Returns
205    -------
206    A `SuccessTuple` indicating success.
207    """
208    raise NotImplementedError

Drop a pipe's collection if it exists.

Parameters
  • pipe (mrsm.Pipe): The pipe to be dropped.
Returns
  • A SuccessTuple indicating success.
def drop_pipe_indices( self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
210def drop_pipe_indices(
211    self,
212    pipe: mrsm.Pipe,
213    debug: bool = False,
214    **kwargs: Any
215) -> mrsm.SuccessTuple:
216    """
217    Drop a pipe's indices.
218
219    Parameters
220    ----------
221    pipe: mrsm.Pipe
222        The pipe whose indices need to be dropped.
223
224    Returns
225    -------
226    A `SuccessTuple` indicating success.
227    """
228    return False, f"Cannot drop indices for instance connectors of type '{self.type}'."

Drop a pipe's indices.

Parameters
  • pipe (mrsm.Pipe): The pipe whose indices need to be dropped.
Returns
  • A SuccessTuple indicating success.
@abc.abstractmethod
def sync_pipe( self, pipe: Pipe, df: Optional['pd.DataFrame'] = None, begin: Union[datetime.datetime, int, NoneType] = None, end: Union[datetime.datetime, int, NoneType] = None, chunksize: Optional[int] = -1, check_existing: bool = True, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
230@abc.abstractmethod
231def sync_pipe(
232    self,
233    pipe: mrsm.Pipe,
234    df: 'pd.DataFrame' = None,
235    begin: Union[datetime, int, None] = None,
236    end: Union[datetime, int, None] = None,
237    chunksize: Optional[int] = -1,
238    check_existing: bool = True,
239    debug: bool = False,
240    **kwargs: Any
241) -> mrsm.SuccessTuple:
242    """
243    Sync a pipe using a database connection.
244
245    Parameters
246    ----------
247    pipe: mrsm.Pipe
248        The Meerschaum Pipe instance into which to sync the data.
249
250    df: Optional[pd.DataFrame]
251        An optional DataFrame or equivalent to sync into the pipe.
252        Defaults to `None`.
253
254    begin: Union[datetime, int, None], default None
255        Optionally specify the earliest datetime to search for data.
256        Defaults to `None`.
257
258    end: Union[datetime, int, None], default None
259        Optionally specify the latest datetime to search for data.
260        Defaults to `None`.
261
262    chunksize: Optional[int], default -1
263        Specify the number of rows to sync per chunk.
264        If `-1`, resort to system configuration (default is `900`).
265        A `chunksize` of `None` will sync all rows in one transaction.
266        Defaults to `-1`.
267
268    check_existing: bool, default True
269        If `True`, pull and diff with existing data from the pipe. Defaults to `True`.
270
271    debug: bool, default False
272        Verbosity toggle. Defaults to False.
273
274    Returns
275    -------
276    A `SuccessTuple` of success (`bool`) and message (`str`).
277    """

Sync a pipe using a database connection.

Parameters
  • pipe (mrsm.Pipe): The Meerschaum Pipe instance into which to sync the data.
  • df (Optional[pd.DataFrame]): An optional DataFrame or equivalent to sync into the pipe. Defaults to None.
  • begin (Union[datetime, int, None], default None): Optionally specify the earliest datetime to search for data. Defaults to None.
  • end (Union[datetime, int, None], default None): Optionally specify the latest datetime to search for data. Defaults to None.
  • chunksize (Optional[int], default -1): Specify the number of rows to sync per chunk. If -1, resort to system configuration (default is 900). A chunksize of None will sync all rows in one transaction. Defaults to -1.
  • check_existing (bool, default True): If True, pull and diff with existing data from the pipe. Defaults to True.
  • debug (bool, default False): Verbosity toggle. Defaults to False.
Returns
  • A SuccessTuple of success (bool) and message (str).
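The `chunksize` semantics described above (`-1` falls back to the configured default, `None` syncs everything in one transaction) can be sketched with plain Python. `iter_chunks` and `DEFAULT_CHUNKSIZE` are hypothetical names; the value `900` mirrors the documented system default.

```python
from itertools import islice

DEFAULT_CHUNKSIZE = 900  # stand-in for the configured system default

def iter_chunks(rows, chunksize=-1):
    """Yield batches of rows: -1 uses the default, None yields one batch."""
    if chunksize == -1:
        chunksize = DEFAULT_CHUNKSIZE
    if chunksize is None:
        yield list(rows)
        return
    it = iter(rows)
    while batch := list(islice(it, chunksize)):
        yield batch

print([len(b) for b in iter_chunks(range(10), chunksize=4)])  # [4, 4, 2]
```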
def create_pipe_indices( self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
279def create_pipe_indices(
280    self,
281    pipe: mrsm.Pipe,
282    debug: bool = False,
283    **kwargs: Any
284) -> mrsm.SuccessTuple:
285    """
286    Create a pipe's indices.
287
288    Parameters
289    ----------
290    pipe: mrsm.Pipe
291        The pipe whose indices need to be created.
292
293    Returns
294    -------
295    A `SuccessTuple` indicating success.
296    """
297    return False, f"Cannot create indices for instance connectors of type '{self.type}'."

Create a pipe's indices.

Parameters
  • pipe (mrsm.Pipe): The pipe whose indices need to be created.
Returns
  • A SuccessTuple indicating success.
def clear_pipe( self, pipe: Pipe, begin: Union[datetime.datetime, int, NoneType] = None, end: Union[datetime.datetime, int, NoneType] = None, params: Optional[Dict[str, Any]] = None, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
299def clear_pipe(
300    self,
301    pipe: mrsm.Pipe,
302    begin: Union[datetime, int, None] = None,
303    end: Union[datetime, int, None] = None,
304    params: Optional[Dict[str, Any]] = None,
305    debug: bool = False,
306    **kwargs: Any
307) -> mrsm.SuccessTuple:
308    """
309    Delete rows within `begin`, `end`, and `params`.
310
311    Parameters
312    ----------
313    pipe: mrsm.Pipe
314        The pipe whose rows to clear.
315
316    begin: datetime | int | None, default None
317        If provided, remove rows >= `begin`.
318
319    end: datetime | int | None, default None
320        If provided, remove rows < `end`.
321
322    params: dict[str, Any] | None, default None
323        If provided, only remove rows which match the `params` filter.
324
325    Returns
326    -------
327    A `SuccessTuple` indicating success.
328    """
329    raise NotImplementedError

Delete rows within begin, end, and params.

Parameters
  • pipe (mrsm.Pipe): The pipe whose rows to clear.
  • begin (datetime | int | None, default None): If provided, remove rows >= begin.
  • end (datetime | int | None, default None): If provided, remove rows < end.
  • params (dict[str, Any] | None, default None): If provided, only remove rows which match the params filter.
Returns
  • A SuccessTuple indicating success.
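The filter semantics above (`begin` inclusive, `end` exclusive, `params` matched by equality) can be sketched over plain documents; `rows_to_clear` is a hypothetical helper, not a Meerschaum function.

```python
from datetime import datetime

def rows_to_clear(docs, dt_col, begin=None, end=None, params=None):
    """Select docs where begin <= dt < end and every params key matches."""
    params = params or {}
    return [
        doc for doc in docs
        if (begin is None or doc[dt_col] >= begin)
        and (end is None or doc[dt_col] < end)
        and all(doc.get(k) == v for k, v in params.items())
    ]

docs = [
    {'ts': datetime(2024, 1, 1), 'id': 1},
    {'ts': datetime(2024, 1, 2), 'id': 2},
    {'ts': datetime(2024, 1, 3), 'id': 1},
]
print(rows_to_clear(docs, 'ts', begin=datetime(2024, 1, 2), params={'id': 1}))
```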
@abc.abstractmethod
def get_pipe_data( self, pipe: Pipe, select_columns: Optional[List[str]] = None, omit_columns: Optional[List[str]] = None, begin: Union[datetime.datetime, int, NoneType] = None, end: Union[datetime.datetime, int, NoneType] = None, params: Optional[Dict[str, Any]] = None, debug: bool = False, **kwargs: Any) -> Union['pd.DataFrame', None]:
331@abc.abstractmethod
332def get_pipe_data(
333    self,
334    pipe: mrsm.Pipe,
335    select_columns: Optional[List[str]] = None,
336    omit_columns: Optional[List[str]] = None,
337    begin: Union[datetime, int, None] = None,
338    end: Union[datetime, int, None] = None,
339    params: Optional[Dict[str, Any]] = None,
340    debug: bool = False,
341    **kwargs: Any
342) -> Union['pd.DataFrame', None]:
343    """
344    Query a pipe's target table and return the DataFrame.
345
346    Parameters
347    ----------
348    pipe: mrsm.Pipe
349        The pipe with the target table from which to read.
350
351    select_columns: list[str] | None, default None
352        If provided, only select these given columns.
353        Otherwise select all available columns (i.e. `SELECT *`).
354
355    omit_columns: list[str] | None, default None
356        If provided, remove these columns from the selection.
357
358    begin: datetime | int | None, default None
359        The earliest `datetime` value to search from (inclusive).
360
361    end: datetime | int | None, default None
362        The latest `datetime` value to search from (exclusive).
363
364    params: dict[str, Any] | None, default None
365        Additional filters to apply to the query.
366
367    Returns
368    -------
369    The target table's data as a DataFrame.
370    """

Query a pipe's target table and return the DataFrame.

Parameters
  • pipe (mrsm.Pipe): The pipe with the target table from which to read.
  • select_columns (list[str] | None, default None): If provided, only select these given columns. Otherwise select all available columns (i.e. SELECT *).
  • omit_columns (list[str] | None, default None): If provided, remove these columns from the selection.
  • begin (datetime | int | None, default None): The earliest datetime value to search from (inclusive).
  • end (datetime | int | None, default None): The latest datetime value to search from (exclusive).
  • params (dict[str, Any] | None, default None): Additional filters to apply to the query.
Returns
  • The target table's data as a DataFrame.
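The `select_columns` and `omit_columns` semantics can be sketched as a column projection over plain documents; `project_columns` is a hypothetical helper for illustration only.

```python
def project_columns(docs, select_columns=None, omit_columns=None):
    """Apply SELECT-style column projection to a list of documents."""
    omit = set(omit_columns or [])
    return [
        {
            key: value for key, value in doc.items()
            if (select_columns is None or key in select_columns) and key not in omit
        }
        for doc in docs
    ]

docs = [{'ts': 1, 'id': 10, 'vl': 0.5}]
print(project_columns(docs, select_columns=['ts', 'id']))  # [{'ts': 1, 'id': 10}]
print(project_columns(docs, omit_columns=['vl']))          # [{'ts': 1, 'id': 10}]
```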
@abc.abstractmethod
def get_sync_time( self, pipe: Pipe, params: Optional[Dict[str, Any]] = None, newest: bool = True, debug: bool = False, **kwargs: Any) -> datetime.datetime | int | None:
372@abc.abstractmethod
373def get_sync_time(
374    self,
375    pipe: mrsm.Pipe,
376    params: Optional[Dict[str, Any]] = None,
377    newest: bool = True,
378    debug: bool = False,
379    **kwargs: Any
380) -> datetime | int | None:
381    """
382    Return the most recent value for the `datetime` axis.
383
384    Parameters
385    ----------
386    pipe: mrsm.Pipe
387        The pipe whose collection contains documents.
388
389    params: dict[str, Any] | None, default None
390        Filter certain parameters when determining the sync time.
391
392    newest: bool, default True
393        If `True`, return the maximum value for the column.
394
395    Returns
396    -------
397    The largest `datetime` or `int` value of the `datetime` axis. 
398    """

Return the most recent value for the datetime axis.

Parameters
  • pipe (mrsm.Pipe): The pipe whose collection contains documents.
  • params (dict[str, Any] | None, default None): Filter certain parameters when determining the sync time.
  • newest (bool, default True): If True, return the maximum value for the column.
Returns
  • The largest datetime or int value of the datetime axis.
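The sync-time logic above reduces to taking the maximum (or minimum, when `newest=False`) of the datetime axis among documents matching `params`; `newest_axis_value` is a hypothetical sketch, not the instance connector's implementation.

```python
def newest_axis_value(docs, dt_col, params=None, newest=True):
    """Return the max (or min) datetime-axis value among matching docs."""
    params = params or {}
    values = [
        doc[dt_col] for doc in docs
        if all(doc.get(k) == v for k, v in params.items())
    ]
    if not values:
        return None
    return max(values) if newest else min(values)

docs = [{'ts': 1, 'id': 1}, {'ts': 5, 'id': 2}, {'ts': 3, 'id': 1}]
print(newest_axis_value(docs, 'ts'))  # 5
print(newest_axis_value(docs, 'ts', params={'id': 1}, newest=False))  # 1
```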
@abc.abstractmethod
def get_pipe_columns_types( self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> Dict[str, str]:
400@abc.abstractmethod
401def get_pipe_columns_types(
402    self,
403    pipe: mrsm.Pipe,
404    debug: bool = False,
405    **kwargs: Any
406) -> Dict[str, str]:
407    """
408    Return the data types for the columns in the target table for data type enforcement.
409
410    Parameters
411    ----------
412    pipe: mrsm.Pipe
413        The pipe whose target table contains columns and data types.
414
415    Returns
416    -------
417    A dictionary mapping columns to data types.
418    """

Return the data types for the columns in the target table for data type enforcement.

Parameters
  • pipe (mrsm.Pipe): The pipe whose target table contains columns and data types.
Returns
  • A dictionary mapping columns to data types.
def get_pipe_columns_indices( self, pipe: Pipe, debug: bool = False) -> Dict[str, List[Dict[str, str]]]:
420def get_pipe_columns_indices(
421    self,
422    pipe: mrsm.Pipe,
423    debug: bool = False,
424) -> Dict[str, List[Dict[str, str]]]:
425    """
426    Return a dictionary mapping columns to metadata about related indices.
427
428    Parameters
429    ----------
430    pipe: mrsm.Pipe
431        The pipe whose target table has related indices.
432
433    Returns
434    -------
435    A dictionary mapping column names to lists of dictionaries with the keys "type" and "name".
436
437    Examples
438    --------
439    >>> pipe = mrsm.Pipe('demo', 'shirts', columns={'primary': 'id'}, indices={'size_color': ['color', 'size']})
440    >>> pipe.sync([{'color': 'red', 'size': 'M'}])
441    >>> pipe.get_columns_indices()
442    {'id': [{'name': 'demo_shirts_pkey', 'type': 'PRIMARY KEY'}], 'color': [{'name': 'IX_demo_shirts_color_size', 'type': 'INDEX'}], 'size': [{'name': 'IX_demo_shirts_color_size', 'type': 'INDEX'}]}
443    """
444    return {}

Return a dictionary mapping columns to metadata about related indices.

Parameters
  • pipe (mrsm.Pipe): The pipe whose target table has related indices.
Returns
  • A dictionary mapping column names to lists of dictionaries with the keys "type" and "name".
Examples
>>> pipe = mrsm.Pipe('demo', 'shirts', columns={'primary': 'id'}, indices={'size_color': ['color', 'size']})
>>> pipe.sync([{'color': 'red', 'size': 'M'}])
>>> pipe.get_columns_indices()
{'id': [{'name': 'demo_shirts_pkey', 'type': 'PRIMARY KEY'}], 'color': [{'name': 'IX_demo_shirts_color_size', 'type': 'INDEX'}], 'size': [{'name': 'IX_demo_shirts_color_size', 'type': 'INDEX'}]}
def make_connector(cls, _is_executor: bool = False):
def make_connector(cls, _is_executor: bool = False):
    """
    Register a class as a `Connector`.
    The `type` will be the lower case of the class name, without the suffix `connector`.

    Parameters
    ----------
    _is_executor: bool, default False
        If `True`, register the class as an executor instead,
        stripping the suffix `executor` rather than `connector`.

    Examples
    --------
    >>> import meerschaum as mrsm
    >>> from meerschaum.connectors import make_connector, Connector
    >>> 
    >>> @make_connector
    ... class FooConnector(Connector):
    ...     REQUIRED_ATTRIBUTES: list[str] = ['username', 'password']
    ... 
    >>> conn = mrsm.get_connector('foo:bar', username='dog', password='cat')
    >>> print(conn.username, conn.password)
    dog cat
    >>> 
    """
    import re
    from meerschaum.plugins import _get_parent_plugin
    suffix_regex = (
        r'connector$'
        if not _is_executor
        else r'executor$'
    )
    plugin_name = _get_parent_plugin(2)
    typ = re.sub(suffix_regex, '', cls.__name__.lower())
    with _locks['types']:
        types[typ] = cls
    with _locks['custom_types']:
        custom_types.add(typ)
    if plugin_name:
        with _locks['plugins_types']:
            if plugin_name not in plugins_types:
                plugins_types[plugin_name] = []
            plugins_types[plugin_name].append(typ)
    with _locks['connectors']:
        if typ not in connectors:
            connectors[typ] = {}
    if getattr(cls, 'IS_INSTANCE', False):
        with _locks['instance_types']:
            if typ not in instance_types:
                instance_types.append(typ)

    return cls

Register a class as a Connector. The type will be the lowercase class name, without the suffix connector. To register the type as an instance connector, set the class attribute IS_INSTANCE to True.

Parameters
  • _is_executor (bool, default False): If True, register the class as an executor instead, stripping the suffix executor rather than connector.
Examples
>>> import meerschaum as mrsm
>>> from meerschaum.connectors import make_connector, Connector
>>> 
>>> @make_connector
... class FooConnector(Connector):
...     REQUIRED_ATTRIBUTES: list[str] = ['username', 'password']
... 
>>> conn = mrsm.get_connector('foo:bar', username='dog', password='cat')
>>> print(conn.username, conn.password)
dog cat
>>>
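The suffix stripping that `make_connector` performs can be checked in isolation. The sketch below mirrors the `re.sub` call in the source above; the helper name `derive_type` is purely illustrative and not part of the Meerschaum API:

```python
import re

def derive_type(class_name: str, _is_executor: bool = False) -> str:
    # Mirror the registration logic: lowercase the class name,
    # then strip a trailing 'connector' (or 'executor') suffix.
    suffix_regex = r'connector$' if not _is_executor else r'executor$'
    return re.sub(suffix_regex, '', class_name.lower())

print(derive_type('FooConnector'))                     # foo
print(derive_type('SQLConnector'))                     # sql
print(derive_type('DaskExecutor', _is_executor=True))  # dask
```

Note that the suffix is optional: a class named simply `Foo` would register as type `foo` unchanged.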
def entry(sysargs: Union[List[str], str, NoneType] = None, _patch_args: Optional[Dict[str, Any]] = None, _use_cli_daemon: bool = True, _session_id: Optional[str] = None) -> Tuple[bool, str]:
def entry(
    sysargs: Union[List[str], str, None] = None,
    _patch_args: Optional[Dict[str, Any]] = None,
    _use_cli_daemon: bool = True,
    _session_id: Optional[str] = None,
) -> SuccessTuple:
    """
    Parse arguments and launch a Meerschaum action.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    start = time.perf_counter()
    from meerschaum.config.environment import get_daemon_env_vars
    sysargs_list = shlex.split(sysargs) if isinstance(sysargs, str) else sysargs
    if (
        not _use_cli_daemon
        or (not sysargs or (sysargs[0] and sysargs[0].startswith('-')))
        or '--no-daemon' in sysargs_list
        or '--daemon' in sysargs_list
        or '-d' in sysargs_list
        or get_daemon_env_vars()
        or not mrsm.get_config('system', 'experimental', 'cli_daemon')
    ):
        success, msg = entry_without_daemon(sysargs, _patch_args=_patch_args)
        end = time.perf_counter()
        if '--debug' in sysargs_list:
            print(f"Duration without daemon: {round(end - start, 3)}")
        return success, msg

    from meerschaum._internal.cli.entry import entry_with_daemon
    success, msg = entry_with_daemon(sysargs, _patch_args=_patch_args)
    end = time.perf_counter()
    if '--debug' in sysargs_list:
        print(f"Duration with daemon: {round(end - start, 3)}")
    return success, msg
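The branch condition in `entry` decides whether to bypass the experimental CLI daemon and run the action directly. A standalone paraphrase of that predicate (the function and parameter names here are illustrative, not part of the Meerschaum API):

```python
def skip_daemon(
    sysargs,
    use_cli_daemon: bool = True,
    in_daemon_env: bool = False,
    cli_daemon_enabled: bool = True,
) -> bool:
    # Bypass the CLI daemon when it is disabled, when no action was given
    # (or the first token is a flag), when a daemon flag was passed
    # explicitly, or when already running inside a daemon environment.
    sysargs = sysargs or []
    return (
        not use_cli_daemon
        or not sysargs
        or sysargs[0].startswith('-')
        or '--no-daemon' in sysargs
        or '--daemon' in sysargs
        or '-d' in sysargs
        or in_daemon_env
        or not cli_daemon_enabled
    )

print(skip_daemon(['show', 'pipes']))        # False: dispatch to the daemon
print(skip_daemon(['sync', 'pipes', '-d']))  # True: daemonize the action itself
print(skip_daemon([]))                       # True: no action given
```

In other words, the daemon path is only taken for a plain, foreground action with the `system:experimental:cli_daemon` config flag enabled.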

Parse arguments and launch a Meerschaum action.

Returns