meerschaum


Meerschaum Python API

Welcome to the Meerschaum Python API technical documentation! Here you can find information about the classes and functions provided by the meerschaum package. Visit meerschaum.io for general usage documentation.

Root Module

For your convenience, the following classes and functions may be imported from the root meerschaum namespace:

Examples

Build a Connector

Get existing connectors or build a new one in-memory with the meerschaum.get_connector() factory function:

import meerschaum as mrsm

sql_conn = mrsm.get_connector(
    'sql:temp',
    flavor='sqlite',
    database='/tmp/tmp.db',
)
df = sql_conn.read("SELECT 1 AS foo")
print(df)
#    foo
# 0    1

sql_conn.to_sql(df, 'foo')
print(sql_conn.read('foo'))
#    foo
# 0    1
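Connector keys such as `'sql:temp'` combine a type and a label. The `get_connector()` source further down splits a `'type:label'` string on the first colon only, and falls back to the default label (`'main'`, per its docstring) when only a type is given. Below is a minimal standard-library sketch of that parsing. `split_connector_keys` and `DEFAULT_LABEL` are hypothetical names used for illustration, not part of the meerschaum API.

```python
from typing import Optional, Tuple

# Assumption: mirrors the documented default label of get_connector().
DEFAULT_LABEL = 'main'


def split_connector_keys(keys: str, label: Optional[str] = None) -> Tuple[str, str]:
    """Split 'type:label' keys into (type, label), defaulting the label."""
    # Only split when no explicit label was passed, and only on the first colon.
    if not label and ':' in keys:
        conn_type, label = keys.split(':', maxsplit=1)
    else:
        conn_type = keys
    return conn_type, (label if label is not None else DEFAULT_LABEL)


print(split_connector_keys('sql:temp'))     # ('sql', 'temp')
print(split_connector_keys('sql'))          # ('sql', 'main')
print(split_connector_keys('api:prod:eu'))  # ('api', 'prod:eu')
```

Splitting with `maxsplit=1` keeps any further colons inside the label.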

Create a Custom Connector Class

Decorate your connector classes with meerschaum.make_connector() to designate them as custom connectors:

from datetime import datetime, timezone
from random import randint
import meerschaum as mrsm
from meerschaum.utils.dtypes import round_time

@mrsm.make_connector
class FooConnector(mrsm.Connector):
    REQUIRED_ATTRIBUTES = ['username', 'password']

    def fetch(
        self,
        begin: datetime | None = None,
        end: datetime | None = None,
    ):
        now = begin or round_time(datetime.now(timezone.utc))
        return [
            {'ts': now, 'id': 1, 'vl': randint(1, 100)},
            {'ts': now, 'id': 2, 'vl': randint(1, 100)},
            {'ts': now, 'id': 3, 'vl': randint(1, 100)},
        ]

foo_conn = mrsm.get_connector(
    'foo:bar',
    username='foo',
    password='bar',
)
docs = foo_conn.fetch()
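`REQUIRED_ATTRIBUTES` lists the keyword arguments the connector expects, which is why `get_connector()` receives `username` and `password` above. The sketch below shows one way such a requirement could be enforced at construction time. `BaseConnectorSketch` and `FooConnectorSketch` are hypothetical stand-ins, not `mrsm.Connector`; the real class's validation behavior and error type are not shown in this document.

```python
from typing import Any, ClassVar, List


class BaseConnectorSketch:
    """Hypothetical base class that enforces REQUIRED_ATTRIBUTES on construction."""

    REQUIRED_ATTRIBUTES: ClassVar[List[str]] = []

    def __init__(self, conn_type: str, label: str, **kwargs: Any):
        missing = [attr for attr in self.REQUIRED_ATTRIBUTES if attr not in kwargs]
        if missing:
            raise ValueError(f"Missing required attributes for '{conn_type}:{label}': {missing}")
        self.type, self.label = conn_type, label
        # Store every provided attribute on the instance.
        for key, value in kwargs.items():
            setattr(self, key, value)


class FooConnectorSketch(BaseConnectorSketch):
    REQUIRED_ATTRIBUTES = ['username', 'password']


conn = FooConnectorSketch('foo', 'bar', username='foo', password='bar')
print(conn.username)  # foo

try:
    FooConnectorSketch('foo', 'bar', username='foo')
except ValueError as e:
    print(e)  # Missing required attributes for 'foo:bar': ['password']
```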

Build a Pipe

Build a meerschaum.Pipe in-memory:

from datetime import datetime
import meerschaum as mrsm

pipe = mrsm.Pipe(
    foo_conn, 'demo',
    instance=sql_conn,
    columns={'datetime': 'ts', 'id': 'id'},
    tags=['production'],
)
pipe.sync(begin=datetime(2024, 1, 1))
df = pipe.get_data()
print(df)
#           ts  id  vl
# 0 2024-01-01   1  97
# 1 2024-01-01   2  18
# 2 2024-01-01   3  96

Add temporary=True to skip registering the pipe in the pipes table.

Query an Integer-Axis Pipe by Datetime

When a pipe's datetime axis is an integer epoch, set precision so datetime bounds can be translated to the axis's integer value. This lets you pass a datetime begin / end to meerschaum.Pipe.get_data() (and to actions like show data, clear, and deduplicate):

from datetime import datetime, timezone
import meerschaum as mrsm

pipe = mrsm.Pipe(
    'demo', 'epoch',
    instance='sql:temp',
    columns={'datetime': 'ts'},
    dtypes={'ts': 'int'},
    precision='millisecond',
)
pipe.sync([{'ts': 1780099200000}])

### The datetime is translated to the epoch value `1780099200000`.
df = pipe.get_data(begin=datetime(2026, 5, 30, tzinfo=timezone.utc))

Integer bounds (e.g. begin=1780099200000) still pass through unchanged. A datetime bound on an integer axis without precision set (a non-epoch integer axis) raises a ValueError. To convert a datetime to the axis's integer value yourself, use meerschaum.utils.dtypes.datetime_to_int():

from datetime import datetime, timezone
from meerschaum.utils.dtypes import datetime_to_int

datetime_to_int(datetime(2026, 5, 30, tzinfo=timezone.utc), 'millisecond')
# 1780099200000
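The translation above is plain epoch arithmetic: the offset from 1970-01-01 UTC, scaled to the axis's precision. The sketch below reproduces it with the standard library and then applies the result as an inclusive lower bound on an integer column. `to_epoch_int` and the precision names in `UNITS_PER_SECOND` are hypothetical and may not match what `datetime_to_int()` accepts; treating `begin` as inclusive is likewise an assumption of this sketch.

```python
from datetime import datetime, timezone

# Hypothetical precision names mapped to units per second (illustration only).
UNITS_PER_SECOND = {
    'second': 1,
    'millisecond': 1_000,
    'microsecond': 1_000_000,
}


def to_epoch_int(dt: datetime, precision: str) -> int:
    """Convert a timezone-aware datetime to an integer epoch at the given precision."""
    delta = dt - datetime(1970, 1, 1, tzinfo=timezone.utc)
    # Work in integer microseconds to avoid float rounding, then rescale.
    total_us = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    return total_us * UNITS_PER_SECOND[precision] // 1_000_000


begin = datetime(2026, 5, 30, tzinfo=timezone.utc)
begin_int = to_epoch_int(begin, 'millisecond')
print(begin_int)  # 1780099200000

# Rows on an integer-epoch axis: May 29, May 30, and May 31, 2026 (UTC).
rows = [{'ts': 1780012800000}, {'ts': 1780099200000}, {'ts': 1780185600000}]
print([row for row in rows if row['ts'] >= begin_int])
# [{'ts': 1780099200000}, {'ts': 1780185600000}]
```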

Get Registered Pipes

The meerschaum.get_pipes() function returns a dictionary hierarchy of pipes by connector, metric, and location:

import meerschaum as mrsm

pipes = mrsm.get_pipes(instance='sql:temp')
pipe = pipes['foo:bar']['demo'][None]

Add as_list=True to flatten the hierarchy:

import meerschaum as mrsm

pipes = mrsm.get_pipes(
    tags=['production'],
    instance=sql_conn,
    as_list=True,
)
print(pipes)
# [Pipe('foo:bar', 'demo', instance='sql:temp')]
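The nested dictionary and the `as_list=True` list hold the same pipes: the list is the innermost values of the connector, metric, location hierarchy. The `get_pipes()` source below delegates this step to `meerschaum.utils.pipes.flatten_pipes_dict`. The standalone `flatten_hierarchy` here is a hypothetical equivalent in spirit, with strings standing in for `Pipe` objects.

```python
from typing import Dict, List, Optional, TypeVar

T = TypeVar('T')


def flatten_hierarchy(nested: Dict[str, Dict[str, Dict[Optional[str], T]]]) -> List[T]:
    """Flatten {connector_keys: {metric_key: {location_key: item}}} into a list."""
    return [
        item
        for metrics in nested.values()
        for locations in metrics.values()
        for item in locations.values()
    ]


# Strings stand in for Pipe objects; None is the default location key.
pipes = {
    'foo:bar': {
        'demo': {None: "Pipe('foo:bar', 'demo')"},
    },
    'sql:main': {
        'weather': {
            None: "Pipe('sql:main', 'weather')",
            'atl': "Pipe('sql:main', 'weather', 'atl')",
        },
    },
}
print(flatten_hierarchy(pipes))
```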

Filter by the dtype of the datetime index column with datetime_dtypes. Accepted values are 'datetime', 'int', and 'None'; prefix with '_' to negate:

import meerschaum as mrsm

### Only pipes with a timestamp datetime index:
timestamp_pipes = mrsm.get_pipes(datetime_dtypes=['datetime'], as_list=True)

### Only pipes with an integer datetime index:
int_pipes = mrsm.get_pipes(datetime_dtypes=['int'], as_list=True)

### Exclude pipes without a datetime index:
datetime_pipes = mrsm.get_pipes(datetime_dtypes=['_None'], as_list=True)
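The filtering rule follows the `_dtype_matches` helper in the `get_pipes()` source below: a pipe without a datetime column matches only null-like values, an integer-typed datetime column matches `'int'`, and any other datetime column matches `'datetime'`. A `'_'`-prefixed value excludes matching pipes. This standalone sketch restates that logic without meerschaum; the helper names and the simplified `NULL_STRINGS` check are assumptions standing in for `separate_negation_values` and `value_is_null`.

```python
from typing import List, Optional, Tuple

NEGATION_PREFIX = '_'
# Assumption: a simplified stand-in for meerschaum's null-value check.
NULL_STRINGS = {'none', 'null', 'nan', ''}


def separate_negations(values: List[str]) -> Tuple[List[str], List[str]]:
    """Split filter values into (included, excluded) by the negation prefix."""
    included = [v.lower() for v in values if not v.startswith(NEGATION_PREFIX)]
    excluded = [v[len(NEGATION_PREFIX):].lower() for v in values if v.startswith(NEGATION_PREFIX)]
    return included, excluded


def dtype_matches(wanted: str, dt_col: Optional[str], dt_dtype: Optional[str]) -> bool:
    """Check one wanted value ('datetime', 'int', or null-like) against a pipe's axis."""
    if not dt_col:
        return wanted in NULL_STRINGS
    is_int = 'int' in str(dt_dtype).lower()
    return (wanted == 'int' and is_int) or (wanted == 'datetime' and not is_int)


def keep_pipe(datetime_dtypes: List[str], dt_col: Optional[str], dt_dtype: Optional[str]) -> bool:
    included, excluded = separate_negations(datetime_dtypes)
    in_match = not included or any(dtype_matches(d, dt_col, dt_dtype) for d in included)
    ex_match = any(dtype_matches(d, dt_col, dt_dtype) for d in excluded)
    return in_match and not ex_match


# (datetime column, its dtype) for three hypothetical pipes.
pipes = {
    'timestamp_pipe': ('ts', 'datetime64[ns, UTC]'),
    'epoch_pipe': ('ts', 'int'),
    'no_axis_pipe': (None, None),
}
for filters in (['datetime'], ['int'], ['_None']):
    kept = [name for name, (col, typ) in pipes.items() if keep_pipe(filters, col, typ)]
    print(filters, kept)
# ['datetime'] ['timestamp_pipe']
# ['int'] ['epoch_pipe']
# ['_None'] ['timestamp_pipe', 'epoch_pipe']
```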

Import Plugins

You can import a plugin's module through meerschaum.Plugin.module:

import meerschaum as mrsm

plugin = mrsm.Plugin('noaa')
with mrsm.Venv(plugin):
    noaa = plugin.module

If your plugin has submodules, use meerschaum.plugins.from_plugin_import:

from meerschaum.plugins import from_plugin_import
get_defined_pipes = from_plugin_import('compose.utils.pipes', 'get_defined_pipes')

Import multiple plugins with meerschaum.plugins.import_plugins:

from meerschaum.plugins import import_plugins
noaa, compose = import_plugins('noaa', 'compose')

Create a Job

Create a meerschaum.Job with name and sysargs:

import meerschaum as mrsm

job = mrsm.Job('syncing-engine', 'sync pipes --loop')
success, msg = job.start()

Pass executor_keys as the connector keys of an API instance to create a remote job:

import meerschaum as mrsm

job = mrsm.Job(
    'foo',
    'sync pipes -s daily',
    executor_keys='api:main',
)

Import from a Virtual Environment

Use the meerschaum.Venv context manager to activate a virtual environment:

import meerschaum as mrsm

with mrsm.Venv('noaa'):
    import requests

print(requests.__file__)
# /home/bmeares/.config/meerschaum/venvs/noaa/lib/python3.12/site-packages/requests/__init__.py

To import packages which may not be installed, use meerschaum.attempt_import():

import meerschaum as mrsm

requests = mrsm.attempt_import('requests', venv='noaa')
print(requests.__file__)
# /home/bmeares/.config/meerschaum/venvs/noaa/lib/python3.12/site-packages/requests/__init__.py
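The core "import if available" idea behind `attempt_import()` can be sketched with `importlib` alone. The hypothetical `try_import` below does not reproduce the `venv=` handling shown above or anything else `attempt_import()` may do; it only returns `None` instead of raising when a module cannot be imported.

```python
import importlib
from types import ModuleType
from typing import Optional


def try_import(name: str) -> Optional[ModuleType]:
    """Import a module by name, returning None instead of raising if it is missing."""
    try:
        return importlib.import_module(name)
    except ImportError:  # also covers ModuleNotFoundError
        return None


json_mod = try_import('json')
missing = try_import('surely_not_an_installed_package_123')
print(json_mod.__name__, missing)  # json None
```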

Run Actions

Run sysargs with meerschaum.entry():

import meerschaum as mrsm

success, msg = mrsm.entry('show pipes + show version : x2')

Use meerschaum.actions.get_action() to access an action function directly:

from meerschaum.actions import get_action

show_pipes = get_action(['show', 'pipes'])
success, msg = show_pipes(connector_keys=['plugin:noaa'])

Get a dictionary of available subactions with meerschaum.actions.get_subactions():

from meerschaum.actions import get_subactions

subactions = get_subactions('show')
success, msg = subactions['pipes']()

Create a Plugin

Run bootstrap plugin to create a new plugin:

mrsm bootstrap plugin example

This will create example.py in your plugins directory (default ~/.config/meerschaum/plugins/, Windows: %APPDATA%\Meerschaum\plugins). You may paste the example code from the "Create a Custom Action" example below.

Open your plugin with edit plugin:

mrsm edit plugin example

Run edit plugin and paste the example code below to try out the features.

See the writing plugins guide for more in-depth documentation.

Create a Custom Action

Decorate a function with meerschaum.actions.make_action to designate it as an action. Subactions will be automatically detected if not decorated:

from meerschaum.actions import make_action

@make_action
def sing():
    print('What would you like me to sing?')
    return True, "Success"

def sing_tune():
    return False, "I don't know that song!"

def sing_song():
    print('Hello, World!')
    return True, "Success"
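The function names in this example (`sing_tune`, `sing_song`, and `sing_melody`, which runs as `mrsm sing melody`) suggest that undecorated subactions are found by an `<action>_<subaction>` naming convention. The sketch below assumes exactly that rule. `find_subactions` is hypothetical and is not the plugin loader's implementation; it builds the same kind of name-to-function dictionary that `get_subactions()` is described as returning above.

```python
from typing import Callable, Dict, Tuple

SuccessTuple = Tuple[bool, str]


def sing() -> SuccessTuple:
    return True, "Success"


def sing_tune() -> SuccessTuple:
    return False, "I don't know that song!"


def sing_song() -> SuccessTuple:
    return True, "Success"


def find_subactions(action: str, namespace: Dict[str, object]) -> Dict[str, Callable]:
    """Map subaction names to callables named '<action>_<subaction>' (assumed rule)."""
    prefix = f"{action}_"
    return {
        name[len(prefix):]: func
        for name, func in namespace.items()
        if name.startswith(prefix) and callable(func)
    }


subactions = find_subactions('sing', globals())
print(sorted(subactions))    # ['song', 'tune']
print(subactions['tune']())  # (False, "I don't know that song!")
```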

Use meerschaum.plugins.add_plugin_argument() to create new parameters for your action:

from meerschaum.plugins import make_action, add_plugin_argument

add_plugin_argument(
    '--song', type=str, help='What song to sing.',
)

@make_action
def sing_melody(action=None, song=None):
    to_sing = action[0] if action else song
    if not to_sing:
        return False, "Please tell me what to sing!"

    return True, f'~I am singing {to_sing}~'
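
Call the action from the command line, passing the song either positionally or with the new --song flag:

mrsm sing melody lalala

mrsm sing melody --song do-re-mi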
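In `sing_melody`, a positional word takes precedence over `--song`. The sketch below shows that precedence with `argparse` standing in for Meerschaum's own parser. How `add_plugin_argument()` registers the flag internally is not shown in this document, and stripping the first two words (`sing melody`) before calling the function is an assumption made for illustration.

```python
import argparse
from typing import List, Optional, Tuple


def sing_melody(action: Optional[List[str]] = None, song: Optional[str] = None) -> Tuple[bool, str]:
    # A positional word after the subaction wins; otherwise fall back to --song.
    to_sing = action[0] if action else song
    if not to_sing:
        return False, "Please tell me what to sing!"
    return True, f'~I am singing {to_sing}~'


parser = argparse.ArgumentParser(prog='mrsm')
parser.add_argument('action', nargs='*')
parser.add_argument('--song', type=str, help='What song to sing.')


def run(argv: List[str]) -> Tuple[bool, str]:
    args = parser.parse_args(argv)
    # Assumption: 'sing' and 'melody' select the function; the rest reaches it as `action`.
    return sing_melody(action=args.action[2:], song=args.song)


print(run(['sing', 'melody', 'lalala']))              # (True, '~I am singing lalala~')
print(run(['sing', 'melody', '--song', 'do-re-mi']))  # (True, '~I am singing do-re-mi~')
print(run(['sing', 'melody']))                        # (False, 'Please tell me what to sing!')
```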

Add a Page to the Web Dashboard

Use the decorators meerschaum.plugins.dash_plugin() and meerschaum.plugins.web_page() to add new pages to the web dashboard:

from meerschaum.plugins import dash_plugin, web_page

@dash_plugin
def init_dash(dash_app):

    import dash.html as html
    import dash_bootstrap_components as dbc
    from dash import Input, Output, no_update

    ### Routes to '/dash/my-page'
    @web_page('/my-page', login_required=False)
    def my_page():
        return dbc.Container([
            html.H1("Hello, World!"),
            dbc.Button("Click me", id='my-button'),
            html.Div(id="my-output-div"),
        ])

    @dash_app.callback(
        Output('my-output-div', 'children'),
        Input('my-button', 'n_clicks'),
    )
    def my_button_click(n_clicks):
        if not n_clicks:
            return no_update
        return html.P(f'You clicked {n_clicks} times!')

Submodules

meerschaum.actions
Access functions for actions and subactions.

meerschaum.config
Read and write the Meerschaum configuration registry.

meerschaum.connectors
Build connectors to interact with databases and fetch data.

meerschaum.jobs
Start background jobs.

meerschaum.plugins
Access plugin modules and other API utilities.

meerschaum.utils
Utility functions are available in several submodules:

#! /usr/bin/env python
# -*- coding: utf-8 -*-
# vim:fenc=utf-8

"""
Copyright 2020–2026 Bennett Meares

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""

import atexit

from meerschaum.utils.typing import SuccessTuple
from meerschaum.utils.packages import attempt_import
from meerschaum.core.Pipe import Pipe
from meerschaum.plugins import Plugin
from meerschaum.utils.venv import Venv
from meerschaum.jobs import Job, make_executor
from meerschaum.connectors import get_connector, Connector, InstanceConnector, make_connector
from meerschaum.utils import get_pipes
from meerschaum.utils.formatting import pprint
from meerschaum._internal.docs import index as __doc__
from meerschaum.config import __version__, get_config
from meerschaum._internal.entry import entry
from meerschaum.__main__ import _close_pools

atexit.register(_close_pools)

__pdoc__ = {'gui': False, 'api': False, 'core': False, '_internal': False}
__all__ = (
    "get_pipes",
    "get_connector",
    "get_config",
    "Pipe",
    "Plugin",
    "SuccessTuple",
    "Venv",
    "Plugin",
    "Job",
    "pprint",
    "attempt_import",
    "actions",
    "config",
    "connectors",
    "jobs",
    "plugins",
    "utils",
    "SuccessTuple",
    "Connector",
    "InstanceConnector",
    "make_connector",
    "entry",
)
def get_pipes( connector_keys: Union[str, List[str], NoneType] = None, metric_keys: Union[str, List[str], NoneType] = None, location_keys: Union[str, List[str], NoneType] = None, tags: Optional[List[str]] = None, targets: Optional[List[str]] = None, datetime_dtypes: Optional[List[str]] = None, params: Optional[Dict[str, Any]] = None, mrsm_instance: Union[str, InstanceConnector, NoneType] = None, instance: Union[str, InstanceConnector, NoneType] = None, as_list: bool = False, as_tags_dict: bool = False, as_targets_dict: bool = False, method: str = 'registered', workers: Optional[int] = None, debug: bool = False, _cache_parameters: bool = True, **kw: Any) -> Union[Dict[str, Dict[str, Dict[Optional[str], Pipe]]], List[Pipe], Dict[str, Pipe]]:
def get_pipes(
    connector_keys: Union[str, List[str], None] = None,
    metric_keys: Union[str, List[str], None] = None,
    location_keys: Union[str, List[str], None] = None,
    tags: Optional[List[str]] = None,
    targets: Optional[List[str]] = None,
    datetime_dtypes: Optional[List[str]] = None,
    params: Optional[Dict[str, Any]] = None,
    mrsm_instance: Union[str, InstanceConnector, None] = None,
    instance: Union[str, InstanceConnector, None] = None,
    as_list: bool = False,
    as_tags_dict: bool = False,
    as_targets_dict: bool = False,
    method: str = 'registered',
    workers: Optional[int] = None,
    debug: bool = False,
    _cache_parameters: bool = True,
    **kw: Any
) -> Union[PipesDict, List[mrsm.Pipe], Dict[str, mrsm.Pipe]]:
    """
    Return a dictionary or list of `meerschaum.Pipe` objects.

    Parameters
    ----------
    connector_keys: Union[str, List[str], None], default None
        String or list of connector keys.
        If omitted or is `'*'`, fetch all possible keys.
        If a string begins with `'_'`, select keys that do NOT match the string.

    metric_keys: Union[str, List[str], None], default None
        String or list of metric keys. See `connector_keys` for formatting.

    location_keys: Union[str, List[str], None], default None
        String or list of location keys. See `connector_keys` for formatting.

    tags: Optional[List[str]], default None
        If provided, only include pipes with these tags.

    datetime_dtypes: Optional[List[str]], default None
        If provided, only include pipes with the corresponding `datetime` axis dtypes.
        Accepted values are `datetime`, `int`, `None` (or `null`, etc.).
        May be negated by `_`.

    params: Optional[Dict[str, Any]], default None
        Dictionary of additional parameters to search by.
        Params are parsed into a SQL WHERE clause.
        E.g. `{'a': 1, 'b': 2}` equates to `'WHERE a = 1 AND b = 2'`

    mrsm_instance: Union[str, InstanceConnector, None], default None
        Connector keys for the Meerschaum instance of the pipes.
        Must be a `meerschaum.connectors.sql.SQLConnector.SQLConnector` or
        `meerschaum.connectors.api.APIConnector.APIConnector`.

    as_list: bool, default False
        If `True`, return pipes in a list instead of a hierarchical dictionary.
        `False` : `{connector_keys: {metric_key: {location_key: Pipe}}}`
        `True`  : `[Pipe]`

    as_tags_dict: bool, default False
        If `True`, return a dictionary mapping tags to lists of pipes.
        Pipes with multiple tags will be repeated.

    as_targets_dict: bool, default False
        If `True`, return a dictionary mapping `(schema, target)` tuples to lists of pipes.
        Pipes sharing the same target across different schemata are grouped separately.

    method: str, default 'registered'
        Available options: `['registered', 'explicit', 'all']`
        If `'registered'` (default), create pipes based on registered keys in the connector's pipes table
        (API or SQL connector, depends on mrsm_instance).
        If `'explicit'`, create pipes from provided connector_keys, metric_keys, and location_keys
        instead of consulting the pipes table. Useful for creating non-existent pipes.
        If `'all'`, create pipes from predefined metrics and locations. Requires `connector_keys`.
        **NOTE:** Method `'all'` is not implemented!

    workers: Optional[int], default None
        If provided (and `as_tags_dict` or `as_targets_dict` is `True`), set the number of workers
        for the pool to fetch tags or targets.
        Only takes effect if the instance connector supports multi-threading.

    **kw: Any
        Keyword arguments to pass to the `meerschaum.Pipe` constructor.

    Returns
    -------
    A dictionary of dictionaries and `meerschaum.Pipe` objects
    in the connector, metric, location hierarchy.
    If `as_list` is `True`, return a list of `meerschaum.Pipe` objects.
    If `as_tags_dict` is `True`, return a dictionary mapping tags to lists of pipes.
    If `as_targets_dict` is `True`, return a dictionary mapping `(schema, target)` tuples to lists of pipes.

    Examples
    --------
    ```
    >>> ### Manual definition:
    >>> pipes = {
    ...     <connector_keys>: {
    ...         <metric_key>: {
    ...             <location_key>: Pipe(
    ...                 <connector_keys>,
    ...                 <metric_key>,
    ...                 <location_key>,
    ...             ),
    ...         },
    ...     },
    ... }
    >>> ### Accessing a single pipe:
    >>> pipes['sql:main']['weather'][None]
    >>> ### Return a list instead:
    >>> get_pipes(as_list=True)
    [Pipe('sql:main', 'weather')]
    >>> get_pipes(as_tags_dict=True)
    {'gvl': [Pipe('sql:main', 'weather')]}
    ```
    """
    import json
    from collections import defaultdict
    from meerschaum.config import get_config
    from meerschaum.config.static import STATIC_CONFIG
    from meerschaum.utils.warnings import error
    from meerschaum.utils.misc import filter_keywords, separate_negation_values
    from meerschaum.utils.pool import get_pool
    from meerschaum.utils.pipes import replace_pipes_syntax
    from meerschaum.utils.debug import dprint
    from meerschaum.utils.dtypes import value_is_null, get_current_timestamp
    from meerschaum import Pipe

    negation_prefix = STATIC_CONFIG['system']['fetch_pipes_keys']['negation_prefix']
    if datetime_dtypes:
        if isinstance(datetime_dtypes, str):
            datetime_dtypes = [datetime_dtypes]
        for _dt in datetime_dtypes:
            _clean = str(_dt).lstrip(negation_prefix).lower()
            if _clean not in ('datetime', 'int') and not value_is_null(_clean):
                error(f"Invalid datetime dtype '{_dt}'.")

    if connector_keys is None:
        connector_keys = []
    if metric_keys is None:
        metric_keys = []
    if location_keys is None:
        location_keys = []
    if params is None:
        params = {}
    if tags is None:
        tags = []

    if isinstance(connector_keys, str):
        connector_keys = [connector_keys]
    if isinstance(metric_keys, str):
        metric_keys = [metric_keys]
    if isinstance(location_keys, str):
        location_keys = [location_keys]

    ### Get SQL or API connector (keys come from `connector.fetch_pipes_keys()`).
    if mrsm_instance is None:
        mrsm_instance = instance
    if mrsm_instance is None:
        mrsm_instance = get_config('meerschaum', 'instance', patch=True)
    if isinstance(mrsm_instance, str):
        from meerschaum.connectors.parse import parse_instance_keys
        connector = parse_instance_keys(keys=mrsm_instance, debug=debug)
    else:
        from meerschaum.connectors import instance_types
        valid_connector = False
        if hasattr(mrsm_instance, 'type'):
            if mrsm_instance.type in instance_types:
                valid_connector = True
        if not valid_connector:
            error(f"Invalid instance connector: {mrsm_instance}")
        connector = mrsm_instance
    if debug:
        dprint(f"Using instance connector: {connector}")
    if not connector:
        error(f"Could not create connector from keys: '{mrsm_instance}'")

    ### Get a list of tuples for the keys needed to build pipes.
    result = fetch_pipes_keys(
        method,
        connector,
        connector_keys = connector_keys,
        metric_keys = metric_keys,
        location_keys = location_keys,
        tags = tags,
        params = params,
        workers = workers,
        debug = debug
    )
    if result is None:
        error("Unable to build pipes!")
    result_items: List[Tuple] = (
        list(result.items())
        if isinstance(result, dict)
        else [(None, keys_tuple) for keys_tuple in result]
    )

    ### Populate the `pipes` dictionary with Pipes based on the keys
    ### obtained from the chosen `method`.
    in_dtypes, ex_dtypes = separate_negation_values(datetime_dtypes or [])
    in_targets, ex_targets = separate_negation_values(targets or [])
    pipes: PipesDict = {}
    targets_pipes: Dict[Tuple[Optional[str], str], List[mrsm.Pipe]] = defaultdict(lambda: [])
    connector_schema = getattr(connector, 'schema', None)
    connector_is_sql = getattr(connector, 'type', None) == 'sql'
    connector_flavor = getattr(connector, 'flavor', None)
    for pipe_id, keys_tuple in result_items:
        ck, mk, lk = keys_tuple[0], keys_tuple[1], keys_tuple[2]
        pipe_tags_or_parameters = keys_tuple[3] if len(keys_tuple) == 4 else None
        pipe_parameters = (
            pipe_tags_or_parameters
            if isinstance(pipe_tags_or_parameters, (dict, str))
            else None
        )
        if isinstance(pipe_parameters, str):
            pipe_parameters = json.loads(pipe_parameters)
        pipe_tags = (
            pipe_tags_or_parameters
            if isinstance(pipe_tags_or_parameters, list)
            else (
                pipe_tags_or_parameters.get('tags', [])
                if isinstance(pipe_tags_or_parameters, dict)
                else None
            )
        )

        pipe = Pipe(
            ck, mk, lk,
            mrsm_instance = connector,
            parameters = pipe_parameters,
            tags = pipe_tags,
            debug = debug,
            **filter_keywords(Pipe, **kw)
        )
        pipe.__dict__['_tags'] = pipe_tags
        if pipe_id is not None:
            pipe._cache_value('_id', pipe_id, memory_only=True, debug=debug)
        if pipe_parameters is not None:
            now = get_current_timestamp('ms', as_int=True) / 1000
            full_attributes = {
                'connector_keys': ck,
                'metric_key': mk,
                'location_key': lk,
                'parameters': pipe_parameters,
            }
            if pipe_id is not None:
                full_attributes['pipe_id'] = pipe_id
            pipe._cache_value('attributes', full_attributes, memory_only=True, debug=debug)
            pipe._cache_value('_attributes_sync_time', now, memory_only=True, debug=debug)

        if datetime_dtypes or targets:
            parameters_str = str(pipe_parameters)
            if pipe_parameters is None or 'MRSM{' in parameters_str or 'Pipe(' in parameters_str:
                pipe_parameters = pipe.get_parameters(debug=debug)

        keep_pipe = True

        if datetime_dtypes:
            columns_val = (pipe_parameters or {}).get('columns', {}) or {}
            dt_col = columns_val.get('datetime', None)
            pipe_dtypes = (
                ((pipe_parameters or {}).get('dtypes', None) or {})
                if dt_col
                else None
            )
            dt_typ = pipe_dtypes.get(dt_col, None) if dt_col else None

            def _dtype_matches(clean_d):
                if not dt_col:
                    return value_is_null(clean_d)
                return (
                    (clean_d == 'int' and 'int' in str(dt_typ).lower())
                    or
                    (clean_d == 'datetime' and 'int' not in str(dt_typ).lower())
                )

            in_match = not in_dtypes or any(_dtype_matches(d) for d in in_dtypes)
            ex_match = bool(ex_dtypes and any(_dtype_matches(d) for d in ex_dtypes))
            keep_pipe = keep_pipe and in_match and not ex_match
            if not keep_pipe:
                continue

        if targets:
            pipe_target = pipe.target
            in_target_match = not in_targets or any(t == pipe_target for t in in_targets)
            ex_target_match = bool(ex_targets and any(t == pipe_target for t in ex_targets))
            keep_pipe = keep_pipe and in_target_match and not ex_target_match
            if not keep_pipe:
                continue

        if ck not in pipes:
            pipes[ck] = {}

        if mk not in pipes[ck]:
            pipes[ck][mk] = {}


        pipes[ck][mk][lk] = pipe

        if as_targets_dict:
            raw_params = pipe_parameters if isinstance(pipe_parameters, dict) else {}
            schema = raw_params.get('schema') or connector_schema
            explicit_target = (
                raw_params.get('target')
                or raw_params.get('target_name')
                or raw_params.get('target_table')
                or raw_params.get('target_table_name')
            )
            if explicit_target:
                target_name = (
                    replace_pipes_syntax(explicit_target, _pipe=pipe)
                    if isinstance(explicit_target, str) and '{{' in explicit_target
                    else explicit_target
                )
            else:
                target_name = pipe._target_legacy()
                if connector_is_sql and connector_flavor:
                    from meerschaum.utils.sql import truncate_item_name
                    target_name = truncate_item_name(target_name, connector_flavor)
            targets_pipes[(schema, target_name)].append(pipe)

    if not as_list and not as_tags_dict and not as_targets_dict:
        return pipes

    from meerschaum.utils.pipes import flatten_pipes_dict
    pipes_list = flatten_pipes_dict(pipes)
    if as_list:
        return pipes_list

    pool = get_pool(workers=(workers if connector.IS_THREAD_SAFE else 1))

    def gather_pipe_tags(pipe: mrsm.Pipe) -> Tuple[mrsm.Pipe, List[str]]:
        _tags = pipe.__dict__.get('_tags', None)
        gathered_tags = _tags if _tags is not None else pipe.tags
        return pipe, (gathered_tags or [])

    if as_tags_dict:
        tags_pipes = defaultdict(lambda: [])
        pipes_tags = dict(pool.map(gather_pipe_tags, pipes_list))
        for pipe, tags in pipes_tags.items():
            for tag in (tags or []):
                tags_pipes[tag].append(pipe)

        return dict(tags_pipes)

    if as_targets_dict:
        return dict(targets_pipes)

    raise NotImplementedError("No further options for returning pipes.")

Return a dictionary or list of meerschaum.Pipe objects.

Parameters
  • connector_keys (Union[str, List[str], None], default None): String or list of connector keys. If omitted or is '*', fetch all possible keys. If a string begins with '_', select keys that do NOT match the string.
  • metric_keys (Union[str, List[str], None], default None): String or list of metric keys. See connector_keys for formatting.
  • location_keys (Union[str, List[str], None], default None): String or list of location keys. See connector_keys for formatting.
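  • tags (Optional[List[str]], default None): If provided, only include pipes with these tags.
  • targets (Optional[List[str]], default None): If provided, only include pipes whose target matches one of these values. Prefix a value with _ to exclude pipes with that target instead.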
  • datetime_dtypes (Optional[List[str]], default None): If provided, only include pipes with the corresponding datetime axis dtypes. Accepted values are datetime, int, None (or null, etc.). May be negated by _.
  • params (Optional[Dict[str, Any]], default None): Dictionary of additional parameters to search by. Params are parsed into a SQL WHERE clause. E.g. {'a': 1, 'b': 2} equates to 'WHERE a = 1 AND b = 2'
  • mrsm_instance (Union[str, InstanceConnector, None], default None): Connector keys for the Meerschaum instance of the pipes. Must be a meerschaum.connectors.sql.SQLConnector.SQLConnector or meerschaum.connectors.api.APIConnector.APIConnector.
  • instance (Union[str, InstanceConnector, None], default None): Alias for mrsm_instance, used only when mrsm_instance is omitted. If both are omitted, the configured default instance is used.
  • as_list (bool, default False): If True, return pipes in a list instead of a hierarchical dictionary (False: {connector_keys: {metric_key: {location_key: Pipe}}}; True: [Pipe]).
  • as_tags_dict (bool, default False): If True, return a dictionary mapping tags to lists of pipes. Pipes with multiple tags will be repeated.
  • as_targets_dict (bool, default False): If True, return a dictionary mapping (schema, target) tuples to lists of pipes. Pipes sharing the same target across different schemata are grouped separately.
  • method (str, default 'registered'): Available options: ['registered', 'explicit', 'all']. If 'registered' (default), create pipes based on registered keys in the connector's pipes table (API or SQL connector, depending on mrsm_instance). If 'explicit', create pipes from the provided connector_keys, metric_keys, and location_keys instead of consulting the pipes table; this is useful for creating non-existent pipes. If 'all', create pipes from predefined metrics and locations; this requires connector_keys. NOTE: Method 'all' is not implemented!
  • workers (Optional[int], default None): If provided (and as_tags_dict or as_targets_dict is True), set the number of workers for the pool to fetch tags or targets. Only takes effect if the instance connector supports multi-threading.
  • **kw (Any): Keyword arguments to pass to the meerschaum.Pipe constructor.
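The params-to-WHERE mapping described for `params` can be sketched as follows. `params_to_where` and `quote_identifier` are hypothetical helpers: they bind values as `?` placeholders (SQLite's parameter style) instead of inlining them, a choice made for this sketch rather than a description of how the SQL connector builds its query. The in-memory table exists only for the demonstration.

```python
import sqlite3
from typing import Any, Dict, List, Tuple


def quote_identifier(name: str) -> str:
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def params_to_where(params: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Build 'WHERE "a" = ? AND "b" = ?' plus the bound values from a params dict."""
    if not params:
        return '', []
    clauses = [f'{quote_identifier(key)} = ?' for key in params]
    return 'WHERE ' + ' AND '.join(clauses), list(params.values())


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE demo_params (a INTEGER, b INTEGER, name TEXT)')
conn.executemany(
    'INSERT INTO demo_params VALUES (?, ?, ?)',
    [(1, 2, 'x'), (1, 3, 'y'), (2, 2, 'z')],
)

where, values = params_to_where({'a': 1, 'b': 2})
print(where)  # WHERE "a" = ? AND "b" = ?
rows = conn.execute(f'SELECT name FROM demo_params {where}', values).fetchall()
print(rows)   # [('x',)]
```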
Returns
  • A dictionary of dictionaries and meerschaum.Pipe objects in the connector, metric, location hierarchy.
  • If as_list is True, return a list of meerschaum.Pipe objects.
  • If as_tags_dict is True, return a dictionary mapping tags to lists of pipes.
  • If as_targets_dict is True, return a dictionary mapping (schema, target) tuples to lists of pipes.
Examples
>>> ### Manual definition:
>>> pipes = {
...     <connector_keys>: {
...         <metric_key>: {
...             <location_key>: Pipe(
...                 <connector_keys>,
...                 <metric_key>,
...                 <location_key>,
...             ),
...         },
...     },
... }
>>> ### Accessing a single pipe:
>>> pipes['sql:main']['weather'][None]
>>> ### Return a list instead:
>>> get_pipes(as_list=True)
[Pipe('sql:main', 'weather')]
>>> get_pipes(as_tags_dict=True)
{'gvl': [Pipe('sql:main', 'weather')]}
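The `as_tags_dict` branch of the `get_pipes()` source above appends each pipe to a per-tag list, so a pipe with several tags appears under each of them and an untagged pipe appears nowhere. A standalone sketch of that grouping step; `group_by_tags` is a hypothetical name and strings stand in for `Pipe` objects.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_tags(pipes_tags: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each tag to the pipes carrying it; multi-tag pipes appear under each tag."""
    tags_pipes: Dict[str, List[str]] = defaultdict(list)
    for pipe, tags in pipes_tags.items():
        for tag in (tags or []):
            tags_pipes[tag].append(pipe)
    return dict(tags_pipes)


# Strings stand in for Pipe objects; values are each pipe's tags.
pipes_tags = {
    "Pipe('sql:main', 'weather')": ['gvl', 'production'],
    "Pipe('foo:bar', 'demo')": ['production'],
    "Pipe('sql:main', 'scratch')": [],
}
print(group_by_tags(pipes_tags))
```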
def get_connector( type: str = None, label: str = None, refresh: bool = False, debug: bool = False, _load_plugins: bool = True, **kw: Any) -> Connector:
def get_connector(
    type: str = None,
    label: str = None,
    refresh: bool = False,
    debug: bool = False,
    _load_plugins: bool = True,
    **kw: Any
) -> Connector:
    """
    Return an existing connector, or create a new connector and store it for reuse.

    You can create new connectors if enough parameters are provided for the given type and flavor.

    Parameters
    ----------
    type: Optional[str], default None
        Connector type (sql, api, etc.).
        May also be the full `'type:label'` keys (e.g. `'sql:main'`).
        If both `type` and `label` are omitted, the configured default instance connector is returned.

    label: Optional[str], default None
        Connector label (e.g. main). Defaults to `'main'`.

    refresh: bool, default False
        If `True`, construct a new Connector object even if one already exists.

    kw: Any
        Other arguments to pass to the Connector constructor.
        If the Connector has already been constructed and these arguments differ from its
        existing attributes, `refresh` is set to `True` and the old Connector is replaced.

    Returns
    -------
    The existing or newly created Meerschaum connector
    (e.g. `meerschaum.connectors.api.APIConnector`, `meerschaum.connectors.sql.SQLConnector`),
    or `None` if the connector could not be created.

    Examples
    --------
    The following parameters would create a new
    `meerschaum.connectors.sql.SQLConnector` that isn't in the configuration file.

    ```
    >>> conn = get_connector(
    ...     type='sql',
    ...     label='newlabel',
    ...     flavor='sqlite',
    ...     database='/file/path/to/database.db'
    ... )
    ```

    """
    from meerschaum.connectors.parse import parse_instance_keys
    from meerschaum.config import get_config
    from meerschaum._internal.static import STATIC_CONFIG
    from meerschaum.utils.warnings import warn
    global _loaded_plugin_connectors
    if isinstance(type, str) and not label and ':' in type:
        type, label = type.split(':', maxsplit=1)

    if _load_plugins:
        with _locks['_loaded_plugin_connectors']:
            if not _loaded_plugin_connectors:
                load_plugin_connectors()
                _load_builtin_custom_connectors()
                _loaded_plugin_connectors = True

    if type is None and label is None:
        default_instance_keys = get_config('meerschaum', 'instance', patch=True)
        ### recursive call to get_connector
        return parse_instance_keys(default_instance_keys)

    ### NOTE: the default instance connector may not be main.
    ### Only fall back to 'main' if the type is provided but the label is omitted.
    label = label if label is not None else STATIC_CONFIG['connectors']['default_label']

    ### type might actually be a label. Check if so and raise a warning.
    if type not in connectors:
        possibilities, poss_msg = [], ""
        for _type in get_config('meerschaum', 'connectors'):
            if type in get_config('meerschaum', 'connectors', _type):
                possibilities.append(f"{_type}:{type}")
        if len(possibilities) > 0:
            poss_msg = " Did you mean"
            for poss in possibilities[:-1]:
                poss_msg += f" '{poss}',"
            if poss_msg.endswith(','):
                poss_msg = poss_msg[:-1]
            if len(possibilities) > 1:
                poss_msg += " or"
            poss_msg += f" '{possibilities[-1]}'?"

        warn(f"Cannot create Connector of type '{type}'." + poss_msg, stack=False)
        return None

    if 'sql' not in types:
        from meerschaum.connectors.plugin import PluginConnector
        from meerschaum.connectors.valkey import ValkeyConnector
        with _locks['types']:
            types.update({
                'api': APIConnector,
                'sql': SQLConnector,
                'plugin': PluginConnector,
                'valkey': ValkeyConnector,
            })

    ### determine if we need to call the constructor
    if not refresh:
        ### see if any user-supplied arguments differ from the existing instance
        if label in connectors[type]:
            warning_message = None
            for attribute, value in kw.items():
                if attribute not in connectors[type][label].meta:
                    import inspect
                    cls = connectors[type][label].__class__
                    cls_init_signature = inspect.signature(cls)
                    cls_init_params = cls_init_signature.parameters
                    if attribute not in cls_init_params:
                        warning_message = (
                            f"Received new attribute '{attribute}' not present in connector " +
                            f"{connectors[type][label]}.\n"
                        )
                elif connectors[type][label].__dict__[attribute] != value:
                    warning_message = (
                        f"Mismatched values for attribute '{attribute}' in connector "
                        + f"'{connectors[type][label]}'.\n" +
                        f"  - Keyword value: '{value}'\n" +
                        f"  - Existing value: '{connectors[type][label].__dict__[attribute]}'\n"
                    )
            if warning_message is not None:
                warning_message += (
                    "\nSetting `refresh` to True and recreating connector with type:"
                    + f" '{type}' and label '{label}'."
                )
                refresh = True
                warn(warning_message)
        else: ### connector doesn't yet exist
            refresh = True

    ### only create an object if refresh is True
    ### (can be manually specified, otherwise determined above)
    if refresh:
        with _locks['connectors']:
            try:
                ### will raise an error if configuration is incorrect / missing
                conn = types[type](label=label, **kw)
                connectors[type][label] = conn
            except InvalidAttributesError as ie:
                warn(
                    f"Incorrect attributes for connector '{type}:{label}'.\n"
                    + str(ie),
                    stack=False,
                )
                conn = None
            except Exception as e:
                from meerschaum.utils.formatting import get_console
                console = get_console()
                if console:
                    console.print_exception()
                warn(
                    f"Exception when creating connector '{type}:{label}'.\n" + str(e),
                    stack=False,
                )
                conn = None
        if conn is None:
            return None

    return connectors[type][label]
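When the requested type is unknown, the loop above builds a "Did you mean" hint that comma-separates the candidate keys and puts `or` before the last one. A shorter, equivalent formulation with `str.join` is sketched below for illustration; the function name `did_you_mean` is hypothetical and not part of Meerschaum.

```python
from typing import List


def did_you_mean(possibilities: List[str]) -> str:
    """Hypothetical equivalent of the suggestion text built in get_connector()."""
    if not possibilities:
        return ""
    quoted = [f"'{poss}'" for poss in possibilities]
    if len(quoted) == 1:
        return f" Did you mean {quoted[0]}?"
    # Comma-separate all but the last candidate, then append "or <last>?".
    return f" Did you mean {', '.join(quoted[:-1])} or {quoted[-1]}?"


print(did_you_mean(['sql:main', 'api:main', 'valkey:main']))
```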

Return an existing connector, or create a new connector and store it for reuse.

You can create new connectors if enough parameters are provided for the given type and flavor.

Parameters
  • type (Optional[str], default None): Connector type (sql, api, etc.). May also be the full 'type:label' keys (e.g. 'sql:main'). If both type and label are omitted, the configured default instance connector is returned.
  • label (Optional[str], default None): Connector label (e.g. main). Defaults to 'main'.
  • refresh (bool, default False): If True, construct a new Connector object even if one already exists.
  • kw (Any): Other arguments to pass to the Connector constructor. If the Connector has already been constructed and these arguments differ from its existing attributes, refresh is set to True and the old Connector is replaced.
Returns
  • The existing or newly created Meerschaum connector (e.g. meerschaum.connectors.api.APIConnector, meerschaum.connectors.sql.SQLConnector), or None if the connector could not be created.
Examples

The following parameters would create a new meerschaum.connectors.sql.SQLConnector that isn't in the configuration file.

>>> conn = get_connector(
...     type='sql',
...     label='newlabel',
...     flavor='sqlite',
...     database='/file/path/to/database.db'
... )
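The reuse-or-rebuild behavior described above (return the stored connector unless `refresh` is set or the keyword arguments disagree with the stored ones) can be illustrated with a small registry. This is a simplified sketch, not Meerschaum's implementation: `FakeConnector`, `get_cached_connector`, and `_registry` are hypothetical, and the real function also loads plugins, checks constructor signatures, handles locking, and returns `None` on errors.

```python
from typing import Any, Dict, Optional, Tuple


class FakeConnector:
    """Hypothetical stand-in for a Meerschaum connector class."""

    def __init__(self, label: str, **kw: Any):
        self.label = label
        self.attrs = dict(kw)


# Registry keyed by (type, label), loosely mirroring `connectors[type][label]`.
_registry: Dict[Tuple[str, str], FakeConnector] = {}


def get_cached_connector(
    conn_type: str,
    label: Optional[str] = None,
    refresh: bool = False,
    **kw: Any,
) -> FakeConnector:
    """Hypothetical, simplified version of get_connector()'s reuse/refresh logic."""
    # Accept combined keys like 'sql:temp'.
    if not label and ':' in conn_type:
        conn_type, label = conn_type.split(':', maxsplit=1)
    label = label or 'main'  # assumed default label
    key = (conn_type, label)

    existing = _registry.get(key)
    if existing is None:
        refresh = True  # Nothing cached yet.
    elif any(existing.attrs.get(name) != value for name, value in kw.items()):
        refresh = True  # New or mismatched keyword values force a rebuild.

    if refresh:
        _registry[key] = FakeConnector(label=label, **kw)
    return _registry[key]


a = get_cached_connector('sql:temp', flavor='sqlite')
b = get_cached_connector('sql', 'temp', flavor='sqlite')
c = get_cached_connector('sql:temp', flavor='duckdb')
print(a is b, a is c)  # True False
```

Calling again with no keyword arguments simply returns whichever connector is currently stored under those keys.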
def get_config(*keys: str, patch: bool = True, substitute: bool = True, sync_files: bool = True, write_missing: bool = True, as_tuple: bool = False, warn: bool = True, debug: bool = False) -> Any:
def get_config(
    *keys: str,
    patch: bool = True,
    substitute: bool = True,
    sync_files: bool = True,
    write_missing: bool = True,
    as_tuple: bool = False,
    warn: bool = True,
    debug: bool = False
) -> Any:
    """
    Return the Meerschaum configuration dictionary.
    If positional keys are provided, index into the configuration and return that value.
    If the keys are found in neither the loaded nor the default configuration,
    warn and return `None`.

    Parameters
    ----------
    keys: str
        Strings to index into the configuration, in order.

    patch: bool, default True
        If `True`, patch missing default keys into the config directory.
        Defaults to `True`.

    sync_files: bool, default True
        If `True`, sync files if needed.
        Defaults to `True`.

    write_missing: bool, default True
        If `True`, write default values when the main config files are missing.
        Defaults to `True`.

    substitute: bool, default True
        If `True`, substitute 'MRSM{}' values.
        Defaults to `True`.

    as_tuple: bool, default False
        If `True`, return a tuple of type (success, value).
        Defaults to `False`.

    warn: bool, default True
        If `True`, emit a warning when the keys are invalid.

    Returns
    -------
    The value in the configuration directory, indexed by the provided keys.

    Examples
    --------
    >>> get_config('meerschaum', 'instance')
    'sql:main'
    >>> get_config('does', 'not', 'exist')
    UserWarning: Invalid keys in config: ('does', 'not', 'exist')
    """
    import json

    symlinks_key = STATIC_CONFIG['config']['symlinks_key']
    if debug:
        from meerschaum.utils.debug import dprint
        dprint(f"Indexing keys: {keys}", color=False)

    if len(keys) == 0:
        _rc = _config(
            substitute=substitute,
            sync_files=sync_files,
            write_missing=(write_missing and _allow_write_missing),
        )
        if as_tuple:
            return True, _rc
        return _rc

    ### Weird threading issues, only import if substitute is True.
    if substitute:
        from meerschaum.config._read_config import search_and_substitute_config
    ### Invalidate the cache if it was read before with substitute=False
    ### but there still exist substitutions.
    if (
        config is not None and substitute and keys[0] != symlinks_key
        and 'MRSM{' in json.dumps(config.get(keys[0]))
    ):
        try:
            _subbed = search_and_substitute_config({keys[0]: config[keys[0]]})
        except Exception:
            import traceback
            traceback.print_exc()
            _subbed = {keys[0]: config[keys[0]]}

        config[keys[0]] = _subbed[keys[0]]
        if symlinks_key in _subbed:
            if symlinks_key not in config:
                config[symlinks_key] = {}
            config[symlinks_key] = apply_patch_to_config(
                _subbed.get(symlinks_key, {}),
                config.get(symlinks_key, {}),
            )

    from meerschaum.config._sync import sync_files as _sync_files
    if config is None:
        _config(*keys, sync_files=sync_files)

    invalid_keys = False
    if keys[0] not in config and keys[0] != symlinks_key:
        single_key_config = read_config(
            keys=[keys[0]], substitute=substitute, write_missing=write_missing
        )
        if keys[0] not in single_key_config:
            invalid_keys = True
        else:
            config[keys[0]] = single_key_config.get(keys[0], None)
            if symlinks_key in single_key_config and keys[0] in single_key_config[symlinks_key]:
                if symlinks_key not in config:
                    config[symlinks_key] = {}
                config[symlinks_key][keys[0]] = single_key_config[symlinks_key][keys[0]]

            if sync_files:
                _sync_files(keys=[keys[0]])

    c = config
    if len(keys) > 0:
        for k in keys:
            try:
                c = c[k]
            except Exception:
                invalid_keys = True
                break
        if invalid_keys:
            ### Check if the keys are in the default configuration.
            from meerschaum.config._default import default_config
            in_default = True
            patched_default_config = (
                search_and_substitute_config(default_config)
                if substitute else copy.deepcopy(default_config)
            )
            _c = patched_default_config
            for k in keys:
                try:
                    _c = _c[k]
                except Exception:
                    in_default = False
            if in_default:
                c = _c
                invalid_keys = False
            warning_msg = f"Invalid keys in config: {keys}"
            if not in_default:
                try:
                    if warn:
                        from meerschaum.utils.warnings import warn as _warn
                        _warn(warning_msg, stacklevel=3, color=False)
                except Exception:
                    if warn:
                        print(warning_msg)
                if as_tuple:
                    return False, None
                return None

            ### Don't write keys that we haven't yet loaded into memory.
            not_loaded_keys = [k for k in patched_default_config if k not in config]
            for k in not_loaded_keys:
                patched_default_config.pop(k, None)

            set_config(
                apply_patch_to_config(
                    patched_default_config,
                    config,
                )
            )
            if patch and keys[0] != symlinks_key and write_missing:
                ### Only persist defaults when the key's file is genuinely absent.
                ### Never overwrite an existing file (e.g. one that failed to parse):
                ### doing so would clobber the user's config with default values.
                ### Brand-new config files are still created by `read_config()`.
                from meerschaum.config._read_config import get_keyfile_path
                keyfile_exists = get_keyfile_path(keys[0], create_new=False) is not None
                if not keyfile_exists:
                    write_config(config, debug=debug)

    if as_tuple:
        return (not invalid_keys), c
    return c
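Stripped of file syncing, `MRSM{}` substitution, and write-back, the lookup above indexes into the loaded configuration, falls back to the default configuration when the keys are missing there, and otherwise reports failure. The sketch below mirrors that flow with plain dictionaries in the shape of `get_config(..., as_tuple=True)`. The names `lookup`, `loaded`, `defaults`, and the `timeout` key are hypothetical, and the real function also warns and may persist defaults to disk.

```python
from typing import Any, Dict, Tuple


def lookup(
    loaded: Dict[str, Any],
    defaults: Dict[str, Any],
    *keys: str,
) -> Tuple[bool, Any]:
    """Hypothetical simplification of get_config(..., as_tuple=True)."""
    for source in (loaded, defaults):
        value: Any = source
        try:
            for key in keys:
                value = value[key]
        except (KeyError, TypeError):
            continue  # Keys missing (or indexing a non-dict); try the next source.
        return True, value
    return False, None


loaded = {'meerschaum': {'instance': 'sql:main'}}
defaults = {'meerschaum': {'instance': 'sql:main', 'timeout': 30}}
print(lookup(loaded, defaults, 'meerschaum', 'instance'))  # (True, 'sql:main')
print(lookup(loaded, defaults, 'meerschaum', 'timeout'))   # (True, 30)
print(lookup(loaded, defaults, 'does', 'not', 'exist'))    # (False, None)
```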

Return the Meerschaum configuration dictionary. If positional keys are provided, index into the configuration and return that value. If the keys are found in neither the loaded nor the default configuration, warn and return None.

Parameters
  • keys (str): Strings to index into the configuration, in order.
  • patch (bool, default True): If True, patch missing default keys into the config directory. Defaults to True.
  • sync_files (bool, default True): If True, sync files if needed. Defaults to True.
  • write_missing (bool, default True): If True, write default values when the main config files are missing. Defaults to True.
  • substitute (bool, default True): If True, substitute 'MRSM{}' values. Defaults to True.
  • as_tuple (bool, default False): If True, return a tuple of type (success, value). Defaults to False.
  • warn (bool, default True): If True, emit a warning when the keys are invalid.
Returns
  • The value in the configuration directory, indexed by the provided keys.
Examples
>>> get_config('meerschaum', 'instance')
'sql:main'
>>> get_config('does', 'not', 'exist')
UserWarning: Invalid keys in config: ('does', 'not', 'exist')
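The `Pipe` constructor shown next normalizes several arguments before storing them in the pipe's `parameters`: a list of columns becomes an identity mapping, a string `precision` becomes `{'unit': ...}`, and the instance keys fall back from `mrsm_instance` to `instance` to `instance_keys`. The sketch below isolates those three rules. `normalize_pipe_args` is a hypothetical helper, not a Meerschaum function, and its `default_instance='sql:main'` argument stands in for the configured `get_config('meerschaum', 'instance')` value.

```python
from typing import Any, Dict, List, Optional, Union


def normalize_pipe_args(
    columns: Union[Dict[str, str], List[str], None] = None,
    precision: Union[str, Dict[str, Any], None] = None,
    mrsm_instance: Optional[str] = None,
    instance: Optional[str] = None,
    instance_keys: Optional[str] = None,
    default_instance: str = 'sql:main',
) -> Dict[str, Any]:
    """Hypothetical helper mirroring parts of Pipe.__init__()."""
    parameters: Dict[str, Any] = {}

    # A list/tuple of column names becomes {name: name}.
    if isinstance(columns, (list, tuple)):
        columns = {str(col): str(col) for col in columns}
    if isinstance(columns, dict):
        parameters['columns'] = columns

    # A bare precision string is treated as the unit.
    if isinstance(precision, dict):
        parameters['precision'] = precision
    elif isinstance(precision, str):
        parameters['precision'] = {'unit': precision}

    # `mrsm_instance` wins; otherwise fall back to `instance`, then `instance_keys`.
    resolved = mrsm_instance if mrsm_instance is not None else (instance or instance_keys)
    if resolved is None:
        resolved = default_instance  # stands in for the configured default instance

    return {'parameters': parameters, 'instance_keys': resolved}


print(normalize_pipe_args(columns=['ts', 'id'], precision='microsecond', instance='sql:temp'))
```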
class Pipe:
class Pipe:
    """
    Access Meerschaum pipes via Pipe objects.

    Pipes are identified by the following:

    1. Connector keys (e.g. `'sql:main'`)
    2. Metric key (e.g. `'weather'`)
    3. Location (optional; e.g. `None`)

    A pipe's connector keys correspond to a data source, and when the pipe is synced,
    its `fetch` definition is evaluated and executed to produce new data.

    Alternatively, new data may be directly synced via `pipe.sync()`:

    ```
    >>> from meerschaum import Pipe
    >>> pipe = Pipe('csv', 'weather')
    >>>
    >>> import pandas as pd
    >>> df = pd.read_csv('weather.csv')
    >>> pipe.sync(df)
    ```
    """

    from ._fetch import (
        fetch,
        get_backtrack_interval,
    )
    from ._data import (
        get_data,
        get_backtrack_data,
        get_rowcount,
        get_size,
        get_doc,
        get_docs,
        get_value,
        _get_data_as_iterator,
        get_chunk_interval,
        get_chunk_bounds,
        get_chunk_bounds_batches,
        parse_date_bounds,
    )
    from ._register import register
    from ._attributes import (
        attributes,
        parameters,
        columns,
        indices,
        indexes,
        dtypes,
        autoincrement,
        autotime,
        upsert,
        static,
        tzinfo,
        enforce,
        null_indices,
        mixed_numerics,
        get_columns,
        get_columns_types,
        get_columns_indices,
        get_indices,
        get_parameters,
        get_dtypes,
        update_parameters,
        tags,
        get_id,
        id,
        get_val_column,
        parents,
        parent,
        children,
        child,
        reference,
        references,
        target,
        _target_legacy,
        guess_datetime,
        precision,
        get_precision,
    )
    from ._cache import (
        _get_cache_connector,
        _cache_value,
        _get_cached_value,
        _invalidate_cache,
        _get_cache_dir_path,
        _write_cache_key,
        _write_cache_file,
        _write_cache_conn_key,
        _read_cache_key,
        _read_cache_file,
        _read_cache_conn_key,
        _load_cache_keys,
        _load_cache_files,
        _load_cache_conn_keys,
        _get_cache_keys,
        _get_cache_file_keys,
        _get_cache_conn_keys,
        _clear_cache_key,
        _clear_cache_file,
        _clear_cache_conn_key,
    )
    from ._show import show
    from ._edit import edit, edit_definition, update
    from ._sync import (
        sync,
        get_sync_time,
        exists,
        filter_existing,
        _get_chunk_label,
        get_num_workers,
        _persist_new_special_columns,
    )
    from ._verify import (
        verify,
        get_bound_interval,
        get_bound_time,
    )
    from ._delete import delete
    from ._drop import drop, drop_indices
    from ._compress import compress, decompress
    from ._maintenance import vacuum, analyze, repartition
    from ._index import create_indices
    from ._clear import clear
    from ._deduplicate import deduplicate
    from ._bootstrap import bootstrap
    from ._dtypes import enforce_dtypes, infer_dtypes
    from ._copy import copy_to

    def __init__(
        self,
        connector: str = '',
        metric: str = '',
        location: Optional[str] = None,
        parameters: Optional[Dict[str, Any]] = None,
        columns: Union[Dict[str, str], List[str], None] = None,
        indices: Optional[Dict[str, Union[str, List[str]]]] = None,
        tags: Optional[List[str]] = None,
        target: Optional[str] = None,
        dtypes: Optional[Dict[str, str]] = None,
        instance: Optional[Union[str, InstanceConnector]] = None,
        upsert: Optional[bool] = None,
        autoincrement: Optional[bool] = None,
        autotime: Optional[bool] = None,
        precision: Union[str, Dict[str, Union[str, int]], None] = None,
        static: Optional[bool] = None,
        enforce: Optional[bool] = None,
        null_indices: Optional[bool] = None,
        mixed_numerics: Optional[bool] = None,
        compress: Union[bool, Dict[str, Any], None] = None,
        temporary: bool = False,
        cache: Optional[bool] = None,
        cache_connector_keys: Optional[str] = None,
        references: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]] = None,
        reference: Union[str, Dict[str, Any], mrsm.Pipe, None] = None,
        parents: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]] = None,
        parent: Union[str, Dict[str, Any], mrsm.Pipe, None] = None,
        children: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]] = None,
        child: Union[str, Dict[str, Any], mrsm.Pipe, None] = None,
        mrsm_instance: Optional[Union[str, InstanceConnector]] = None,
        connector_keys: Optional[str] = None,
        metric_key: Optional[str] = None,
        location_key: Optional[str] = None,
        instance_keys: Optional[str] = None,
        indexes: Union[Dict[str, str], List[str], None] = None,
        debug: bool = False,
    ):
        """
        Parameters
        ----------
        connector: str
            Keys for the pipe's source connector, e.g. `'sql:main'`.

        metric: str
            Label for the pipe's contents, e.g. `'weather'`.

        location: Optional[str], default None
            Label for the pipe's location. Defaults to `None`.

        parameters: Optional[Dict[str, Any]], default None
            Optionally set a pipe's parameters from the constructor,
            e.g. columns and other attributes.
            You can edit these parameters with `edit pipes`.

        columns: Union[Dict[str, str], List[str], None], default None
            Set the `columns` dictionary of `parameters`.
            If `parameters` is also provided, this dictionary is added under the `'columns'` key.

        indices: Optional[Dict[str, Union[str, List[str]]]], default None
            Set the `indices` dictionary of `parameters`.
            If `parameters` is also provided, this dictionary is added under the `'indices'` key.

        tags: Optional[List[str]], default None
            A list of strings to be added under the `'tags'` key of `parameters`.
            You can select pipes with certain tags using `--tags`.

        dtypes: Optional[Dict[str, str]], default None
            Set the `dtypes` dictionary of `parameters`.
            If `parameters` is also provided, this dictionary is added under the `'dtypes'` key.

        mrsm_instance: Optional[Union[str, InstanceConnector]], default None
            Connector for the Meerschaum instance where the pipe resides.
            Defaults to the preconfigured default instance (`'sql:main'`).

        instance: Optional[Union[str, InstanceConnector]], default None
            Alias for `mrsm_instance`. If `mrsm_instance` is supplied, this value is ignored.

        upsert: Optional[bool], default None
            If `True`, set `upsert` to `True` in the parameters.

        autoincrement: Optional[bool], default None
            If `True`, set `autoincrement` in the parameters.

        autotime: Optional[bool], default None
            If `True`, set `autotime` in the parameters.

        precision: Union[str, Dict[str, Union[str, int]], None], default None
            If provided, set `precision` in the parameters.
            This may be either a string (the precision unit) or a dictionary in the form
            `{'unit': <unit>, 'interval': <interval>}`.
            Default is determined by the `datetime` column dtype
            (e.g. `datetime64[us]` is `microsecond` precision).

        static: Optional[bool], default None
            If `True`, set `static` in the parameters.

        enforce: Optional[bool], default None
            If `False`, skip data type enforcement.
            Default behavior is `True`.

        null_indices: Optional[bool], default None
            Set to `False` if there will be no null values in the index columns.
            Defaults to `True`.

        mixed_numerics: Optional[bool], default None
            If `True`, integer columns will be converted to `numeric` when floats are synced.
            Set to `False` to disable this behavior.
            Defaults to `True`.

        compress: Union[bool, Dict[str, Any], None], default None
            If `True` (or a dictionary of compression settings), mark the pipe for compression.
            For TimescaleDB hypertables, a columnstore (compression) policy is installed
            automatically on sync. A dictionary may override `segmentby`, `orderby`, and `after`.
            Defaults to `False`.

        hypercore: bool, default True
            For TimescaleDB hypertables, enable the Hypercore columnstore at table creation
            (declaring `segmentby`/`orderby` in `CREATE TABLE`), which causes TimescaleDB to
            auto-create a columnstore policy. Set to `False` for a plain row-store hypertable.
            Has no effect unless the pipe is a hypertable (`hypertable`, default `True`).

        temporary: bool, default False
            If `True`, prevent instance tables (pipes, users, plugins) from being created.

        cache: Optional[bool], default None
            If `True`, cache the pipe's metadata to disk (in addition to in-memory caching).
            If `None`, defaults to `get_config('pipes', 'cache', 'enabled')`,
            or `False` if `temporary` is `True`.
            Caching is always disabled for the `'sql:memory'` instance.

        cache_connector_keys: Optional[str], default None
            If provided, the keys of a Valkey connector (e.g. `valkey:main`) to use as the cache connector.

        references: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None
            If provided, inherit the parameters of the reference Pipe(s).
            May be a string of the Pipe constructor, a dictionary of constructor keys,
            a Pipe itself, or a list of any of these values.
            `reference` is accepted as a singular alias.

        parents: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None
            Set references for parent pipes. See `references` for values.
            `parent` is accepted as a singular alias.

        children: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None
            Set references for child pipes. See `references` for values.
            `child` is accepted as a singular alias.

        """
        from meerschaum.utils.warnings import error, warn
        if (not connector and not connector_keys) or (not metric and not metric_key):
            error(
                "Please provide strings for the connector and metric\n    "
                + "(first two positional arguments)."
            )

        ### Fall back to legacy `location_key` just in case.
        if not location:
            location = location_key

        if not connector:
            connector = connector_keys

        if not metric:
            metric = metric_key

        if location in ('[None]', 'None'):
            location = None

        from meerschaum._internal.static import STATIC_CONFIG
        negation_prefix = STATIC_CONFIG['system']['fetch_pipes_keys']['negation_prefix']
        for k in (connector, metric, location, *(tags or [])):
            if str(k).startswith(negation_prefix):
                error(f"A pipe's keys and tags cannot start with the prefix '{negation_prefix}'.")

        self._connector_keys = str(connector)
        self._connector_key = self.connector_keys ### Alias
        self._metric_key = metric
        self._location_key = location
        self.temporary = temporary
        self.cache = (
            cache
            if cache is not None
            else ((not temporary) and get_config('pipes', 'cache', 'enabled', warn=False))
        )
        self.cache_connector_keys = (
            str(cache_connector_keys)
            if cache_connector_keys is not None
            else None
        )
        self.debug = debug

        self._attributes: Dict[str, Any] = {
            'connector_keys': self._connector_keys,
            'metric_key': self._metric_key,
            'location_key': self._location_key,
            'parameters': {},
        }

        ### only set parameters if values are provided
        if isinstance(parameters, dict):
            self._attributes['parameters'] = parameters
        else:
            if parameters is not None:
                warn(f"The provided parameters are of invalid type '{type(parameters)}'.")
            self._attributes['parameters'] = {}

        columns = columns or self._attributes.get('parameters', {}).get('columns', None)
        if isinstance(columns, (list, tuple)):
            columns = {str(col): str(col) for col in columns}
        if isinstance(columns, dict):
            self._attributes['parameters']['columns'] = columns
        elif isinstance(columns, str) and 'Pipe(' in columns:
            pass
        elif columns is not None:
            warn(f"The provided columns are of invalid type '{type(columns)}'.")

        indices = (
            indices
            or indexes
            or self._attributes.get('parameters', {}).get('indices', None)
            or self._attributes.get('parameters', {}).get('indexes', None)
        )
        if isinstance(indices, dict):
            indices_key = (
                'indexes'
                if 'indexes' in self._attributes['parameters']
                else 'indices'
            )
            self._attributes['parameters'][indices_key] = indices

        if isinstance(tags, (list, tuple)):
            self._attributes['parameters']['tags'] = tags
        elif tags is not None:
            warn(f"The provided tags are of invalid type '{type(tags)}'.")

        if isinstance(target, str):
            self._attributes['parameters']['target'] = target
        elif target is not None:
            warn(f"The provided target is of invalid type '{type(target)}'.")

        if isinstance(dtypes, dict):
            self._attributes['parameters']['dtypes'] = dtypes
        elif dtypes is not None:
            warn(f"The provided dtypes are of invalid type '{type(dtypes)}'.")

        if isinstance(upsert, bool):
            self._attributes['parameters']['upsert'] = upsert

        if isinstance(autoincrement, bool):
            self._attributes['parameters']['autoincrement'] = autoincrement

        if isinstance(autotime, bool):
            self._attributes['parameters']['autotime'] = autotime

        if isinstance(precision, dict):
            self._attributes['parameters']['precision'] = precision
        elif isinstance(precision, str):
            self._attributes['parameters']['precision'] = {'unit': precision}

        if isinstance(static, bool):
            self._attributes['parameters']['static'] = static
            self._static = static

        if isinstance(enforce, bool):
            self._attributes['parameters']['enforce'] = enforce

        if isinstance(null_indices, bool):
            self._attributes['parameters']['null_indices'] = null_indices

        if isinstance(mixed_numerics, bool):
            self._attributes['parameters']['mixed_numerics'] = mixed_numerics

        if isinstance(compress, (bool, dict)):
            self._attributes['parameters']['compress'] = compress

        ### NOTE: The parameters dictionary is {} by default.
        ###       A Pipe may be registered without parameters, then edited,
        ###       or a Pipe may be registered with parameters set in-memory first.
        _mrsm_instance = mrsm_instance if mrsm_instance is not None else (instance or instance_keys)
        if _mrsm_instance is None:
            _mrsm_instance = get_config('meerschaum', 'instance', patch=True)

        if not isinstance(_mrsm_instance, str):
            self._instance_connector = _mrsm_instance
            self._instance_keys = str(_mrsm_instance)
        else:
            self._instance_keys = _mrsm_instance

        if self._instance_keys == 'sql:memory':
            self.cache = False

        self._cache_locks = collections.defaultdict(lambda: threading.RLock())

        if references is not None or reference is not None:
            reference_vals = references if references is not None else reference
            self.references = reference_vals

        if parents is not None or parent is not None:
            parent_vals = parents if parents is not None else parent
            self.parents = parent_vals

        if children is not None or child is not None:
            children_vals = children if children is not None else child
            self.children = children_vals

    @property
    def metric_key(self) -> str:
        """
        Return the pipe's metric key.
        """
        return self._metric_key

    @property
    def metric(self) -> str:
        """
        Return the pipe's metric key.
        """
        return self._metric_key

    @property
    def location_key(self) -> Union[str, None]:
        """
        Return the pipe's location key.
        """
        return self._location_key

    @property
    def location(self) -> Union[str, None]:
        """
        Return the pipe's location key.
        """
        return self._location_key

    @property
    def meta(self):
        """
        Return the four keys needed to reconstruct this pipe.
        """
        return {
            'connector_keys': self.connector_keys,
            'metric_key': self.metric_key,
            'location_key': self.location_key,
            'instance_keys': self.instance_keys,
        }

    def keys(self) -> Dict[str, Optional[str]]:
        """
        Return this pipe's keys as a dictionary (in `meta` order).
        """
        return {
            key: val
            for key, val in self.meta.items()
            if key != 'instance'
        }

    @property
    def instance_keys(self) -> str:
        """
        Return the pipe's instance keys.
        """
        return self._instance_keys

    @property
    def instance(self) -> Union[InstanceConnector, str]:
        """
        Return the pipe's instance connector or keys.
        """
        conn = self.instance_connector
        if conn is None:
            return self.instance_keys
        return conn
566
567    @property
568    def instance_connector(self) -> Union[InstanceConnector, None]:
569        """
570        The instance connector on which this pipe resides.
571        """
572        if '_instance_connector' not in self.__dict__:
573            from meerschaum.connectors.parse import parse_instance_keys
574            conn = parse_instance_keys(self.instance_keys)
575            if conn:
576                self._instance_connector = conn
577            else:
578                return None
579        return self._instance_connector
580
581    @property
582    def connector_keys(self) -> str:
583        """
584        Return the pipe's connector keys.
585        """
586        return self._connector_keys
587
588    @property
589    def connector_key(self) -> str:
590        """
591        Legacy: use `Pipe.connector_keys` instead.
592        """
593        return self.connector_keys
594
595    @property
596    def connector(self) -> Union['Connector', str]:
597        """
598        The connector to the data source.
599        """
600        if '_connector' not in self.__dict__:
601            from meerschaum.connectors.parse import parse_instance_keys
602            import warnings
603            with warnings.catch_warnings():
604                warnings.simplefilter('ignore')
605                try:
606                    conn = parse_instance_keys(self.connector_keys)
607                except Exception:
608                    conn = None
609            if conn:
610                self._connector = conn
611            else:
612                return self._connector_keys
613        return self._connector
614
615    def __str__(self, ansi: bool=False):
616        return pipe_repr(self, ansi=ansi)
617
618    def __eq__(self, other):
619        try:
620            return (
621                isinstance(self, type(other))
622                and self.connector_keys == other.connector_keys
623                and self.metric_key == other.metric_key
624                and self.location_key == other.location_key
625                and self.instance_keys == other.instance_keys
626            )
627        except Exception:
628            return False
629
630    def __hash__(self):
631        ### Using an esoteric separator to avoid collisions.
632        sep = "[\"']"
633        return hash(
634            str(self.connector_keys) + sep
635            + str(self.metric_key) + sep
636            + str(self.location_key) + sep
637            + str(self.instance_keys) + sep
638        )
639
640    def __repr__(self, ansi: bool=True, **kw) -> str:
641        if not hasattr(sys, 'ps1'):
642            ansi = False
643
644        return pipe_repr(self, ansi=ansi, **kw)
645
646    def __pt_repr__(self):
647        from meerschaum.utils.packages import attempt_import
648        prompt_toolkit_formatted_text = attempt_import('prompt_toolkit.formatted_text', lazy=False)
649        return prompt_toolkit_formatted_text.ANSI(pipe_repr(self, ansi=True))
650
651    def __getstate__(self) -> Dict[str, Any]:
652        """
653        Define the state dictionary (pickling).
654        """
655        return {
656            'connector_keys': self.connector_keys,
657            'metric_key': self.metric_key,
658            'location_key': self.location_key,
659            'parameters': self._attributes.get('parameters', None),
660            'instance_keys': self.instance_keys,
661        }
662
663    def __setstate__(self, _state: Dict[str, Any]):
664        """
665        Read the state (unpickling).
666        """
667        self.__init__(**_state)
668
    def __getitem__(self, key: str) -> Any:
        """
        Index the pipe's attributes.
        If the `key` cannot be found, return `None`.
        """
        if key in self.attributes:
            return self.attributes.get(key, None)

        aliases = {
            'connector': 'connector_keys',
            'connector_key': 'connector_keys',
            'metric': 'metric_key',
            'location': 'location_key',
        }
        aliased_key = aliases.get(key, None)
        if aliased_key is not None:
            return self.attributes.get(aliased_key, None)

        property_aliases = {
            'instance': 'instance_keys',
            'instance_key': 'instance_keys',
        }
        aliased_key = property_aliases.get(key, None)
        if aliased_key is not None:
            key = aliased_key
        return getattr(self, key, None)

    def __copy__(self):
        """
        Return a shallow copy of the current pipe.
        """
        return mrsm.Pipe(
            self.connector_keys, self.metric_key, self.location_key,
            instance=self.instance_keys,
            parameters=self._attributes.get('parameters', None),
        )

    def __deepcopy__(self, memo):
        """
        Return a deep copy of the current pipe.
        """
        return self.__copy__()

Access Meerschaum pipes via Pipe objects.

Pipes are identified by the following:

  1. Connector keys (e.g. 'sql:main')
  2. Metric key (e.g. 'weather')
  3. Location (optional; e.g. None)

A pipe's connector keys correspond to a data source, and when the pipe is synced, its fetch definition is evaluated and executed to produce new data.

Alternatively, new data may be directly synced via pipe.sync():

>>> from meerschaum import Pipe
>>> pipe = Pipe('csv', 'weather')
>>>
>>> import pandas as pd
>>> df = pd.read_csv('weather.csv')
>>> pipe.sync(df)
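Because a pipe's identity is just its keys, two `Pipe` objects built from the same keys compare equal and hash the same, while the same connector and metric on a different instance form a different pipe. The sketch below reproduces the logic of `Pipe.__eq__` and `Pipe.__hash__` in plain Python so it runs without Meerschaum installed; `PipeKeys` is a hypothetical stand-in, and its default instance of `'sql:main'` is only an assumption for the demo.

```python
from typing import Optional


class PipeKeys:
    """Hypothetical stand-in for `Pipe` that keeps only the identifying keys."""

    def __init__(
        self,
        connector_keys: str,
        metric_key: str,
        location_key: Optional[str] = None,
        instance_keys: str = 'sql:main',  # assumed default, for illustration only
    ):
        self.connector_keys = connector_keys
        self.metric_key = metric_key
        self.location_key = location_key
        self.instance_keys = instance_keys

    def __eq__(self, other):
        # Equal only when every identifying key matches (mirrors `Pipe.__eq__`).
        try:
            return (
                isinstance(self, type(other))
                and self.connector_keys == other.connector_keys
                and self.metric_key == other.metric_key
                and self.location_key == other.location_key
                and self.instance_keys == other.instance_keys
            )
        except Exception:
            return False

    def __hash__(self):
        # Join the keys with an unusual separator so that different key
        # combinations are unlikely to produce the same string.
        sep = "[\"']"
        return hash(
            str(self.connector_keys) + sep
            + str(self.metric_key) + sep
            + str(self.location_key) + sep
            + str(self.instance_keys) + sep
        )


a = PipeKeys('csv', 'weather')
b = PipeKeys('csv', 'weather')
c = PipeKeys('csv', 'weather', instance_keys='sql:temp')

print(a == b, a == c)  # True False
print(len({a, b, c}))  # 2
```

Since equal objects must hash equally, both methods use the same four keys; this is what lets pipes be used as dictionary keys or deduplicated in sets.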
Pipe( connector: str = '', metric: str = '', location: Optional[str] = None, parameters: Optional[Dict[str, Any]] = None, columns: Union[Dict[str, str], List[str], NoneType] = None, indices: Optional[Dict[str, Union[str, List[str]]]] = None, tags: Optional[List[str]] = None, target: Optional[str] = None, dtypes: Optional[Dict[str, str]] = None, instance: Union[str, InstanceConnector, NoneType] = None, upsert: Optional[bool] = None, autoincrement: Optional[bool] = None, autotime: Optional[bool] = None, precision: Union[str, Dict[str, Union[str, int]], NoneType] = None, static: Optional[bool] = None, enforce: Optional[bool] = None, null_indices: Optional[bool] = None, mixed_numerics: Optional[bool] = None, compress: Union[bool, Dict[str, Any], NoneType] = None, temporary: bool = False, cache: Optional[bool] = None, cache_connector_keys: Optional[str] = None, references: Optional[List[Union[str, Dict[str, Any], Pipe, NoneType]]] = None, reference: Union[str, Dict[str, Any], Pipe, NoneType] = None, parents: Optional[List[Union[str, Dict[str, Any], Pipe, NoneType]]] = None, parent: Union[str, Dict[str, Any], Pipe, NoneType] = None, children: Optional[List[Union[str, Dict[str, Any], Pipe, NoneType]]] = None, child: Union[str, Dict[str, Any], Pipe, NoneType] = None, mrsm_instance: Union[str, InstanceConnector, NoneType] = None, connector_keys: Optional[str] = None, metric_key: Optional[str] = None, location_key: Optional[str] = None, instance_keys: Optional[str] = None, indexes: Union[Dict[str, str], List[str], NoneType] = None, debug: bool = False)
    def __init__(
        self,
        connector: str = '',
        metric: str = '',
        location: Optional[str] = None,
        parameters: Optional[Dict[str, Any]] = None,
        columns: Union[Dict[str, str], List[str], None] = None,
        indices: Optional[Dict[str, Union[str, List[str]]]] = None,
        tags: Optional[List[str]] = None,
        target: Optional[str] = None,
        dtypes: Optional[Dict[str, str]] = None,
        instance: Optional[Union[str, InstanceConnector]] = None,
        upsert: Optional[bool] = None,
        autoincrement: Optional[bool] = None,
        autotime: Optional[bool] = None,
        precision: Union[str, Dict[str, Union[str, int]], None] = None,
        static: Optional[bool] = None,
        enforce: Optional[bool] = None,
        null_indices: Optional[bool] = None,
        mixed_numerics: Optional[bool] = None,
        compress: Union[bool, Dict[str, Any], None] = None,
        temporary: bool = False,
        cache: Optional[bool] = None,
        cache_connector_keys: Optional[str] = None,
        references: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]] = None,
        reference: Union[str, Dict[str, Any], mrsm.Pipe, None] = None,
        parents: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]] = None,
        parent: Union[str, Dict[str, Any], mrsm.Pipe, None] = None,
        children: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]] = None,
        child: Union[str, Dict[str, Any], mrsm.Pipe, None] = None,
        mrsm_instance: Optional[Union[str, InstanceConnector]] = None,
        connector_keys: Optional[str] = None,
        metric_key: Optional[str] = None,
        location_key: Optional[str] = None,
        instance_keys: Optional[str] = None,
        indexes: Union[Dict[str, str], List[str], None] = None,
        debug: bool = False,
    ):
        """
        Parameters
        ----------
        connector: str
            Keys for the pipe's source connector, e.g. `'sql:main'`.

        metric: str
            Label for the pipe's contents, e.g. `'weather'`.

        location: str, default None
            Label for the pipe's location. Defaults to `None`.

        parameters: Optional[Dict[str, Any]], default None
            Optionally set a pipe's parameters from the constructor,
            e.g. columns and other attributes.
            You can edit these parameters with `edit pipes`.

        columns: Union[Dict[str, str], List[str], None], default None
            Set the `columns` dictionary of `parameters`.
            If `parameters` is also provided, this dictionary is added under the `'columns'` key.

        indices: Optional[Dict[str, Union[str, List[str]]]], default None
            Set the `indices` dictionary of `parameters`.
            If `parameters` is also provided, this dictionary is added under the `'indices'` key.

        tags: Optional[List[str]], default None
            A list of strings to be added under the `'tags'` key of `parameters`.
            You can select pipes with certain tags using `--tags`.

        dtypes: Optional[Dict[str, str]], default None
            Set the `dtypes` dictionary of `parameters`.
            If `parameters` is also provided, this dictionary is added under the `'dtypes'` key.

        mrsm_instance: Optional[Union[str, InstanceConnector]], default None
            Connector for the Meerschaum instance where the pipe resides.
            Defaults to the preconfigured default instance (`'sql:main'`).

        instance: Optional[Union[str, InstanceConnector]], default None
            Alias for `mrsm_instance`. If `mrsm_instance` is supplied, this value is ignored.

        upsert: Optional[bool], default None
            If `True`, set `upsert` to `True` in the parameters.

        autoincrement: Optional[bool], default None
            If `True`, set `autoincrement` in the parameters.

        autotime: Optional[bool], default None
            If `True`, set `autotime` in the parameters.

        precision: Union[str, Dict[str, Union[str, int]], None], default None
            If provided, set `precision` in the parameters.
            This may be either a string (the precision unit) or a dictionary in the form
            `{'unit': <unit>, 'interval': <interval>}`.
            Default is determined by the `datetime` column dtype
            (e.g. `datetime64[us]` is `microsecond` precision).

        static: Optional[bool], default None
            If `True`, set `static` in the parameters.

        enforce: Optional[bool], default None
            If `False`, skip data type enforcement.
            Default behavior is `True`.

        null_indices: Optional[bool], default None
            Set to `False` if there will be no null values in the index columns.
            Defaults to `True`.

        mixed_numerics: Optional[bool], default None
            If `True`, integer columns will be converted to `numeric` when floats are synced.
            Set to `False` to disable this behavior.
            Defaults to `True`.

        compress: Union[bool, Dict[str, Any], None], default None
            If `True` (or a dictionary of compression settings), mark the pipe for compression.
            For TimescaleDB hypertables, a columnstore (compression) policy is installed
            automatically on sync. A dictionary may override `segmentby`, `orderby`, and `after`.
            Defaults to `False`.

        hypercore: bool, default True
            For TimescaleDB hypertables, enable the Hypercore columnstore at table creation
            (declaring `segmentby`/`orderby` in `CREATE TABLE`), which causes TimescaleDB to
            auto-create a columnstore policy. Set to `False` for a plain row-store hypertable.
            Has no effect unless the pipe is a hypertable (`hypertable`, default `True`).

        temporary: bool, default False
            If `True`, prevent instance tables (pipes, users, plugins) from being created.

        cache: Optional[bool], default None
            If `True`, cache the pipe's metadata to disk (in addition to in-memory caching).
            If `cache` is not explicitly `True`, it is set to `False` if `temporary` is `True`.
            Defaults to `True` (from `None`).

        cache_connector_keys: Optional[str], default None
            If provided, use the keys to a Valkey connector (e.g. `valkey:main`).

        references: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None
            If provided, inherit the parameters of the reference Pipe(s).
            May be equal to a string of the Pipe constructor, a dictionary of constructor keys,
            a Pipe itself, or a list of any of these values.

        parents: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None
            Set references for parent pipes. See `references` for values.

        children: Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None
            Set references for child pipes. See `references` for values.

        """
        from meerschaum.utils.warnings import error, warn
        if (not connector and not connector_keys) or (not metric and not metric_key):
            error(
                "Please provide strings for the connector and metric\n    "
                + "(first two positional arguments)."
            )

        ### Fall back to legacy `location_key` just in case.
        if not location:
            location = location_key

        if not connector:
            connector = connector_keys

        if not metric:
            metric = metric_key

        if location in ('[None]', 'None'):
            location = None

        from meerschaum._internal.static import STATIC_CONFIG
        negation_prefix = STATIC_CONFIG['system']['fetch_pipes_keys']['negation_prefix']
        for k in (connector, metric, location, *(tags or [])):
            if str(k).startswith(negation_prefix):
                error(f"A pipe's keys and tags cannot start with the prefix '{negation_prefix}'.")

        self._connector_keys = str(connector)
        self._connector_key = self.connector_keys ### Alias
        self._metric_key = metric
        self._location_key = location
        self.temporary = temporary
        self.cache = (
            cache
            if cache is not None
            else ((not temporary) and get_config('pipes', 'cache', 'enabled', warn=False))
        )
        self.cache_connector_keys = (
            str(cache_connector_keys)
            if cache_connector_keys is not None
            else None
        )
        self.debug = debug

        self._attributes: Dict[str, Any] = {
            'connector_keys': self._connector_keys,
            'metric_key': self._metric_key,
            'location_key': self._location_key,
            'parameters': {},
        }

        ### only set parameters if values are provided
        if isinstance(parameters, dict):
            self._attributes['parameters'] = parameters
        else:
            if parameters is not None:
                warn(f"The provided parameters are of invalid type '{type(parameters)}'.")
            self._attributes['parameters'] = {}

        columns = columns or self._attributes.get('parameters', {}).get('columns', None)
        if isinstance(columns, (list, tuple)):
            columns = {str(col): str(col) for col in columns}
        if isinstance(columns, dict):
            self._attributes['parameters']['columns'] = columns
        elif isinstance(columns, str) and 'Pipe(' in columns:
            pass
        elif columns is not None:
            warn(f"The provided columns are of invalid type '{type(columns)}'.")

        indices = (
            indices
            or indexes
            or self._attributes.get('parameters', {}).get('indices', None)
            or self._attributes.get('parameters', {}).get('indexes', None)
        )
        if isinstance(indices, dict):
            indices_key = (
                'indexes'
                if 'indexes' in self._attributes['parameters']
                else 'indices'
            )
            self._attributes['parameters'][indices_key] = indices

        if isinstance(tags, (list, tuple)):
            self._attributes['parameters']['tags'] = tags
        elif tags is not None:
            warn(f"The provided tags are of invalid type '{type(tags)}'.")

        if isinstance(target, str):
            self._attributes['parameters']['target'] = target
        elif target is not None:
            warn(f"The provided target is of invalid type '{type(target)}'.")

        if isinstance(dtypes, dict):
            self._attributes['parameters']['dtypes'] = dtypes
        elif dtypes is not None:
            warn(f"The provided dtypes are of invalid type '{type(dtypes)}'.")

        if isinstance(upsert, bool):
            self._attributes['parameters']['upsert'] = upsert

        if isinstance(autoincrement, bool):
            self._attributes['parameters']['autoincrement'] = autoincrement

        if isinstance(autotime, bool):
            self._attributes['parameters']['autotime'] = autotime

        if isinstance(precision, dict):
            self._attributes['parameters']['precision'] = precision
        elif isinstance(precision, str):
            self._attributes['parameters']['precision'] = {'unit': precision}

        if isinstance(static, bool):
            self._attributes['parameters']['static'] = static
            self._static = static

        if isinstance(enforce, bool):
            self._attributes['parameters']['enforce'] = enforce

        if isinstance(null_indices, bool):
            self._attributes['parameters']['null_indices'] = null_indices

        if isinstance(mixed_numerics, bool):
            self._attributes['parameters']['mixed_numerics'] = mixed_numerics

        if isinstance(compress, (bool, dict)):
            self._attributes['parameters']['compress'] = compress

        ### NOTE: The parameters dictionary is {} by default.
        ###       A Pipe may be registered without parameters, then edited,
        ###       or a Pipe may be registered with parameters set in-memory first.
        _mrsm_instance = mrsm_instance if mrsm_instance is not None else (instance or instance_keys)
        if _mrsm_instance is None:
            _mrsm_instance = get_config('meerschaum', 'instance', patch=True)

        if not isinstance(_mrsm_instance, str):
            self._instance_connector = _mrsm_instance
            self._instance_keys = str(_mrsm_instance)
        else:
            self._instance_keys = _mrsm_instance

        if self._instance_keys == 'sql:memory':
            self.cache = False

        self._cache_locks = collections.defaultdict(lambda: threading.RLock())

        if references is not None or reference is not None:
            reference_vals = references if references is not None else reference
            self.references = reference_vals

        if parents is not None or parent is not None:
            parent_vals = parents if parents is not None else parent
            self.parents = parent_vals

        if children is not None or child is not None:
            children_vals = children if children is not None else child
            self.children = children_vals
Parameters
  • connector (str): Keys for the pipe's source connector, e.g. 'sql:main'.
  • metric (str): Label for the pipe's contents, e.g. 'weather'.
  • location (str, default None): Label for the pipe's location. Defaults to None.
  • parameters (Optional[Dict[str, Any]], default None): Optionally set a pipe's parameters from the constructor, e.g. columns and other attributes. You can edit these parameters with edit pipes.
  • columns (Union[Dict[str, str], List[str], None], default None): Set the columns dictionary of parameters. If parameters is also provided, this dictionary is added under the 'columns' key.
  • indices (Optional[Dict[str, Union[str, List[str]]]], default None): Set the indices dictionary of parameters. If parameters is also provided, this dictionary is added under the 'indices' key.
  • tags (Optional[List[str]], default None): A list of strings to be added under the 'tags' key of parameters. You can select pipes with certain tags using --tags.
  • dtypes (Optional[Dict[str, str]], default None): Set the dtypes dictionary of parameters. If parameters is also provided, this dictionary is added under the 'dtypes' key.
  • mrsm_instance (Optional[Union[str, InstanceConnector]], default None): Connector for the Meerschaum instance where the pipe resides. Defaults to the preconfigured default instance ('sql:main').
  • instance (Optional[Union[str, InstanceConnector]], default None): Alias for mrsm_instance. If mrsm_instance is supplied, this value is ignored.
  • upsert (Optional[bool], default None): If True, set upsert to True in the parameters.
  • autoincrement (Optional[bool], default None): If True, set autoincrement in the parameters.
  • autotime (Optional[bool], default None): If True, set autotime in the parameters.
  • precision (Union[str, Dict[str, Union[str, int]], None], default None): If provided, set precision in the parameters. This may be either a string (the precision unit) or a dictionary in the form {'unit': <unit>, 'interval': <interval>}. Default is determined by the datetime column dtype (e.g. datetime64[us] is microsecond precision).
  • static (Optional[bool], default None): If True, set static in the parameters.
  • enforce (Optional[bool], default None): If False, skip data type enforcement. Default behavior is True.
  • null_indices (Optional[bool], default None): Set to False if there will be no null values in the index columns. Defaults to True.
  • mixed_numerics (Optional[bool], default None): If True, integer columns will be converted to numeric when floats are synced. Set to False to disable this behavior. Defaults to True.
  • compress (Union[bool, Dict[str, Any], None], default None): If True (or a dictionary of compression settings), mark the pipe for compression. For TimescaleDB hypertables, a columnstore (compression) policy is installed automatically on sync. A dictionary may override segmentby, orderby, and after. Defaults to False.
  • hypercore (bool, default True): For TimescaleDB hypertables, enable the Hypercore columnstore at table creation (declaring segmentby/orderby in CREATE TABLE), which causes TimescaleDB to auto-create a columnstore policy. Set to False for a plain row-store hypertable. Has no effect unless the pipe is a hypertable (hypertable, default True).
  • temporary (bool, default False): If True, prevent instance tables (pipes, users, plugins) from being created.
  • cache (Optional[bool], default None): If True, cache the pipe's metadata to disk (in addition to in-memory caching). If cache is not set explicitly, it defaults to False when temporary is True and otherwise to the pipes:cache:enabled configuration value (True by default). Caching is always disabled when the instance keys are sql:memory.
  • cache_connector_keys (Optional[str], default None): If provided, the keys of a Valkey connector (e.g. valkey:main) to use for caching.
  • references (Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None): If provided, inherit the parameters of the reference Pipe(s). May be equal to a string of the Pipe constructor, a dictionary of constructor keys, a Pipe itself, or a list of any of these values.
  • parents (Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None): Set references for parent pipes. See references for values.
  • children (Optional[List[Union[str, Dict[str, Any], mrsm.Pipe, None]]], default None): Set references for child pipes. See references for values.
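The keyword arguments above are folded into the parameters dictionary only when they have a usable type. A minimal sketch of a few of those branches follows. build_parameters is hypothetical (not a Meerschaum function) and covers only columns, tags, precision, and boolean flags such as upsert or static. It also copies the input dictionary, whereas the real constructor stores the parameters dictionary it receives directly.

```python
from typing import Any, Dict, List, Optional, Union


def build_parameters(
    parameters: Optional[Dict[str, Any]] = None,
    columns: Union[Dict[str, str], List[str], None] = None,
    tags: Optional[List[str]] = None,
    precision: Union[str, Dict[str, Any], None] = None,
    **flags: Optional[bool],
) -> Dict[str, Any]:
    """Hypothetical helper: fold constructor keywords into a parameters dict."""
    # Copied here so the caller's dictionary is not mutated (the real constructor
    # keeps a reference to the dictionary it is given).
    params: Dict[str, Any] = dict(parameters) if isinstance(parameters, dict) else {}

    # Explicit `columns` win; otherwise reuse any existing `parameters['columns']`.
    columns = columns or params.get('columns', None)
    if isinstance(columns, (list, tuple)):
        # A list of column names becomes an identity mapping.
        columns = {str(col): str(col) for col in columns}
    if isinstance(columns, dict):
        params['columns'] = columns

    if isinstance(tags, (list, tuple)):
        params['tags'] = tags

    # A bare string is shorthand for the precision unit.
    if isinstance(precision, dict):
        params['precision'] = precision
    elif isinstance(precision, str):
        params['precision'] = {'unit': precision}

    # Flags are stored only when they are real booleans, so `None` is skipped.
    for key, val in flags.items():
        if isinstance(val, bool):
            params[key] = val

    return params


params = build_parameters(
    columns=['ts', 'id'],
    precision='microsecond',
    upsert=True,
    static=None,
)
print(params)
# {'columns': {'ts': 'ts', 'id': 'id'}, 'precision': {'unit': 'microsecond'}, 'upsert': True}
```

Passing static=None leaves the key out entirely, which is why only flags explicitly set to True or False end up in the pipe's parameters.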
temporary
cache
cache_connector_keys
debug
metric_key: str

Return the pipe's metric key.

metric: str

Return the pipe's metric key.

location_key: Optional[str]

Return the pipe's location key.

location: Optional[str]

Return the pipe's location key.

meta

Return the four keys needed to reconstruct this pipe.
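Pipe.__getstate__ pickles exactly these keys plus the parameters dictionary, and Pipe.__setstate__ rebuilds the pipe by calling __init__ again, so live connector objects are never serialized. The hypothetical MiniPipe class below illustrates that round trip with the standard pickle module; it is a simplified model, not the real class.

```python
import pickle
from typing import Any, Dict, Optional


class MiniPipe:
    """Hypothetical, stripped-down pipe used only to illustrate pickling."""

    def __init__(
        self,
        connector_keys: str,
        metric_key: str,
        location_key: Optional[str] = None,
        instance_keys: str = 'sql:main',
        parameters: Optional[Dict[str, Any]] = None,
    ):
        self.connector_keys = connector_keys
        self.metric_key = metric_key
        self.location_key = location_key
        self.instance_keys = instance_keys
        self.parameters = parameters or {}

    @property
    def meta(self) -> Dict[str, Any]:
        # The four keys needed to reconstruct the pipe.
        return {
            'connector_keys': self.connector_keys,
            'metric_key': self.metric_key,
            'location_key': self.location_key,
            'instance_keys': self.instance_keys,
        }

    def __getstate__(self) -> Dict[str, Any]:
        # Serialize only the keys and parameters, not runtime objects.
        return {**self.meta, 'parameters': self.parameters}

    def __setstate__(self, state: Dict[str, Any]):
        # Rebuild the object by running the constructor again.
        self.__init__(**state)


pipe = MiniPipe('sql:main', 'weather', parameters={'columns': {'datetime': 'ts'}})
clone = pickle.loads(pickle.dumps(pipe))
print(clone.meta == pipe.meta, clone.parameters)
# True {'columns': {'datetime': 'ts'}}
```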

def keys(self) -> List[str]:

Return the ordered keys for this pipe.

instance_keys: str

Return the pipe's instance keys.
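The instance keys are resolved in Pipe.__init__ with a fixed precedence: mrsm_instance if it is not None, otherwise instance or instance_keys (whichever is truthy first), otherwise the configured default instance. When a connector object is passed instead of a string, the pipe keeps the object and uses its string form as the keys. The sketch below models that precedence; resolve_instance_keys and FakeConnector are hypothetical, the fallback 'sql:main' stands in for get_config('meerschaum', 'instance'), and FakeConnector assumes a connector renders as its keys when converted with str().

```python
from typing import Any, Optional


def resolve_instance_keys(
    mrsm_instance: Any = None,
    instance: Any = None,
    instance_keys: Optional[str] = None,
    default_instance: str = 'sql:main',
) -> str:
    """Hypothetical helper mirroring how `Pipe.__init__` picks its instance keys."""
    # `mrsm_instance` wins; otherwise take the first truthy of `instance`, `instance_keys`.
    chosen = mrsm_instance if mrsm_instance is not None else (instance or instance_keys)
    if chosen is None:
        # Stand-in for the configured default instance.
        chosen = default_instance
    # Connector objects are converted to their string form to produce the keys.
    return chosen if isinstance(chosen, str) else str(chosen)


class FakeConnector:
    """Hypothetical connector assumed to render as its keys."""

    def __str__(self) -> str:
        return 'sql:temp'


print(resolve_instance_keys())                                   # sql:main
print(resolve_instance_keys(instance='sql:local'))               # sql:local
print(resolve_instance_keys('api:main', instance='sql:local'))   # api:main
print(resolve_instance_keys(instance=FakeConnector()))           # sql:temp
```

When the resolved keys are 'sql:memory', the constructor additionally sets cache to False.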

instance: Union[InstanceConnector, str]

Return the pipe's instance connector or keys.

instance_connector: Optional[InstanceConnector]

The instance connector on which this pipe resides.
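instance_connector parses the instance keys lazily and stores a successful result in the object's __dict__, so later accesses skip parsing; instance then falls back to the plain keys when no connector could be built. The sketch below models that behavior. LazyInstance is hypothetical, and the injected parse callable stands in for meerschaum.connectors.parse.parse_instance_keys.

```python
from typing import Callable, Optional, Union


class LazyInstance:
    """Hypothetical class mirroring `Pipe.instance_connector` and `Pipe.instance`."""

    def __init__(self, instance_keys: str, parse: Callable[[str], Optional[object]]):
        self.instance_keys = instance_keys
        self._parse = parse  # stand-in for `parse_instance_keys`

    @property
    def instance_connector(self) -> Optional[object]:
        # Parse only if no connector has been cached yet.
        if '_instance_connector' not in self.__dict__:
            conn = self._parse(self.instance_keys)
            if not conn:
                return None  # failures are not cached, so the next access retries
            self._instance_connector = conn
        return self._instance_connector

    @property
    def instance(self) -> Union[object, str]:
        # Fall back to the raw keys when no connector could be built.
        conn = self.instance_connector
        return self.instance_keys if conn is None else conn


calls = []


def parse(keys: str):
    calls.append(keys)
    return {'keys': keys} if keys.startswith('sql:') else None


ok = LazyInstance('sql:main', parse)
_ = ok.instance_connector
_ = ok.instance_connector
print(len(calls))  # 1: the second access uses the cached connector

bad = LazyInstance('bogus:keys', parse)
print(bad.instance)  # bogus:keys
```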

connector_keys: str

Return the pipe's connector keys.

connector_key: str

Legacy: use Pipe.connector_keys instead.

connector: "Union['Connector', str]"

The connector to the data source.
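Unlike instance_connector, the connector property tolerates source keys that may not map to a connector: warnings are silenced while parsing, any exception is swallowed, and the raw connector keys string is returned instead. A hypothetical helper reproducing that fallback with the standard warnings module (resolve_source and noisy_parse are illustrative names, not Meerschaum functions):

```python
import warnings
from typing import Callable, Union


def resolve_source(connector_keys: str, parse: Callable[[str], object]) -> Union[object, str]:
    """Hypothetical helper mirroring the fallback in `Pipe.connector`."""
    # Silence any warnings emitted while trying to build the connector.
    with warnings.catch_warnings():
        warnings.simplefilter('ignore')
        try:
            conn = parse(connector_keys)
        except Exception:
            conn = None
    # Keys that cannot be turned into a connector are returned unchanged.
    return conn if conn else connector_keys


def noisy_parse(keys: str):
    # Hypothetical parser that warns and then fails.
    warnings.warn(f"Unknown connector type in '{keys}'.")
    raise ValueError(keys)


print(resolve_source('csv', noisy_parse))                   # csv
print(resolve_source('sql:main', lambda k: {'keys': k}))    # {'keys': 'sql:main'}
```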

def fetch( self, begin: Union[datetime.datetime, int, str, NoneType] = '', end: Union[datetime.datetime, int, NoneType] = None, check_existing: bool = True, sync_chunks: bool = False, debug: bool = False, **kw: Any) -> Union[pandas.DataFrame, Iterator[pandas.DataFrame], NoneType]:
def fetch(
    self,
    begin: Union[datetime, int, str, None] = '',
    end: Union[datetime, int, None] = None,
    check_existing: bool = True,
    sync_chunks: bool = False,
    debug: bool = False,
    **kw: Any
) -> Union['pd.DataFrame', Iterator['pd.DataFrame'], None]:
    """
    Fetch a Pipe's latest data from its connector.

    Parameters
    ----------
    begin: Union[datetime, int, str, None], default ''
        If provided, only fetch data newer than or equal to `begin`.

    end: Union[datetime, int, None], default None
        If provided, only fetch data older than or equal to `end`.

    check_existing: bool, default True
        If `False`, do not apply the backtrack interval.

    sync_chunks: bool, default False
        If `True` and the pipe's connector is of type `'sql'`, sync each chunk as soon as
        it is loaded into memory instead of waiting for the full fetch to finish.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `pd.DataFrame` of the newest unseen data, an iterator of DataFrame chunks,
    or `None` if the connector does not implement `fetch()`.

    """
    if 'fetch' not in dir(self.connector):
        warn(f"No `fetch()` function defined for connector '{self.connector}'")
        return None

    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.misc import filter_arguments

    _chunk_hook = kw.pop('chunk_hook', None)
    kw['workers'] = self.get_num_workers(kw.get('workers', None))
    if sync_chunks and _chunk_hook is None:

        def _chunk_hook(chunk, **_kw) -> SuccessTuple:
            """
            Wrap `Pipe.sync()` with a custom chunk label prepended to the message.
            """
            from meerschaum.config._patch import apply_patch_to_config
            kwargs = apply_patch_to_config(kw, _kw)
            chunk_success, chunk_message = self.sync(chunk, **kwargs)
            chunk_label = self._get_chunk_label(chunk, self.columns.get('datetime', None))
            if chunk_label:
                chunk_message = '\n' + chunk_label + '\n' + chunk_message
            return chunk_success, chunk_message

    begin, end = self.parse_date_bounds(begin, end)

    with mrsm.Venv(get_connector_plugin(self.connector)):
        _args, _kwargs = filter_arguments(
            self.connector.fetch,
            self,
            begin=_determine_begin(
                self,
                begin,
                end,
                check_existing=check_existing,
                debug=debug,
            ),
            end=end,
            chunk_hook=_chunk_hook,
            debug=debug,
            **kw
        )
        df = self.connector.fetch(*_args, **_kwargs)
    return df

Fetch a Pipe's latest data from its connector.

Parameters
  • begin (Union[datetime, int, str, None], default ''): If provided, only fetch data newer than or equal to begin.
  • end (Union[datetime, int, None], default None): If provided, only fetch data older than or equal to end.
  • check_existing (bool, default True): If False, do not apply the backtrack interval.
  • sync_chunks (bool, default False): If True and the pipe's connector is of type 'sql', sync each chunk as soon as it is loaded into memory instead of waiting for the full fetch to finish.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A pd.DataFrame of the newest unseen data, an iterator of DataFrame chunks, or None if the connector does not implement fetch().
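When `sync_chunks=True`, `fetch()` builds a `_chunk_hook` that syncs a chunk and prepends a chunk label to the resulting message; the connector's own `fetch()` is expected to call that hook as each chunk is loaded. The sketch below imitates that flow with plain lists. `fake_sync`, `make_chunk_hook`, and `fake_connector_fetch` are hypothetical, and the label format is an assumption (the real one comes from `Pipe._get_chunk_label()`).

```python
from typing import Callable, Iterator, List, Tuple

SuccessTuple = Tuple[bool, str]


def fake_sync(chunk: List[dict]) -> SuccessTuple:
    """Hypothetical stand-in for `Pipe.sync()`."""
    return True, f"Synced {len(chunk)} row(s)."


def make_chunk_hook(dt_col: str) -> Callable[[List[dict]], SuccessTuple]:
    """Build a hook that syncs one chunk and prepends a label to its message."""
    def _chunk_hook(chunk: List[dict]) -> SuccessTuple:
        success, message = fake_sync(chunk)
        # Assumed label format: the first and last datetime values in the chunk.
        label = f"{chunk[0][dt_col]} - {chunk[-1][dt_col]}" if chunk else ''
        if label:
            message = '\n' + label + '\n' + message
        return success, message
    return _chunk_hook


def fake_connector_fetch(chunks: Iterator[List[dict]], chunk_hook=None) -> List[SuccessTuple]:
    """Hypothetical connector `fetch()` that invokes the hook as each chunk arrives."""
    results = []
    for chunk in chunks:
        if chunk_hook is not None:
            results.append(chunk_hook(chunk))
    return results


chunks = [
    [{'ts': 1, 'vl': 10}, {'ts': 2, 'vl': 20}],
    [{'ts': 3, 'vl': 30}],
]
for success, msg in fake_connector_fetch(iter(chunks), chunk_hook=make_chunk_hook('ts')):
    print(success, msg)
```

Syncing inside the hook keeps memory bounded by one chunk at a time, which is the point of the flag.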
def get_backtrack_interval( self, check_existing: bool = True, debug: bool = False) -> Union[datetime.timedelta, int]:
def get_backtrack_interval(
    self,
    check_existing: bool = True,
    debug: bool = False,
) -> Union[timedelta, int]:
    """
    Get the backtrack interval to use for this pipe.

    Parameters
    ----------
    check_existing: bool, default True
        If `False`, return a backtrack_interval of 0 minutes.

    Returns
    -------
    The backtrack interval (`timedelta` or `int`) to use with this pipe's `datetime` axis.
    """
    from meerschaum.utils.dtypes import MRSM_PRECISION_UNITS_SCALARS, MRSM_PRECISION_UNITS_ALIASES
    default_backtrack_minutes = get_config('pipes', 'parameters', 'fetch', 'backtrack_minutes')
    configured_backtrack_minutes = self.parameters.get('fetch', {}).get('backtrack_minutes', None)
    backtrack_minutes = (
        configured_backtrack_minutes
        if configured_backtrack_minutes is not None
        else default_backtrack_minutes
    ) if check_existing else 0

    dt_col = self.columns.get('datetime', None)
    if dt_col is None:
        return timedelta(minutes=backtrack_minutes)

    dt_dtype = self.dtypes.get(dt_col, 'datetime')
    if 'int' in dt_dtype.lower():
        if not self.parameters.get('precision', None):
            return backtrack_minutes
        precision_unit = self.precision.get('unit', None)
        true_unit = MRSM_PRECISION_UNITS_ALIASES.get(precision_unit, precision_unit)
        scalar = MRSM_PRECISION_UNITS_SCALARS.get(true_unit, None)
        if scalar is not None:
            return int(backtrack_minutes * 60 * scalar)
        return backtrack_minutes

    return timedelta(minutes=backtrack_minutes)

Get the backtrack interval to use for this pipe.

Parameters
  • check_existing (bool, default True): If False, return a backtrack interval of 0 minutes.
Returns
  • The backtrack interval to use with this pipe's datetime axis: a timedelta for timestamp axes (or when no datetime column is set), or an int for integer axes.
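The branching in `get_backtrack_interval()` can be sketched as a pure function. The default of 1440 minutes and the `PRECISION_SCALARS` table (precision units per second, matching the `* 60 * scalar` step) are assumptions for illustration; the real values come from the Meerschaum config and `MRSM_PRECISION_UNITS_SCALARS`, and `backtrack_interval` is a hypothetical name.

```python
from datetime import timedelta
from typing import Optional, Union

# Assumed: number of precision units per second.
PRECISION_SCALARS = {
    'second': 1,
    'millisecond': 1_000,
    'microsecond': 1_000_000,
    'nanosecond': 1_000_000_000,
}
DEFAULT_BACKTRACK_MINUTES = 1440  # Assumed default for illustration only.


def backtrack_interval(
    configured_minutes: Optional[int],
    dt_dtype: Optional[str],
    precision_unit: Optional[str] = None,
    check_existing: bool = True,
) -> Union[timedelta, int]:
    """Hypothetical re-implementation of the backtrack-interval selection."""
    minutes = (
        configured_minutes if configured_minutes is not None else DEFAULT_BACKTRACK_MINUTES
    ) if check_existing else 0

    # No datetime column, or a timestamp axis: return a timedelta.
    if dt_dtype is None or 'int' not in dt_dtype.lower():
        return timedelta(minutes=minutes)

    # Integer axis: scale minutes into precision units when a known unit is set.
    scalar = PRECISION_SCALARS.get(precision_unit) if precision_unit else None
    if scalar is None:
        return minutes
    return int(minutes * 60 * scalar)


print(backtrack_interval(None, 'datetime64[ns, UTC]'))  # 1 day, 0:00:00
print(backtrack_interval(60, 'int64', 'second'))        # 3600
print(backtrack_interval(60, 'int64'))                  # 60
```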
def get_data( self, select_columns: Optional[List[str]] = None, omit_columns: Optional[List[str]] = None, begin: Union[datetime.datetime, int, str, NoneType] = None, end: Union[datetime.datetime, int, str, NoneType] = None, params: Optional[Dict[str, Any]] = None, as_docs: bool = False, as_iterator: bool = False, as_chunks: bool = False, as_dask: bool = False, add_missing_columns: bool = False, chunk_interval: Union[datetime.timedelta, int, NoneType] = None, order: Optional[str] = 'asc', limit: Optional[int] = None, fresh: bool = False, debug: bool = False, **kw: Any) -> Union[pandas.DataFrame, Iterator[pandas.DataFrame], NoneType]:
def get_data(
    self,
    select_columns: Optional[List[str]] = None,
    omit_columns: Optional[List[str]] = None,
    begin: Union[datetime, int, str, None] = None,
    end: Union[datetime, int, str, None] = None,
    params: Optional[Dict[str, Any]] = None,
    as_docs: bool = False,
    as_iterator: bool = False,
    as_chunks: bool = False,
    as_dask: bool = False,
    add_missing_columns: bool = False,
    chunk_interval: Union[timedelta, int, None] = None,
    order: Optional[str] = 'asc',
    limit: Optional[int] = None,
    fresh: bool = False,
    debug: bool = False,
    **kw: Any
) -> Union['pd.DataFrame', Iterator['pd.DataFrame'], None]:
    """
    Get a pipe's data from the instance connector.

    Parameters
    ----------
    select_columns: Optional[List[str]], default None
        If provided, only select these given columns.
        Otherwise select all available columns (i.e. `SELECT *`).

    omit_columns: Optional[List[str]], default None
        If provided, remove these columns from the selection.

    begin: Union[datetime, int, str, None], default None
        Lower bound datetime to begin searching for data (inclusive).
        Translates to a `WHERE` clause like `WHERE datetime >= begin`.
        Defaults to `None`.

    end: Union[datetime, int, str, None], default None
        Upper bound datetime to stop searching for data (exclusive).
        Translates to a `WHERE` clause like `WHERE datetime < end`.
        Defaults to `None`.

    params: Optional[Dict[str, Any]], default None
        Filter the retrieved data by a dictionary of parameters.
        See `meerschaum.utils.sql.build_where` for more details.

    as_docs: bool, default False
        If `True`, return a list of dictionaries rather than a DataFrame.
        Relies on `get_pipe_docs` from the instance connector if implemented.
        May be combined with `as_chunks` to return an `Iterator[List[Dict]]`
        chunked by time bounds (useful for large result sets without pandas overhead).

    as_iterator: bool, default False
        If `True`, return a generator of chunks of pipe data.
        When combined with `as_docs=True`, yields `List[Dict]` per chunk instead of DataFrames.

    as_chunks: bool, default False
        Alias for `as_iterator`.
        When combined with `as_docs=True`, yields `List[Dict]` per chunk instead of DataFrames.

    as_dask: bool, default False
        If `True`, return a `dask.DataFrame`
        (which may be loaded into a Pandas DataFrame with `df.compute()`).

    add_missing_columns: bool, default False
        If `True`, add any missing columns from `Pipe.dtypes` to the dataframe.

    chunk_interval: Union[timedelta, int, None], default None
        If `as_iterator`, then return chunks with `begin` and `end` separated by this interval.
        This may be set under `pipe.parameters['chunk_minutes']`.
        By default, use a timedelta of 43200 minutes (30 days).
        If `chunk_interval` is an integer and the `datetime` axis a timestamp,
        then use a timedelta of that many minutes.
        If the `datetime` axis is an integer, default to the configured chunksize.
        If `chunk_interval` is a `timedelta` and the `datetime` axis an integer,
        use the number of minutes in the `timedelta`.

    order: Optional[str], default 'asc'
        If `order` is not `None`, sort the resulting dataframe by indices.

    limit: Optional[int], default None
        If provided, cap the dataframe to this many rows.

    fresh: bool, default False
        If `True`, skip local cache and directly query the instance connector.

    debug: bool, default False
        Verbosity toggle.
        Defaults to `False`.

    Returns
    -------
    A `pd.DataFrame` of the pipe's data (default).
    A `List[Dict]` if `as_docs=True`.
    An `Iterator[pd.DataFrame]` if `as_chunks=True` (or `as_iterator=True`).
    An `Iterator[List[Dict]]` if both `as_docs=True` and `as_chunks=True`.

    """
    from meerschaum.utils.warnings import warn
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.dtypes import to_pandas_dtype
    from meerschaum.utils.dataframe import add_missing_cols_to_df, df_is_chunk_generator
    from meerschaum.utils.packages import attempt_import
    from meerschaum.utils.warnings import dprint
    dd = attempt_import('dask.dataframe') if as_dask else None
    dask = attempt_import('dask') if as_dask else None
    _ = attempt_import('partd', lazy=False) if as_dask else None

    if select_columns == '*':
        select_columns = None
    elif isinstance(select_columns, str):
        select_columns = [select_columns]

    if isinstance(omit_columns, str):
        omit_columns = [omit_columns]

    begin, end = self.parse_date_bounds(begin, end, debug=debug)
    as_iterator = as_iterator or as_chunks
    dt_col = self.columns.get('datetime', None)

    def _sort_df(_df):
        if df_is_chunk_generator(_df):
            return _df
        indices = [] if dt_col not in _df.columns else [dt_col]
        non_dt_cols = [
            col
            for col_ix, col in self.columns.items()
            if col_ix != 'datetime' and col in _df.columns
        ]
        indices.extend(non_dt_cols)
        if 'dask' not in _df.__module__:
            _df.sort_values(
                by=indices,
                inplace=True,
                ascending=(str(order).lower() == 'asc'),
            )
            _df.reset_index(drop=True, inplace=True)
        else:
            _df = _df.sort_values(
                by=indices,
                ascending=(str(order).lower() == 'asc'),
            )
            _df = _df.reset_index(drop=True)
        if limit is not None and len(_df) > limit:
            return _df.head(limit)
        return _df

    if as_iterator or as_chunks:
        df = self._get_data_as_iterator(
            select_columns=select_columns,
            omit_columns=omit_columns,
            begin=begin,
            end=end,
            params=params,
            chunk_interval=chunk_interval,
            limit=limit,
            order=order,
            as_docs=as_docs,
            fresh=fresh,
            debug=debug,
        )
        if as_docs:
            return df
        return _sort_df(df)

    if as_dask:
        from multiprocessing.pool import ThreadPool
        dask_pool = ThreadPool(self.get_num_workers())
        dask.config.set(pool=dask_pool)
        chunk_interval = self.get_chunk_interval(chunk_interval, debug=debug)
        bounds = self.get_chunk_bounds(
            begin=begin,
            end=end,
            bounded=False,
            chunk_interval=chunk_interval,
            debug=debug,
        )
        dask_chunks = [
            dask.delayed(self.get_data)(
                select_columns=select_columns,
                omit_columns=omit_columns,
                begin=chunk_begin,
                end=chunk_end,
                params=params,
                chunk_interval=chunk_interval,
                order=order,
                limit=limit,
                fresh=fresh,
                add_missing_columns=True,
                debug=debug,
            )
            for (chunk_begin, chunk_end) in bounds
        ]
        dask_meta = {
            col: to_pandas_dtype(typ)
            for col, typ in self.get_dtypes(refresh=True, infer=True, debug=debug).items()
        }
        if debug:
            dprint(f"Dask meta:\n{dask_meta}")
        return _sort_df(dd.from_delayed(dask_chunks, meta=dask_meta))

    if not self.exists(debug=debug):
        return [] if as_docs else None

    if as_docs:
        with Venv(get_connector_plugin(self.instance_connector)):
            docs = self.instance_connector.get_pipe_docs(
                pipe=self,
                select_columns=select_columns,
                omit_columns=omit_columns,
                begin=begin,
                end=end,
                params=params,
                limit=limit,
                order=order,
                debug=debug,
                **kw
            )
        return docs if docs is not None else []

    with Venv(get_connector_plugin(self.instance_connector)):
        df = self.instance_connector.get_pipe_data(
            pipe=self,
            select_columns=select_columns,
            omit_columns=omit_columns,
            begin=begin,
            end=end,
            params=params,
            limit=limit,
            order=order,
            debug=debug,
            **kw
        )
        if df is None:
            return df

        if not select_columns:
            select_columns = [col for col in df.columns]

        pipe_dtypes = self.get_dtypes(refresh=False, debug=debug)
        cols_to_omit = [
            col
            for col in df.columns
            if (
                col in (omit_columns or [])
                or
                col not in (select_columns or [])
            )
        ]
        cols_to_add = [
            col
            for col in select_columns
            if col not in df.columns
        ] + ([
            col
            for col in pipe_dtypes
            if col not in df.columns
        ] if add_missing_columns else [])
        if cols_to_omit:
            warn(
                (
                    f"Received {len(cols_to_omit)} omitted column"
                    + ('s' if len(cols_to_omit) != 1 else '')
                    + f" for {self}. "
                    + "Consider adding `select_columns` and `omit_columns` support to "
                    + f"'{self.instance_connector.type}' connectors to improve performance."
                ),
                stack=False,
            )
            _cols_to_select = [col for col in df.columns if col not in cols_to_omit]
            df = df[_cols_to_select]

        if cols_to_add:
            if not add_missing_columns:
                from meerschaum.utils.misc import items_str
                warn(
                    f"Will add columns {items_str(cols_to_add)} as nulls to dataframe.",
                    stack=False,
                )

            df = add_missing_cols_to_df(
                df,
                {
                    col: pipe_dtypes.get(col, 'string')
                    for col in cols_to_add
                },
            )

        enforced_df = self.enforce_dtypes(
            df,
            dtypes=pipe_dtypes,
            debug=debug,
        )

        if order:
            return _sort_df(enforced_df)
        return enforced_df
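When the instance connector ignores `select_columns` or `omit_columns`, the end of `get_data()` reconciles the columns itself: unwanted columns are dropped and requested-but-missing columns are added as nulls. A dependency-free sketch of that step over a list of dicts; `reconcile_columns` is a hypothetical helper, and it skips the dtype handling that `add_missing_cols_to_df()` and `enforce_dtypes()` perform.

```python
from typing import Any, Dict, List, Optional


def reconcile_columns(
    rows: List[Dict[str, Any]],
    select_columns: Optional[List[str]] = None,
    omit_columns: Optional[List[str]] = None,
    pipe_dtypes: Optional[Dict[str, str]] = None,
    add_missing_columns: bool = False,
) -> List[Dict[str, Any]]:
    """Drop unwanted columns and add missing ones as None (hypothetical helper)."""
    present = list(rows[0].keys()) if rows else []
    # As in `get_data()`, no selection means "every column that came back".
    select_columns = select_columns or present
    omit = set(omit_columns or [])

    keep = [col for col in present if col not in omit and col in select_columns]
    to_add = [col for col in select_columns if col not in present]
    if add_missing_columns:
        to_add += [col for col in (pipe_dtypes or {}) if col not in present and col not in to_add]

    return [
        {**{col: row[col] for col in keep}, **{col: None for col in to_add}}
        for row in rows
    ]


rows = [{'ts': 1, 'id': 1, 'vl': 10}]
print(reconcile_columns(rows, select_columns=['ts', 'vl', 'extra']))
# [{'ts': 1, 'vl': 10, 'extra': None}]
```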

Get a pipe's data from the instance connector.

Parameters
  • select_columns (Optional[List[str]], default None): If provided, only select these given columns. Otherwise select all available columns (i.e. SELECT *).
  • omit_columns (Optional[List[str]], default None): If provided, remove these columns from the selection.
  • begin (Union[datetime, int, str, None], default None): Lower bound datetime to begin searching for data (inclusive). Translates to a WHERE clause like WHERE datetime >= begin. Defaults to None.
  • end (Union[datetime, int, str, None], default None): Upper bound datetime to stop searching for data (exclusive). Translates to a WHERE clause like WHERE datetime < end. Defaults to None.
  • params (Optional[Dict[str, Any]], default None): Filter the retrieved data by a dictionary of parameters. See meerschaum.utils.sql.build_where for more details.
  • as_docs (bool, default False): If True, return a list of dictionaries rather than a DataFrame. Relies on get_pipe_docs from the instance connector if implemented. May be combined with as_chunks to return an Iterator[List[Dict]] chunked by time bounds (useful for large result sets without pandas overhead).
  • as_iterator (bool, default False): If True, return a generator of chunks of pipe data. When combined with as_docs=True, yields List[Dict] per chunk instead of DataFrames.
  • as_chunks (bool, default False): Alias for as_iterator. When combined with as_docs=True, yields List[Dict] per chunk instead of DataFrames.
  • as_dask (bool, default False): If True, return a dask.DataFrame (which may be loaded into a Pandas DataFrame with df.compute()).
  • add_missing_columns (bool, default False): If True, add any missing columns from Pipe.dtypes to the dataframe.
  • chunk_interval (Union[timedelta, int, None], default None): If as_iterator, then return chunks with begin and end separated by this interval. This may be set under pipe.parameters['chunk_minutes']. By default, use a timedelta of 43200 minutes (30 days). If chunk_interval is an integer and the datetime axis a timestamp, then use a timedelta of that many minutes. If the datetime axis is an integer, default to the configured chunksize. If chunk_interval is a timedelta and the datetime axis an integer, use the number of minutes in the timedelta.
  • order (Optional[str], default 'asc'): If order is not None, sort the resulting dataframe by indices.
  • limit (Optional[int], default None): If provided, cap the dataframe to this many rows.
  • fresh (bool, default False): If True, skip local cache and directly query the instance connector.
  • debug (bool, default False): Verbosity toggle. Defaults to False.
Returns
  • A pd.DataFrame of the pipe's data (default).
  • A List[Dict] if as_docs=True.
  • An Iterator[pd.DataFrame] if as_chunks=True (or as_iterator=True).
  • An Iterator[List[Dict]] if both as_docs=True and as_chunks=True.
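As the `WHERE` clauses for `begin` and `end` indicate, the lower bound is inclusive and the upper bound exclusive, and with `as_iterator=True` the range is split into consecutive chunks of `chunk_interval`. A stdlib sketch of both ideas; `chunk_bounds` and `in_bounds` are hypothetical helpers, not `Pipe.get_chunk_bounds()` itself, which also handles open-ended and integer bounds.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def chunk_bounds(begin: datetime, end: datetime, interval: timedelta) -> List[Tuple[datetime, datetime]]:
    """Split [begin, end) into consecutive half-open windows of at most `interval`."""
    bounds = []
    chunk_begin = begin
    while chunk_begin < end:
        chunk_end = min(chunk_begin + interval, end)
        bounds.append((chunk_begin, chunk_end))
        chunk_begin = chunk_end
    return bounds


def in_bounds(ts: datetime, begin: datetime, end: datetime) -> bool:
    """Mirror `WHERE datetime >= begin AND datetime < end`."""
    return begin <= ts < end


bounds = chunk_bounds(datetime(2024, 1, 1), datetime(2024, 1, 3, 12), timedelta(days=1))
for b, e in bounds:
    print(b, '->', e)
# 2024-01-01 00:00:00 -> 2024-01-02 00:00:00
# 2024-01-02 00:00:00 -> 2024-01-03 00:00:00
# 2024-01-03 00:00:00 -> 2024-01-03 12:00:00
```

Because each window is half-open, a row stamped exactly on a shared boundary (e.g. `2024-01-02 00:00`) lands in exactly one chunk and is never yielded twice.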
def get_backtrack_data( self, backtrack_minutes: Optional[int] = None, begin: Union[datetime.datetime, int, NoneType] = None, params: Optional[Dict[str, Any]] = None, limit: Optional[int] = None, fresh: bool = False, debug: bool = False, **kw: Any) -> Optional[pandas.DataFrame]:
def get_backtrack_data(
    self,
    backtrack_minutes: Optional[int] = None,
    begin: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    limit: Optional[int] = None,
    fresh: bool = False,
    debug: bool = False,
    **kw: Any
) -> Optional['pd.DataFrame']:
    """
    Get the most recent data from the instance connector as a Pandas DataFrame.

    Parameters
    ----------
    backtrack_minutes: Optional[int], default None
        How many minutes from `begin` to select from.
        If `None`, use `pipe.parameters['fetch']['backtrack_minutes']`.

    begin: Union[datetime, int, None], default None
        The starting point to search for data.
        If begin is `None` (default), use the most recent observed datetime
        (AKA sync_time).

        ```
        E.g. begin = 02:00

        Search this region.           Ignore this, even if there's data.
        /  /  /  /  /  /  /  /  /  |
        -----|----------|----------|----------|----------|----------|
        00:00      01:00      02:00      03:00      04:00      05:00

        ```

    params: Optional[Dict[str, Any]], default None
        The standard Meerschaum `params` query dictionary.

    limit: Optional[int], default None
        If provided, cap the number of rows to be returned.

    fresh: bool, default False
        If `True`, ignore local cache and pull directly from the instance connector.
        Only comes into effect if a pipe was created with `cache=True`.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `pd.DataFrame` for the pipe's data corresponding to the provided parameters. Backtrack data
    is a convenient way to get a pipe's data "backtracked" from the most recent datetime.
    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    if not self.exists(debug=debug):
        return None

    begin = self.parse_date_bounds(begin, debug=debug)

    backtrack_interval = self.get_backtrack_interval(debug=debug)
    if backtrack_minutes is None:
        backtrack_minutes = (
            (backtrack_interval.total_seconds() / 60)
            if isinstance(backtrack_interval, timedelta)
            else backtrack_interval
        )

    if hasattr(self.instance_connector, 'get_backtrack_data'):
        with Venv(get_connector_plugin(self.instance_connector)):
            return self.enforce_dtypes(
                self.instance_connector.get_backtrack_data(
                    pipe=self,
                    begin=begin,
                    backtrack_minutes=backtrack_minutes,
                    params=params,
                    limit=limit,
                    debug=debug,
                    **kw
                ),
                debug=debug,
            )

    if begin is None:
        begin = self.get_sync_time(params=params, debug=debug)

    backtrack_interval = (
        timedelta(minutes=backtrack_minutes)
        if isinstance(begin, datetime)
        else backtrack_minutes
    )
    if begin is not None:
        begin = begin - backtrack_interval

    kw['order'] = kw.get('order', 'desc') or 'desc'
    return self.get_data(
        begin=begin,
        params=params,
        debug=debug,
        limit=limit,
        **kw
    )

Get the most recent data from the instance connector as a Pandas DataFrame.

Parameters
  • backtrack_minutes (Optional[int], default None): How many minutes from begin to select from. If None, use pipe.parameters['fetch']['backtrack_minutes'].
  • begin (Union[datetime, int, None], default None): The starting point to search for data. If begin is None (default), use the most recent observed datetime (AKA sync_time).

    E.g. begin = 02:00

    Search this region.           Ignore this, even if there's data.
    /  /  /  /  /  /  /  /  /  |
    -----|----------|----------|----------|----------|----------|
    00:00      01:00      02:00      03:00      04:00      05:00

  • params (Optional[Dict[str, Any]], default None): The standard Meerschaum params query dictionary.
  • limit (Optional[int], default None): If provided, cap the number of rows to be returned.
  • fresh (bool, default False): If True, ignore local cache and pull directly from the instance connector. Only comes into effect if a pipe was created with cache=True.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A pd.DataFrame for the pipe's data corresponding to the provided parameters. Backtrack data is a convenient way to get a pipe's data "backtracked" from the most recent datetime.
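When the instance connector has no `get_backtrack_data()` of its own, the fallback path subtracts the backtrack window from `begin` (or from the sync time) and then reads everything from that point in descending order. The arithmetic for both axis types, as a sketch with a hypothetical `backtrack_begin` helper:

```python
from datetime import datetime, timedelta
from typing import Optional, Union


def backtrack_begin(
    begin: Optional[Union[datetime, int]],
    backtrack_minutes: int,
) -> Optional[Union[datetime, int]]:
    """Shift `begin` back by the backtrack window, as in the fallback path."""
    if begin is None:
        return None
    # Timestamp axes subtract a timedelta; integer axes subtract the raw value.
    interval = timedelta(minutes=backtrack_minutes) if isinstance(begin, datetime) else backtrack_minutes
    return begin - interval


print(backtrack_begin(datetime(2024, 1, 1, 2, 0), 120))  # 2024-01-01 00:00:00
print(backtrack_begin(5000, 1440))                       # 3560
print(backtrack_begin(None, 60))                         # None
```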
def get_rowcount( self, begin: Union[datetime.datetime, int, NoneType] = None, end: Union[datetime.datetime, int, NoneType] = None, params: Optional[Dict[str, Any]] = None, remote: bool = False, debug: bool = False) -> int:
def get_rowcount(
    self,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    remote: bool = False,
    debug: bool = False
) -> int:
    """
    Get a Pipe's instance or remote rowcount.

    Parameters
    ----------
    begin: Union[datetime, int, None], default None
        Count rows where datetime > begin.

    end: Union[datetime, int, None], default None
        Count rows where datetime < end.

    params: Optional[Dict[str, Any]], default None
        The standard Meerschaum `params` query dictionary.

    remote: bool, default False
        Count rows from a pipe's remote source.
        **NOTE**: This is experimental!

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    An `int` of the number of rows in the pipe corresponding to the provided parameters.
    Returns 0 if the pipe does not exist or the rowcount cannot be determined.
    """
    from meerschaum.utils.warnings import warn
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.misc import filter_keywords

    begin, end = self.parse_date_bounds(begin, end, debug=debug)
    connector = self.instance_connector if not remote else self.connector
    try:
        with Venv(get_connector_plugin(connector)):
            if not hasattr(connector, 'get_pipe_rowcount'):
                warn(
                    f"Connectors of type '{connector.type}' "
                    "do not implement `get_pipe_rowcount()`.",
                    stack=False,
                )
                return 0
            kwargs = filter_keywords(
                connector.get_pipe_rowcount,
                begin=begin,
                end=end,
                params=params,
                remote=remote,
                debug=debug,
            )
            if remote and 'remote' not in kwargs:
                warn(
                    f"Connectors of type '{connector.type}' do not support remote rowcounts.",
                    stack=False,
                )
                return 0
            rowcount = connector.get_pipe_rowcount(
                self,
                begin=begin,
                end=end,
                params=params,
                remote=remote,
                debug=debug,
            )
            if rowcount is None:
                return 0
            return rowcount
    except AttributeError as e:
        warn(e)
        if remote:
            return 0
    warn(f"Failed to get a rowcount for {self}.")
    return 0

Get a Pipe's instance or remote rowcount.

Parameters
  • begin (Union[datetime, int, None], default None): Count rows where datetime > begin.
  • end (Union[datetime, int, None], default None): Count rows where datetime < end.
  • params (Optional[Dict[str, Any]], default None): The standard Meerschaum params query dictionary.
  • remote (bool, default False): Count rows from a pipe's remote source. NOTE: This is experimental!
  • debug (bool, default False): Verbosity toggle.
Returns
  • An int of the number of rows in the pipe corresponding to the provided parameters. Returns 0 if the pipe does not exist or the rowcount cannot be determined.
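`get_rowcount()` only forwards the keywords that a connector's `get_pipe_rowcount()` accepts, and treats a missing `remote` parameter as "remote rowcounts unsupported". A simplified sketch of that check with `inspect.signature`; `filter_supported_kwargs` is an assumed approximation of `meerschaum.utils.misc.filter_keywords` (it does not special-case functions that take `**kwargs`), and both connector functions are hypothetical.

```python
import inspect
from typing import Any, Callable, Dict


def filter_supported_kwargs(func: Callable, **kwargs: Any) -> Dict[str, Any]:
    """Keep only the keyword arguments that `func` explicitly declares."""
    params = inspect.signature(func).parameters
    return {key: val for key, val in kwargs.items() if key in params}


def sql_rowcount(pipe, begin=None, end=None, params=None, remote=False, debug=False):
    """Hypothetical connector method that supports remote counts."""
    return 42


def api_rowcount(pipe, begin=None, end=None, params=None, debug=False):
    """Hypothetical connector method without a `remote` parameter."""
    return 7


for func in (sql_rowcount, api_rowcount):
    kwargs = filter_supported_kwargs(func, begin=None, end=None, params=None, remote=True, debug=False)
    print(func.__name__, 'remote' in kwargs)
# sql_rowcount True
# api_rowcount False
```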
def get_size(self, debug: bool = False, **kw: Any) -> Optional[int]:
def get_size(
    self,
    debug: bool = False,
    **kw: Any
) -> Union[int, None]:
    """
    Return the on-disk size of the pipe's target table in bytes.

    Parameters
    ----------
    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    An `int` of the number of bytes occupied by the pipe's target table,
    or `None` if the size could not be determined (e.g. the connector does
    not implement `get_pipe_size()` or the table does not exist).
    """
    from meerschaum.utils.warnings import warn
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.misc import filter_keywords

    connector = self.instance_connector
    try:
        with Venv(get_connector_plugin(connector)):
            if not hasattr(connector, 'get_pipe_size'):
                return None
            kwargs = filter_keywords(
                connector.get_pipe_size,
                debug=debug,
                **kw
            )
            return connector.get_pipe_size(self, **kwargs)
    except NotImplementedError:
        return None
    except Exception as e:
        warn(f"Failed to get the size of {self}:\n{e}", stack=False)
    return None

Return the on-disk size of the pipe's target table in bytes.

Parameters
  • debug (bool, default False): Verbosity toggle.
Returns
  • An int of the number of bytes occupied by the pipe's target table, or None if the size could not be determined (e.g. the connector does not implement get_pipe_size() or the table does not exist).
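`get_size()` treats the size as an optional capability: a connector without `get_pipe_size()`, or one that raises `NotImplementedError`, yields `None` rather than an error. A stdlib sketch of that pattern; the three connector classes and the standalone `get_size` function are hypothetical.

```python
from typing import Any, Optional


class SizedConnector:
    """Hypothetical connector that can report a table size."""
    def get_pipe_size(self, pipe: Any, debug: bool = False) -> int:
        return 8192


class StubConnector:
    """Hypothetical connector that declares the method but does not implement it."""
    def get_pipe_size(self, pipe: Any, debug: bool = False) -> int:
        raise NotImplementedError


class PlainConnector:
    """Hypothetical connector with no size support at all."""


def get_size(connector: Any, pipe: Any = None) -> Optional[int]:
    """Return the size in bytes, or None when the connector cannot tell."""
    if not hasattr(connector, 'get_pipe_size'):
        return None
    try:
        return connector.get_pipe_size(pipe)
    except NotImplementedError:
        return None


for conn in (SizedConnector(), StubConnector(), PlainConnector()):
    print(type(conn).__name__, get_size(conn))
# SizedConnector 8192
# StubConnector None
# PlainConnector None
```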
def get_doc(self, **kwargs) -> Optional[Dict[str, Any]]:
def get_doc(self, **kwargs) -> Union[Dict[str, Any], None]:
    """
    Convenience function to return a single row as a dictionary (or `None`) from `Pipe.get_data()`.
    Keyword arguments are passed to `Pipe.get_data()`.
    """
    from meerschaum.utils.warnings import warn
    kwargs['limit'] = 1
    kwargs['as_docs'] = True
    try:
        docs = self.get_data(**kwargs)
        if not docs:
            return None
        return docs[0]
    except Exception as e:
        warn(f"Failed to read value from {self}:\n{e}", stack=False)
        return None

Convenience function to return a single row as a dictionary (or None) from Pipe.get_data(). Keyword arguments are passed to Pipe.get_data().

def get_docs(self, **kwargs) -> list[dict[str, typing.Any]]:
def get_docs(self, **kwargs) -> list[dict[str, Any]]:
    """
    Convenience method to return a pipe's data as a list of dictionaries.
    Relies on `get_pipe_docs` from the instance connector if implemented.
    """
    kwargs['as_docs'] = True
    return self.get_data(**kwargs)

Convenience method to return a pipe's data as a list of dictionaries. Relies on get_pipe_docs from the instance connector if implemented.

def get_value( self, column: str, params: Optional[Dict[str, Any]] = None, **kwargs: Any) -> Any:
1029def get_value(
1030    self,
1031    column: str,
1032    params: Optional[Dict[str, Any]] = None,
1033    **kwargs: Any
1034) -> Any:
1035    """
1036    Convenience function to return a single value (or `None`) from `Pipe.get_data()`.
1037    Keyword arguments are passed to `Pipe.get_data()`.
1038    """
1039    from meerschaum.utils.warnings import warn
1040    kwargs['select_columns'] = [column]
1041    kwargs['limit'] = 1
1042    kwargs['as_docs'] = True
1043    try:
1044        docs = self.get_data(params=params, **kwargs)
1045        if not docs:
1046            return None
1047        if column not in docs[0]:
1048            raise ValueError(f"Column '{column}' was not included in the result set.")
1049        return docs[0][column]
1050    except Exception as e:
1051        warn(f"Failed to read value from {self}:\n{e}", stack=False)
1052        return None

Convenience function to return a single value (or None) from Pipe.get_data(). Keyword arguments are passed to Pipe.get_data().
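Because get_value() catches errors, a missing column and a failed read both surface as None plus a warning. The sketch below reproduces that control flow against a stand-in reader. get_value_sketch and fake_get_data are hypothetical names, and the stand-in only approximates Pipe.get_data() by applying equality filters via params, select_columns, and limit.

```python
import warnings
from typing import Any, Callable, Dict, List, Optional

ROWS = [{'id': 1, 'vl': 10}, {'id': 2, 'vl': 20}]


def fake_get_data(
    params: Optional[Dict[str, Any]] = None,
    select_columns: Optional[List[str]] = None,
    limit: Optional[int] = None,
    as_docs: bool = True,
) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for Pipe.get_data(as_docs=True)."""
    # Keep rows whose values equal every key in `params`.
    rows = [r for r in ROWS if all(r.get(k) == v for k, v in (params or {}).items())]
    if limit is not None:
        rows = rows[:limit]
    if select_columns:
        rows = [{c: r[c] for c in select_columns if c in r} for r in rows]
    return rows


def get_value_sketch(
    get_data: Callable[..., List[Dict[str, Any]]],
    column: str,
    params: Optional[Dict[str, Any]] = None,
    **kwargs: Any,
) -> Any:
    """Mirror get_value(): select one column, read one row, swallow errors."""
    kwargs.update(select_columns=[column], limit=1, as_docs=True)
    try:
        docs = get_data(params=params, **kwargs)
        if not docs:
            return None
        if column not in docs[0]:
            raise ValueError(f"Column '{column}' was not included in the result set.")
        return docs[0][column]
    except Exception as e:
        # Any failure becomes a warning and a None result.
        warnings.warn(f"Failed to read value: {e}")
        return None
```

For example, get_value_sketch(fake_get_data, 'vl', {'id': 2}) returns 20, while asking for a column that is not in the result returns None and emits a warning instead of raising.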

def get_chunk_interval( self, chunk_interval: Union[datetime.timedelta, int, NoneType] = None, debug: bool = False) -> Union[datetime.timedelta, int]:
640def get_chunk_interval(
641    self,
642    chunk_interval: Union[timedelta, int, None] = None,
643    debug: bool = False,
644) -> Union[timedelta, int]:
645    """
646    Get the chunk interval to use for this pipe.
647
648    The size is read from the `verify` parameters. Any one of these aliased keys may be used
649    (the first present, in this priority order, wins):
650
651        - `verify.chunk_minutes` (the default; 43200 — 30 days — if none is set)
652        - `verify.chunk_hours`
653        - `verify.chunk_days`
654        - `verify.chunk_weeks`
655        - `verify.chunk_years`
656        - `verify.chunk_seconds`
657
658    For an integer datetime axis, `verify.chunk_range` (if set) is used verbatim as the chunk size
659    in epoch units. Otherwise the time-based size above is converted to epoch units via the pipe's
660    `precision`, or — preserving legacy behavior when no `precision` is set — its minutes are used
661    verbatim.
662
663    Parameters
664    ----------
665    chunk_interval: Union[timedelta, int, None], default None
666        If provided, coerce this value into the correct type (overriding the `verify` keys).
667        For example, if the datetime axis is an integer, then return the number of minutes.
668
669    Returns
670    -------
671    The chunk interval (`timedelta` or `int`) to use with this pipe's `datetime` axis.
672    """
673    from meerschaum.utils.dtypes import MRSM_PRECISION_UNITS_SCALARS, MRSM_PRECISION_UNITS_ALIASES
674
675    dt_col = self.columns.get('datetime', None)
676    dt_dtype = self.dtypes.get(dt_col, 'datetime') if dt_col is not None else 'datetime'
677    is_int_axis = 'int' in str(dt_dtype).lower()
678    verify_params = self.parameters.get('verify', {})
679
680    ### An explicit `chunk_interval` argument overrides everything (legacy behavior).
681    if chunk_interval is not None:
682        chunk_minutes = (
683            chunk_interval
684            if isinstance(chunk_interval, int)
685            else int(chunk_interval.total_seconds() / 60)
686        )
687        if dt_col is None:
688            return timedelta(minutes=chunk_minutes)
689        return chunk_minutes if is_int_axis else timedelta(minutes=chunk_minutes)
690
691    ### Integer axis: an explicit `verify.chunk_range` is the chunk size in epoch units, verbatim.
692    if dt_col is not None and is_int_axis:
693        chunk_range = verify_params.get('chunk_range', None)
694        if chunk_range is not None:
695            return int(chunk_range)
696
697    ### Resolve the time-based chunk size from the aliased `verify.chunk_*` keys (priority order
698    ### matches the `bound_*` aliases). Falls back to the configured `chunk_minutes` default.
699    chunk_delta = None
700    for suffix in ('minutes', 'hours', 'days', 'weeks', 'years', 'seconds'):
701        val = verify_params.get('chunk_' + suffix, None)
702        if val is None:
703            continue
704        ### `timedelta` has no `years` kwarg; approximate a year as 365 days.
705        chunk_delta = timedelta(days=(val * 365)) if suffix == 'years' else timedelta(**{suffix: val})
706        break
707    if chunk_delta is None:
708        default_chunk_minutes = get_config('pipes', 'parameters', 'verify', 'chunk_minutes')
709        chunk_delta = timedelta(minutes=default_chunk_minutes)
710
711    if dt_col is None:
712        return chunk_delta
713
714    if is_int_axis:
715        ### Legacy: without `precision` (and without `chunk_range`), use the chunk's minutes
716        ### verbatim as the integer interval.
717        if not self.parameters.get('precision', None):
718            return int(chunk_delta.total_seconds() / 60)
719        precision_unit = self.precision.get('unit', None)
720        true_unit = MRSM_PRECISION_UNITS_ALIASES.get(precision_unit, precision_unit)
721        scalar = MRSM_PRECISION_UNITS_SCALARS.get(true_unit, None)
722        if scalar is not None:
723            return int(chunk_delta.total_seconds() * scalar)
724        return int(chunk_delta.total_seconds() / 60)
725
726    return chunk_delta

Get the chunk interval to use for this pipe.

The size is read from the verify parameters. Any one of these aliased keys may be used (the first present, in this priority order, wins):

- `verify.chunk_minutes` (the default; 43200 — 30 days — if none is set)
- `verify.chunk_hours`
- `verify.chunk_days`
- `verify.chunk_weeks`
- `verify.chunk_years`
- `verify.chunk_seconds`

For an integer datetime axis, verify.chunk_range (if set) is used verbatim as the chunk size in epoch units. Otherwise the time-based size above is converted to epoch units via the pipe's precision, or — preserving legacy behavior when no precision is set — its minutes are used verbatim.

Parameters
  • chunk_interval (Union[timedelta, int, None], default None): If provided, coerce this value into the correct type (overriding the verify keys). For example, if the datetime axis is an integer, then return the number of minutes.
Returns
  • The chunk interval (timedelta or int) to use with this pipe's datetime axis.
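The resolution order above can be sketched in plain Python. This is a simplified illustration, not meerschaum's implementation. resolve_chunk_interval and PRECISION_UNITS_PER_SECOND are hypothetical names, and the unit table is an assumption that stands in for MRSM_PRECISION_UNITS_SCALARS, whose exact keys and aliases may differ. The sketch also hard-codes the documented 43200-minute default instead of reading it from the config, and it leaves out both the explicit chunk_interval argument and the case where the pipe has no datetime column.

```python
from datetime import timedelta
from typing import Any, Dict, Optional, Union

# Hypothetical stand-in for meerschaum's precision scalars: epoch units per second.
PRECISION_UNITS_PER_SECOND = {
    'second': 1,
    'millisecond': 1_000,
    'microsecond': 1_000_000,
    'nanosecond': 1_000_000_000,
}

# Documented default: 43200 minutes (30 days).
DEFAULT_CHUNK_MINUTES = 43_200


def resolve_chunk_interval(
    verify: Dict[str, Any],
    is_int_axis: bool = False,
    precision_unit: Optional[str] = None,
) -> Union[timedelta, int]:
    """Resolve a chunk size from `verify.chunk_*` keys, first match wins."""
    # Integer axis: an explicit chunk_range is used verbatim in epoch units.
    if is_int_axis and verify.get('chunk_range') is not None:
        return int(verify['chunk_range'])

    chunk_delta = None
    for suffix in ('minutes', 'hours', 'days', 'weeks', 'years', 'seconds'):
        val = verify.get('chunk_' + suffix)
        if val is None:
            continue
        # timedelta has no `years` argument, so approximate a year as 365 days.
        chunk_delta = timedelta(days=val * 365) if suffix == 'years' else timedelta(**{suffix: val})
        break
    if chunk_delta is None:
        chunk_delta = timedelta(minutes=DEFAULT_CHUNK_MINUTES)

    if not is_int_axis:
        return chunk_delta

    scalar = PRECISION_UNITS_PER_SECOND.get(precision_unit) if precision_unit else None
    if scalar is None:
        # Legacy behavior: without a known precision, use the interval's minutes verbatim.
        return int(chunk_delta.total_seconds() / 60)
    return int(chunk_delta.total_seconds() * scalar)
```

A one-day chunk on an integer axis therefore comes out as 1440 without a precision, or as 86400 with a precision of seconds.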
def get_chunk_bounds( self, begin: Union[datetime.datetime, int, NoneType] = None, end: Union[datetime.datetime, int, NoneType] = None, bounded: bool = False, chunk_interval: Union[datetime.timedelta, int, NoneType] = None, align: bool = False, debug: bool = False) -> List[Tuple[Union[datetime.datetime, int, NoneType], Union[datetime.datetime, int, NoneType]]]:
729def get_chunk_bounds(
730    self,
731    begin: Union[datetime, int, None] = None,
732    end: Union[datetime, int, None] = None,
733    bounded: bool = False,
734    chunk_interval: Union[timedelta, int, None] = None,
735    align: bool = False,
736    debug: bool = False,
737) -> List[
738    Tuple[
739        Union[datetime, int, None],
740        Union[datetime, int, None],
741    ]
742]:
743    """
744    Return a list of datetime bounds for iterating over the pipe's `datetime` axis.
745
746    Parameters
747    ----------
748    begin: Union[datetime, int, None], default None
749        If provided, do not select less than this value.
750        Otherwise the first chunk will be unbounded.
751
752    end: Union[datetime, int, None], default None
753        If provided, do not select greater than or equal to this value.
754        Otherwise the last chunk will be unbounded.
755
756    bounded: bool, default False
757        If `True`, do not include `None` in the first chunk.
758
759    chunk_interval: Union[timedelta, int, None], default None
760        If provided, use this interval for the size of chunk boundaries.
761        The default value for this pipe may be set
762        under `pipe.parameters['verify']['chunk_minutes']`.
763
764    align: bool, default False
765        If `True`, anchor the interior chunk boundaries to a fixed Unix-epoch grid (the same
766        grid used for native range partitioning) rather than to `begin`. This makes the
767        boundaries deterministic across re-syncs and aligned with the pipe's partitions
768        (used by `Pipe.verify()`). The first chunk's lower bound and the last chunk's upper
769        bound are still clamped to `begin` / `end`.
770
771    debug: bool, default False
772        Verbosity toggle.
773
774    Returns
775    -------
776    A list of chunk bounds (datetimes or integers).
777    If unbounded, the first and last chunks will include `None`.
778    """
779    from datetime import timedelta
780    from meerschaum.utils.dtypes import are_dtypes_equal
781    from meerschaum.utils.misc import interval_str
782    include_less_than_begin = not bounded and begin is None
783    include_greater_than_end = not bounded and end is None
784    if begin is None:
785        begin = self.get_sync_time(newest=False, debug=debug)
786    consolidate_end_chunk = False
787    if end is None:
788        end = self.get_sync_time(newest=True, debug=debug)
789        if end is not None and hasattr(end, 'tzinfo'):
790            end += timedelta(minutes=1)
791            consolidate_end_chunk = True
792        elif are_dtypes_equal(str(type(end)), 'int'):
793            end += 1
794            consolidate_end_chunk = True
795
796    if begin is None and end is None:
797        return [(None, None)]
798
799    begin, end = self.parse_date_bounds(begin, end, debug=debug)
800
801    if begin and end:
802        if begin >= end:
803            return (
804                [(begin, begin)]
805                if bounded
806                else [(begin, None)]
807            )
808        if end <= begin:
809            return (
810                [(end, end)]
811                if bounded
812                else [(None, begin)]
813            )
814
815    ### Set the chunk interval under `pipe.parameters['verify']['chunk_minutes']`.
816    chunk_interval = self.get_chunk_interval(chunk_interval, debug=debug)
817
818    ### Anchor the interior boundaries to a fixed Unix-epoch grid (matching native range
819    ### partitioning, see `SQLConnector._partition_bounds`) so chunk edges line up with partition
820    ### edges and stay deterministic regardless of `begin`. The first chunk is clamped back to
821    ### `begin` below.
822    begin_cursor = begin
823    if align and begin is not None:
824        if isinstance(chunk_interval, int):
825            begin_cursor = (int(begin) // chunk_interval) * chunk_interval
826        else:
827            epoch = (
828                datetime(1970, 1, 1, tzinfo=begin.tzinfo)
829                if getattr(begin, 'tzinfo', None) is not None
830                else datetime(1970, 1, 1)
831            )
832            n = (begin - epoch) // chunk_interval
833            begin_cursor = epoch + (n * chunk_interval)
834
835    ### Build a list of tuples containing the chunk boundaries
836    ### so that we can sync multiple chunks in parallel.
837    ### Run `verify pipes --workers 1` to sync chunks in series.
838    chunk_bounds = []
839    num_chunks = 0
840    max_chunks = 1_000_000
841    while begin_cursor < end:
842        end_cursor = begin_cursor + chunk_interval
843        chunk_bounds.append((begin_cursor, end_cursor))
844        begin_cursor = end_cursor
845        num_chunks += 1
846        if num_chunks >= max_chunks:
847            raise ValueError(
848                f"Too many chunks of size '{interval_str(chunk_interval)}' "
849                f"between '{begin}' and '{end}'."
850            )
851
852    if num_chunks > 1 and consolidate_end_chunk:
853        last_bounds, second_last_bounds = chunk_bounds[-1], chunk_bounds[-2]
854        chunk_bounds = chunk_bounds[:-2]
855        chunk_bounds.append((second_last_bounds[0], last_bounds[1]))
856
857    ### The chunk interval might be too large.
858    if not chunk_bounds and end >= begin:
859        chunk_bounds = [(begin, end)]
860
861    ### Truncate the last chunk to the end timestamp.
862    if chunk_bounds[-1][1] > end:
863        chunk_bounds[-1] = (chunk_bounds[-1][0], end)
864
865    ### Pop the last chunk if its bounds are equal.
866    if chunk_bounds[-1][0] == chunk_bounds[-1][1]:
867        chunk_bounds = chunk_bounds[:-1]
868
869    ### Clamp the epoch-aligned first chunk's lower bound back to the requested `begin` so the
870    ### returned range still starts exactly at `begin` (only the interior edges are grid-aligned).
871    if (
872        align
873        and chunk_bounds
874        and chunk_bounds[0][0] is not None
875        and chunk_bounds[0][0] < begin
876    ):
877        chunk_bounds[0] = (begin, chunk_bounds[0][1])
878
879    if include_less_than_begin:
880        chunk_bounds = [(None, begin)] + chunk_bounds
881    if include_greater_than_end:
882        chunk_bounds = chunk_bounds + [(end, None)]
883
884    return chunk_bounds

Return a list of datetime bounds for iterating over the pipe's datetime axis.

Parameters
  • begin (Union[datetime, int, None], default None): If provided, do not select less than this value. Otherwise the first chunk will be unbounded.
  • end (Union[datetime, int, None], default None): If provided, do not select greater than or equal to this value. Otherwise the last chunk will be unbounded.
  • bounded (bool, default False): If True, do not prepend a (None, begin) chunk or append an (end, None) chunk when begin or end is omitted.
  • chunk_interval (Union[timedelta, int, None], default None): If provided, use this interval for the size of chunk boundaries. The default value for this pipe may be set under pipe.parameters['verify']['chunk_minutes'].
  • align (bool, default False): If True, anchor the interior chunk boundaries to a fixed Unix-epoch grid (the same grid used for native range partitioning) rather than to begin. This makes the boundaries deterministic across re-syncs and aligned with the pipe's partitions (used by Pipe.verify()). The first chunk's lower bound and the last chunk's upper bound are still clamped to begin / end.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A list of chunk bounds (datetimes or integers).
  • If unbounded, the first and last chunks will include None.
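Below is a simplified sketch of the chunking loop, including the optional epoch alignment and the clamping of the outer edges. chunk_bounds_sketch is a hypothetical helper, and it assumes both begin and end are already known with begin < end. It skips the sync-time lookup and the consolidation of the final chunk when end is inferred. The two unbounded_* switches stand in for the leading and trailing None chunks that the method adds when bounded is False.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple, Union

Bound = Union[datetime, int, None]


def chunk_bounds_sketch(
    begin: Union[datetime, int],
    end: Union[datetime, int],
    interval: Union[timedelta, int],
    align: bool = False,
    unbounded_begin: bool = False,
    unbounded_end: bool = False,
) -> List[Tuple[Bound, Bound]]:
    """Split [begin, end) into consecutive chunks of size `interval`."""
    if not begin < end:
        raise ValueError("This sketch expects begin < end.")

    cursor = begin
    if align:
        # Snap the starting cursor down to a multiple of `interval` since the Unix epoch.
        if isinstance(interval, int):
            cursor = (begin // interval) * interval
        else:
            epoch = datetime(1970, 1, 1, tzinfo=begin.tzinfo)
            cursor = epoch + ((begin - epoch) // interval) * interval

    bounds: List[Tuple[Bound, Bound]] = []
    while cursor < end:
        bounds.append((cursor, cursor + interval))
        cursor = cursor + interval

    # Clamp the outer edges back to the requested range; interior edges stay on the grid.
    bounds[0] = (max(bounds[0][0], begin), bounds[0][1])
    bounds[-1] = (bounds[-1][0], min(bounds[-1][1], end))

    if unbounded_begin:
        bounds.insert(0, (None, begin))
    if unbounded_end:
        bounds.append((end, None))
    return bounds


# Example: daily chunks aligned to the epoch grid, starting mid-day.
example = chunk_bounds_sketch(
    datetime(2024, 1, 1, 6, tzinfo=timezone.utc),
    datetime(2024, 1, 3, tzinfo=timezone.utc),
    timedelta(days=1),
    align=True,
)
```

With integers, chunk_bounds_sketch(5, 27, 10) yields (5, 15), (15, 25), (25, 27). With align=True, the interior edges move onto the grid instead: (5, 10), (10, 20), (20, 27).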
def get_chunk_bounds_batches( self, chunk_bounds: List[Tuple[Union[datetime.datetime, int, NoneType], Union[datetime.datetime, int, NoneType]]], batchsize: Optional[int] = None, workers: Optional[int] = None, debug: bool = False) -> List[Tuple[Tuple[Union[datetime.datetime, int, NoneType], Union[datetime.datetime, int, NoneType]], ...]]:
887def get_chunk_bounds_batches(
888    self,
889    chunk_bounds: List[Tuple[Union[datetime, int, None], Union[datetime, int, None]]],
890    batchsize: Optional[int] = None,
891    workers: Optional[int] = None,
892    debug: bool = False,
893) -> List[
894    Tuple[
895        Tuple[
896            Union[datetime, int, None],
897            Union[datetime, int, None],
898        ], ...
899    ]
900]:
901    """
902    Return a list of tuples of chunk bounds of size `batchsize`.
903
904    Parameters
905    ----------
906    chunk_bounds: List[Tuple[Union[datetime, int, None], Union[datetime, int, None]]]
907        A list of chunk_bounds (see `Pipe.get_chunk_bounds()`).
908
909    batchsize: Optional[int], default None
910        How many chunks to include in a batch. Defaults to `Pipe.get_num_workers()`.
911
912    workers: Optional[int], default None
913        If `batchsize` is `None`, use this as the desired number of workers.
914        Passed to `Pipe.get_num_workers()`.
915
916    Returns
917    -------
918    A list of tuples of chunk bound tuples.
919    """
920    from meerschaum.utils.misc import iterate_chunks
921    
922    if batchsize is None:
923        batchsize = self.get_num_workers(workers=workers)
924
925    return [
926        tuple(
927            _batch_chunk_bounds
928            for _batch_chunk_bounds in batch
929            if _batch_chunk_bounds is not None
930        )
931        for batch in iterate_chunks(chunk_bounds, batchsize)
932        if batch
933    ]

Return a list of batches, each a tuple of up to batchsize chunk bounds (the last batch may be smaller).

Parameters
  • chunk_bounds (List[Tuple[Union[datetime, int, None], Union[datetime, int, None]]]): A list of chunk_bounds (see Pipe.get_chunk_bounds()).
  • batchsize (Optional[int], default None): How many chunks to include in a batch. Defaults to Pipe.get_num_workers().
  • workers (Optional[int], default None): If batchsize is None, use this as the desired number of workers. Passed to Pipe.get_num_workers().
Returns
  • A list of tuples of chunk bound tuples.
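Batching amounts to grouping the list into consecutive fixed-size tuples, with a shorter final batch. The following standard-library sketch uses itertools.islice. batch_chunk_bounds is a hypothetical name, not part of meerschaum, and it takes an explicit batchsize rather than deriving one from Pipe.get_num_workers(). The is-not-None filter in the source suggests that iterate_chunks may pad the last batch. islice never pads, so no filter is needed here.

```python
from itertools import islice
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar('T')


def batch_chunk_bounds(chunk_bounds: Iterable[T], batchsize: int) -> List[Tuple[T, ...]]:
    """Group chunk bounds into tuples of at most `batchsize` items."""
    if batchsize < 1:
        raise ValueError("batchsize must be a positive integer.")
    iterator = iter(chunk_bounds)
    batches: List[Tuple[T, ...]] = []
    while True:
        # islice pulls the next `batchsize` items; an empty tuple means we're done.
        batch = tuple(islice(iterator, batchsize))
        if not batch:
            break
        batches.append(batch)
    return batches
```

Each batch can then be handed to a worker pool, so a batchsize equal to the worker count keeps every worker busy except possibly on the last batch.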
def parse_date_bounds( self, *dt_vals: Union[datetime.datetime, int, NoneType], debug: bool = False) -> Union[datetime.datetime, int, str, NoneType, Tuple[Union[datetime.datetime, int, str, NoneType]]]:
 936def parse_date_bounds(self, *dt_vals: Union[datetime, int, None], debug: bool = False) -> Union[
 937    datetime,
 938    int,
 939    str,
 940    None,
 941    Tuple[Union[datetime, int, str, None]]
 942]:
 943    """
 944    Given a date bound (begin, end), coerce a timezone if necessary.
 945    """
 946    from meerschaum.utils.misc import is_int
 947    from meerschaum.utils.dtypes import coerce_timezone, MRSM_PD_DTYPES, are_dtypes_equal
 948    from meerschaum.utils.warnings import warn
 949    dateutil_parser = mrsm.attempt_import('dateutil.parser')
 950
 951    _columns = None
 952    _dtypes = None
 953
 954    def _get_coercion_info():
 955        nonlocal _columns, _dtypes
 956        if _columns is None:
 957            _columns = self.get_parameters(debug=debug).get('columns', {}) or {}
 958        if _dtypes is None:
 959            _dtypes = self.get_dtypes(debug=debug)
 960
 961    def _parse_date_bound(dt_val):
 962        if dt_val is None:
 963            return None
 964
 965        if isinstance(dt_val, int):
 966            return dt_val
 967
 968        if dt_val == '':
 969            return ''
 970
 971        if is_int(dt_val):
 972            return int(dt_val)
 973
 974        if isinstance(dt_val, str):
 975            try:
 976                dt_val = dateutil_parser.parse(dt_val)
 977            except Exception as e:
 978                warn(f"Could not parse '{dt_val}' as datetime:\n{e}")
 979                return None
 980
 981        _get_coercion_info()
 982        dt_col = _columns.get('datetime', None)
 983        dt_typ = str(_dtypes.get(dt_col, 'datetime'))
 984        if are_dtypes_equal(dt_typ, 'int'):
 985            if self.get_parameters(debug=debug).get('precision'):
 986                from meerschaum.utils.dtypes import datetime_to_int
 987                return datetime_to_int(dt_val, self.precision['unit'])
 988            from meerschaum.utils.warnings import error
 989            error(
 990                f"Cannot use datetime bound '{dt_val}' on the non-epoch integer axis "
 991                f"of {self}.\n    Pass an integer instead, or set the `precision` parameter.",
 992                ValueError,
 993            )
 994        if dt_typ == 'datetime':
 995            dt_typ = MRSM_PD_DTYPES['datetime']
 996        return coerce_timezone(dt_val, strip_utc=('utc' not in dt_typ.lower()))
 997
 998    bounds = tuple(_parse_date_bound(dt_val) for dt_val in dt_vals)
 999    if len(bounds) == 1:
1000        return bounds[0]
1001    return bounds

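Given one or more date bounds (e.g. begin, end), coerce each to suit the pipe's datetime axis. Numeric strings become integers and other strings are parsed as datetimes. Datetimes are then either converted to epoch integers (on an integer axis with precision set) or timezone-coerced to match the axis. A datetime bound on an integer axis without precision raises a ValueError. Returns a single value when one bound is passed, else a tuple.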
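To make these coercion rules concrete, here is a standard-library sketch of the same decisions. It is not meerschaum's code: parse_bound, parse_bounds, and UNITS_PER_SECOND are hypothetical. It also differs from the method above in several ways:
- datetime.fromisoformat stands in for dateutil.parser.parse, so only ISO 8601 strings are accepted.
- Naive datetimes are assumed to be UTC.
- The axis type is passed in explicitly instead of being read from the pipe's columns and dtypes.
- An unparseable string raises ValueError rather than warning and returning None.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple, Union

Bound = Union[datetime, int, None]

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Hypothetical precision table: epoch units per second.
UNITS_PER_SECOND = {'second': 1, 'millisecond': 1_000, 'microsecond': 1_000_000}


def parse_bound(
    value: Union[datetime, int, str, None],
    tz_aware_axis: bool = True,
    int_axis_unit: Optional[str] = None,
) -> Bound:
    """Coerce one bound to match a (hypothetical) pipe's datetime axis."""
    if value is None or isinstance(value, int):
        return value
    if isinstance(value, str):
        try:
            return int(value.strip())
        except ValueError:
            # Stand-in for dateutil: only ISO 8601 strings are accepted here.
            value = datetime.fromisoformat(value.strip())

    # Assumption: naive datetimes are interpreted as UTC.
    as_utc = (
        value.replace(tzinfo=timezone.utc)
        if value.tzinfo is None
        else value.astimezone(timezone.utc)
    )
    if int_axis_unit is not None:
        # Exact integer arithmetic on microseconds avoids float rounding.
        micros = (as_utc - EPOCH) // timedelta(microseconds=1)
        return micros * UNITS_PER_SECOND[int_axis_unit] // 1_000_000
    return as_utc if tz_aware_axis else as_utc.replace(tzinfo=None)


def parse_bounds(*values, **kwargs) -> Union[Bound, Tuple[Bound, ...]]:
    """Return a single bound for one argument, else a tuple."""
    bounds = tuple(parse_bound(value, **kwargs) for value in values)
    return bounds[0] if len(bounds) == 1 else bounds
```

For instance, parse_bounds('2024-01-01', int_axis_unit='second') gives 1704067200, while the same string on a naive axis gives datetime(2024, 1, 1).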

def register(self, debug: bool = False, **kw: Any) -> Tuple[bool, str]:
12def register(
13    self,
14    debug: bool = False,
15    **kw: Any
16) -> SuccessTuple:
17    """
18    Register a new Pipe along with its attributes.
19
20    Parameters
21    ----------
22    debug: bool, default False
23        Verbosity toggle.
24
25    kw: Any
26        Keyword arguments to pass to `instance_connector.register_pipe()`.
27
28    Returns
29    -------
30    A `SuccessTuple` of success, message.
31    """
32    if self.temporary:
33        return False, "Cannot register pipes created with `temporary=True` (read-only)."
34
35    from meerschaum.utils.formatting import get_console
36    from meerschaum.utils.venv import Venv
37    from meerschaum.connectors import get_connector_plugin, custom_types
38    from meerschaum.config._patch import apply_patch_to_config
39
40    import warnings
41    with warnings.catch_warnings():
42        warnings.simplefilter('ignore')
43        try:
44            _conn = self.connector
45        except Exception:
46            _conn = None
47
48        if isinstance(_conn, str):
49            _conn = None
50
51    if (
52        _conn is not None
53        and
54        (_conn.type == 'plugin' or _conn.type in custom_types)
55        and
56        getattr(_conn, 'register', None) is not None
57    ):
58        try:
59            with Venv(get_connector_plugin(_conn), debug=debug):
60                params = self.connector.register(self)
61        except Exception:
62            get_console().print_exception()
63            params = None
64        params = {} if params is None else params
65        if not isinstance(params, dict):
66            from meerschaum.utils.warnings import warn
67            warn(
68                f"Invalid parameters returned from `register()` in connector {self.connector}:\n"
69                + f"{params}"
70            )
71        else:
72            self.parameters = apply_patch_to_config(params, self.parameters)
73
74    if not self.parameters:
75        cols = self.columns if self.columns else {'datetime': None, 'id': None}
76        self.parameters = {
77            'columns': cols,
78        }
79
80    with Venv(get_connector_plugin(self.instance_connector)):
81        return self.instance_connector.register_pipe(self, debug=debug, **kw)

Register a new Pipe along with its attributes.

Parameters
  • debug (bool, default False): Verbosity toggle.
  • kw (Any): Keyword arguments to pass to instance_connector.register_pipe().
Returns
  • A SuccessTuple of success, message.
attributes: Dict[str, Any]
20@property
21def attributes(self) -> Dict[str, Any]:
22    """
23    Return a dictionary of a pipe's keys and parameters.
24    These values are reflected directly from the pipes table of the instance.
25    """
26    from meerschaum.config import get_config
27    from meerschaum.config._patch import apply_patch_to_config
28    from meerschaum.utils.venv import Venv
29    from meerschaum.connectors import get_connector_plugin
30    from meerschaum.utils.dtypes import get_current_timestamp
31
32    timeout_seconds = get_config('pipes', 'attributes', 'local_cache_timeout_seconds')
33
34    now = get_current_timestamp('ms', as_int=True) / 1000
35    _attributes_sync_time = self._get_cached_value('_attributes_sync_time', debug=self.debug)
36    timed_out = (
37        _attributes_sync_time is None
38        or
39        (timeout_seconds is not None and (now - _attributes_sync_time) >= timeout_seconds)
40    )
41    if not self.temporary and timed_out:
42        self._cache_value('_attributes_sync_time', now, memory_only=True, debug=self.debug)
43        local_attributes = self._get_cached_value('attributes', debug=self.debug) or {}
44        with Venv(get_connector_plugin(self.instance_connector)):
45            instance_attributes = self.instance_connector.get_pipe_attributes(self)
46
47        self._cache_value(
48            'attributes',
49            apply_patch_to_config(instance_attributes, local_attributes),
50            memory_only=True,
51            debug=self.debug,
52        )
53
54    return self._attributes

Return a dictionary of a pipe's keys and parameters. These values are reflected directly from the pipes table of the instance.
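The refresh rule in attributes is a time-to-live check. It fetches when nothing has been synced yet, or when a timeout is configured and has elapsed. The minimal sketch below models just that check, with an injectable clock so it can be tested. TTLCache is hypothetical and simply replaces the cached value, whereas the property above merges the instance's attributes with locally cached ones via apply_patch_to_config.

```python
import time
from typing import Any, Callable, Dict, Optional


class TTLCache:
    """Refetch a value only after `timeout_seconds` have elapsed (None: never refetch)."""

    def __init__(
        self,
        fetch: Callable[[], Dict[str, Any]],
        timeout_seconds: Optional[float],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.fetch = fetch
        self.timeout_seconds = timeout_seconds
        self.clock = clock
        self._synced_at: Optional[float] = None
        self._value: Dict[str, Any] = {}

    def get(self) -> Dict[str, Any]:
        now = self.clock()
        timed_out = self._synced_at is None or (
            self.timeout_seconds is not None
            and (now - self._synced_at) >= self.timeout_seconds
        )
        if timed_out:
            # Record the sync time before fetching, as the property above does.
            self._synced_at = now
            self._value = self.fetch()
        return self._value
```

Recording the timestamp before the fetch means a slow or failing fetch is not retried on every access until the timeout passes again.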

parameters: Optional[Dict[str, Any]]
179@property
180def parameters(self) -> Optional[Dict[str, Any]]:
181    """
182    Return the parameters dictionary of the pipe.
183    """
184    return self.get_parameters(debug=self.debug)

Return the parameters dictionary of the pipe.

columns: Optional[Dict[str, str]]
196@property
197def columns(self) -> Union[Dict[str, str], None]:
198    """
199    Return the `columns` dictionary defined in `meerschaum.Pipe.parameters`.
200    """
201    cols = self.parameters.get('columns', {})
202    if not isinstance(cols, dict):
203        return {}
204    return {col_ix: col for col_ix, col in cols.items() if col and col_ix}

Return the columns dictionary defined in meerschaum.Pipe.parameters.

indices: Optional[Dict[str, Union[str, List[str]]]]
221@property
222def indices(self) -> Union[Dict[str, Union[str, List[str]]], None]:
223    """
224    Return the `indices` dictionary defined in `meerschaum.Pipe.parameters`.
225    """
226    _parameters = self.get_parameters(debug=self.debug)
227    indices_key = (
228        'indexes'
229        if 'indexes' in _parameters
230        else 'indices'
231    )
232
233    _indices = _parameters.get(indices_key, {})
234    _columns = self.columns
235    dt_col = _columns.get('datetime', None)
236    if not isinstance(_indices, dict):
237        _indices = {}
238    unique_cols = list(set((
239        [dt_col]
240        if dt_col
241        else []
242    ) + [
243        col
244        for col_ix, col in _columns.items()
245        if col and col_ix != 'datetime'
246    ]))
247    return {
248        **({'unique': unique_cols} if len(unique_cols) > 1 else {}),
249        **{col_ix: col for col_ix, col in _columns.items() if col},
250        **_indices
251    }

Return the indices dictionary defined in meerschaum.Pipe.parameters.
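The merge order in indices can be sketched as a pure function over plain dictionaries. A composite unique index is built from all index columns, then the columns themselves are added, then explicit entries override both. build_indices is hypothetical and ignores the indexes/indices key lookup. It also sorts the unique columns for deterministic output, whereas the property builds that list from a set, so its order is not guaranteed.

```python
from typing import Dict, List, Optional, Union

IndexSpec = Union[str, List[str]]


def build_indices(
    columns: Dict[str, Optional[str]],
    explicit_indices: Optional[Dict[str, IndexSpec]] = None,
) -> Dict[str, IndexSpec]:
    """Derive an indices dict from index columns, letting explicit entries win."""
    # Drop unset entries, as the `columns` property does.
    columns = {ix: col for ix, col in columns.items() if ix and col}
    if not isinstance(explicit_indices, dict):
        explicit_indices = {}

    # The datetime column plus every other index column, each listed once.
    unique_cols = sorted(set(columns.values()))
    return {
        # A composite "unique" index is only added for two or more columns.
        **({'unique': unique_cols} if len(unique_cols) > 1 else {}),
        **columns,
        **explicit_indices,
    }
```

With columns={'datetime': 'ts', 'id': 'id'}, this yields a composite unique index over ['id', 'ts'] alongside the individual datetime and id entries.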

indexes: Optional[Dict[str, Union[str, List[str]]]]
254@property
255def indexes(self) -> Union[Dict[str, Union[str, List[str]]], None]:
256    """
257    Alias for `meerschaum.Pipe.indices`.
258    """
259    return self.indices

Alias for meerschaum.Pipe.indices.

dtypes: Dict[str, Any]
310@property
311def dtypes(self) -> Dict[str, Any]:
312    """
313    If defined, return the `dtypes` dictionary defined in `meerschaum.Pipe.parameters`.
314    """
315    return self.get_dtypes(refresh=False, debug=self.debug)

If defined, return the dtypes dictionary defined in meerschaum.Pipe.parameters.

autoincrement: bool
418@property
419def autoincrement(self) -> bool:
420    """
421    Return the `autoincrement` parameter for the pipe.
422    """
423    return self.parameters.get('autoincrement', False)

Return the autoincrement parameter for the pipe (False if unset).

autotime: bool
434@property
435def autotime(self) -> bool:
436    """
437    Return the `autotime` parameter for the pipe.
438    """
439    return self.parameters.get('autotime', False)

Return the autotime parameter for the pipe (False if unset).

upsert: bool
385@property
386def upsert(self) -> bool:
387    """
388    Return whether `upsert` is set for the pipe.
389    """
390    return self.parameters.get('upsert', False)

Return whether upsert is set for the pipe (False if unset).

static: bool
401@property
402def static(self) -> bool:
403    """
404    Return whether `static` is set for the pipe.
405    """
406    return self.parameters.get('static', False)

Return whether static is set for the pipe (False if unset).

tzinfo: Optional[datetime.timezone]
450@property
451def tzinfo(self) -> Union[None, timezone]:
452    """
453    Return `timezone.utc` if the pipe is timezone-aware.
454    """
455    _tzinfo = self._get_cached_value('tzinfo', debug=self.debug)
456    if _tzinfo is not None:
457        return _tzinfo if _tzinfo != 'None' else None
458
459    _tzinfo = None
460    dt_col = self.columns.get('datetime', None)
461    dt_typ = str(self.dtypes.get(dt_col, 'datetime')) if dt_col else None
462    if self.autotime:
463        ts_col = mrsm.get_config('pipes', 'autotime', 'column_name_if_datetime_missing')
464        ts_typ = self.dtypes.get(ts_col, 'datetime')
465        dt_typ = ts_typ
466
467    if dt_typ and 'utc' in dt_typ.lower() or dt_typ == 'datetime':
468        _tzinfo = timezone.utc
469
470    self._cache_value('tzinfo', (_tzinfo if _tzinfo is not None else 'None'), debug=self.debug)
471    return _tzinfo

Return timezone.utc if the pipe is timezone-aware, otherwise None.
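The timezone decision reduces to a check on the datetime column's dtype string. A bare 'datetime' dtype counts as UTC-aware, as does any dtype mentioning UTC. The minimal sketch below uses the hypothetical name infer_tzinfo and omits the autotime substitution and the caching shown above. The dtype strings in the tests are examples, not an exhaustive list of what meerschaum stores.

```python
from datetime import timezone
from typing import Optional


def infer_tzinfo(dt_dtype: Optional[str]) -> Optional[timezone]:
    """Return UTC for timezone-aware (or bare 'datetime') dtypes, else None."""
    if dt_dtype is None:
        # No datetime column: the pipe is not treated as timezone-aware.
        return None
    if 'utc' in dt_dtype.lower() or dt_dtype == 'datetime':
        return timezone.utc
    return None
```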

enforce: bool
474@property
475def enforce(self) -> bool:
476    """
477    Return the `enforce` parameter for the pipe.
478    """
479    return self.parameters.get('enforce', True)

Return the enforce parameter for the pipe (True if unset).

null_indices: bool
490@property
491def null_indices(self) -> bool:
492    """
493    Return the `null_indices` parameter for the pipe.
494    """
495    return self.parameters.get('null_indices', True)

Return the null_indices parameter for the pipe (True if unset).

mixed_numerics: bool
506@property
507def mixed_numerics(self) -> bool:
508    """
509    Return the `mixed_numerics` parameter for the pipe.
510    """
511    return self.parameters.get('mixed_numerics', True)

Return the mixed_numerics parameter for the pipe (True if unset).

def get_columns(self, *args: str, error: bool = False) -> Union[str, Tuple[str]]:
522def get_columns(self, *args: str, error: bool = False) -> Union[str, Tuple[str]]:
523    """
524    Check if the requested columns are defined.
525
526    Parameters
527    ----------
528    *args: str
529        The column names to be retrieved.
530
531    error: bool, default False
532        If `True`, raise an `Exception` if the specified column is not defined.
533
534    Returns
535    -------
536    A tuple the same size as `args`, or a `str` if `args` is a single argument.
537
538    Examples
539    --------
540    >>> pipe = mrsm.Pipe('test', 'test')
541    >>> pipe.columns = {'datetime': 'dt', 'id': 'id'}
542    >>> pipe.get_columns('datetime', 'id')
543    ('dt', 'id')
544    >>> pipe.get_columns('value', error=True)
545    Exception:  🛑 Missing 'value' column for Pipe('test', 'test').
546    """
547    from meerschaum.utils.warnings import error as _error
548    if not args:
549        args = tuple(self.columns.keys())
550    col_names = []
551    for col in args:
552        col_name = None
553        try:
554            col_name = self.columns[col]
555            if col_name is None and error:
556                _error(f"Please define the name of the '{col}' column for {self}.")
557        except Exception:
558            col_name = None
559        if col_name is None and error:
560            _error(f"Missing '{col}'" + f" column for {self}.")
561        col_names.append(col_name)
562    if len(col_names) == 1:
563        return col_names[0]
564    return tuple(col_names)

Check if the requested columns are defined.

Parameters
  • *args (str): The column names to be retrieved.
  • error (bool, default False): If True, raise an Exception if the specified column is not defined.
Returns
  • A tuple the same size as args, or a str if args is a single argument.
Examples
>>> pipe = mrsm.Pipe('test', 'test')
>>> pipe.columns = {'datetime': 'dt', 'id': 'id'}
>>> pipe.get_columns('datetime', 'id')
('dt', 'id')
>>> pipe.get_columns('value', error=True)
Exception:  🛑 Missing 'value' column for Pipe('test', 'test').
def get_columns_types( self, refresh: bool = False, debug: bool = False) -> Optional[Dict[str, str]]:
567def get_columns_types(
568    self,
569    refresh: bool = False,
570    debug: bool = False,
571) -> Union[Dict[str, str], None]:
572    """
573    Get a dictionary of a pipe's column names and their types.
574
575    Parameters
576    ----------
577    refresh: bool, default False
578        If `True`, invalidate the cache and fetch directly from the instance connector.
579
580    debug: bool, default False
581        Verbosity toggle.
582
583    Returns
584    -------
585    A dictionary of column names (`str`) to column types (`str`).
586
587    Examples
588    --------
589    >>> pipe.get_columns_types()
590    {
591      'dt': 'TIMESTAMP WITH TIMEZONE',
592      'id': 'BIGINT',
593      'val': 'DOUBLE PRECISION',
594    }
595    >>>
596    """
597    from meerschaum.connectors import get_connector_plugin
598    from meerschaum.utils.dtypes import get_current_timestamp
599
600    now = get_current_timestamp('ms', as_int=True) / 1000
601    cache_seconds = (
602        mrsm.get_config('pipes', 'static', 'static_schema_cache_seconds')
603        if self.static
604        else mrsm.get_config('pipes', 'dtypes', 'columns_types_cache_seconds')
605    )
606    if refresh:
607        self._clear_cache_key('_columns_types_timestamp', debug=debug)
608        self._clear_cache_key('_columns_types', debug=debug)
609
610    _columns_types = self._get_cached_value('_columns_types', debug=debug)
611    if _columns_types:
612        columns_types_timestamp = self._get_cached_value('_columns_types_timestamp', debug=debug)
613        if columns_types_timestamp is not None:
614            delta = now - columns_types_timestamp
615            if delta < cache_seconds:
616                if debug:
617                    dprint(
618                        f"Returning cached `columns_types` for {self} "
619                        f"({round(delta, 2)} seconds old)."
620                    )
621                return _columns_types
622
623    with mrsm.Venv(get_connector_plugin(self.instance_connector)):
624        _columns_types = (
625            self.instance_connector.get_pipe_columns_types(self, debug=debug)
626            if hasattr(self.instance_connector, 'get_pipe_columns_types')
627            else None
628        )
629
630    self._cache_value('_columns_types', _columns_types, debug=debug)
631    self._cache_value('_columns_types_timestamp', now, debug=debug)
632    return _columns_types or {}

Get a dictionary of a pipe's column names and their types.

Parameters
  • refresh (bool, default False): If True, invalidate the cache and fetch directly from the instance connector.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A dictionary of column names (str) to column types (str).
Examples
>>> pipe.get_columns_types()
{
  'dt': 'TIMESTAMP WITH TIMEZONE',
  'id': 'BIGINT',
  'val': 'DOUBLE PRECISION',
}
>>>
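The caching above is a time-to-live check: a non-empty cached value is reused while `now - timestamp` is below the configured number of seconds, and `refresh=True` clears both keys first. The sketch below reproduces that pattern in isolation; the `TTLCache` class and its injectable `clock` are hypothetical, not part of Meerschaum.

```python
import time
from typing import Any, Callable, Dict


class TTLCache:
    """Minimal time-to-live cache mirroring the timestamp/delta check in `get_columns_types()`."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._values: Dict[str, Any] = {}
        self._timestamps: Dict[str, float] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], Any], refresh: bool = False) -> Any:
        now = self.clock()
        if refresh:
            # Like `refresh=True`: drop both the value and its timestamp.
            self._values.pop(key, None)
            self._timestamps.pop(key, None)
        value = self._values.get(key)
        timestamp = self._timestamps.get(key)
        # Only a non-empty, sufficiently fresh value is served from the cache.
        if value and timestamp is not None and (now - timestamp) < self.ttl_seconds:
            return value
        value = fetch()
        self._values[key] = value
        self._timestamps[key] = now
        return value


fake_now = [0.0]
calls = []

def fetch_types() -> Dict[str, str]:
    calls.append(1)
    return {'dt': 'TIMESTAMP WITH TIMEZONE', 'id': 'BIGINT'}

cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])
cache.get_or_fetch('columns_types', fetch_types)  # miss: fetches
fake_now[0] = 30.0
cache.get_or_fetch('columns_types', fetch_types)  # 30 s old: served from cache
fake_now[0] = 90.0
cache.get_or_fetch('columns_types', fetch_types)  # 90 s old: expired, fetches again
print(len(calls))  # 2
```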
def get_columns_indices( self, debug: bool = False, refresh: bool = False) -> Dict[str, List[Dict[str, str]]]:
def get_columns_indices(
    self,
    debug: bool = False,
    refresh: bool = False,
) -> Dict[str, List[Dict[str, str]]]:
    """
    Return a dictionary mapping columns to index information.
    """
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.dtypes import get_current_timestamp

    now = get_current_timestamp('ms', as_int=True) / 1000
    cache_seconds = (
        mrsm.get_config('pipes', 'static', 'static_schema_cache_seconds')
        if self.static
        else mrsm.get_config('pipes', 'dtypes', 'columns_types_cache_seconds')
    )
    if refresh:
        self._clear_cache_key('_columns_indices_timestamp', debug=debug)
        self._clear_cache_key('_columns_indices', debug=debug)

    _columns_indices = self._get_cached_value('_columns_indices', debug=debug)

    if _columns_indices:
        columns_indices_timestamp = self._get_cached_value('_columns_indices_timestamp', debug=debug)
        if columns_indices_timestamp is not None:
            delta = now - columns_indices_timestamp
            if delta < cache_seconds:
                if debug:
                    dprint(
                        f"Returning cached `columns_indices` for {self} "
                        f"({round(delta, 2)} seconds old)."
                    )
                return _columns_indices

    with mrsm.Venv(get_connector_plugin(self.instance_connector)):
        _columns_indices = (
            self.instance_connector.get_pipe_columns_indices(self, debug=debug)
            if hasattr(self.instance_connector, 'get_pipe_columns_indices')
            else None
        )

    self._cache_value('_columns_indices', _columns_indices, debug=debug)
    self._cache_value('_columns_indices_timestamp', now, debug=debug)
    ### The connector may not implement `get_pipe_columns_indices`, leaving this as `None`.
    return {k: v for k, v in (_columns_indices or {}).items() if k and v}

Return a dictionary mapping columns to index information.
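Before returning, the method drops entries whose column name or index list is empty, and a connector that lacks `get_pipe_columns_indices` yields `None`. The sketch below shows that final filtering step on its own; `clean_columns_indices` is a hypothetical helper, and the inner keys `'name'` and `'type'` are assumed for illustration since the section does not specify them.

```python
from typing import Dict, List, Optional


def clean_columns_indices(
    raw: Optional[Dict[str, List[Dict[str, str]]]],
) -> Dict[str, List[Dict[str, str]]]:
    # Treat a missing result (None) as empty, then drop blank column names and empty index lists.
    return {col: indices for col, indices in (raw or {}).items() if col and indices}


raw_indices = {
    'dt': [{'name': 'IX_demo_dt', 'type': 'INDEX'}],  # hypothetical inner keys
    'vl': [],
    '': [{'name': 'orphan', 'type': 'INDEX'}],
}
print(clean_columns_indices(raw_indices))
```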

def get_indices(self) -> Dict[str, str]:
def get_indices(self) -> Dict[str, str]:
    """
    Return a dictionary mapping index keys to their names in the database.

    Returns
    -------
    A dictionary of index keys to index names.
    """
    from meerschaum.connectors import get_connector_plugin
    with mrsm.Venv(get_connector_plugin(self.instance_connector)):
        if hasattr(self.instance_connector, 'get_pipe_index_names'):
            result = self.instance_connector.get_pipe_index_names(self)
        else:
            result = {}

    return result

Return a dictionary mapping index keys to their names in the database.

Returns
  • A dictionary of index keys to index names.
def get_parameters( self, apply_symlinks: bool = True, refresh: bool = False, debug: bool = False, _visited: Optional[set[Pipe]] = None) -> Dict[str, Any]:
def get_parameters(
    self,
    apply_symlinks: bool = True,
    refresh: bool = False,
    debug: bool = False,
    _visited: 'Optional[set[mrsm.Pipe]]' = None,
) -> Dict[str, Any]:
    """
    Return the `parameters` dictionary of the pipe.

    Parameters
    ----------
    apply_symlinks: bool, default True
        If `True`, resolve references to parameters from other pipes.

    refresh: bool, default False
        If `True`, pull the latest attributes for the pipe.

    Returns
    -------
    The pipe's parameters dictionary.
    """
    from copy import deepcopy
    from meerschaum.config._patch import apply_patch_to_config
    from meerschaum.config._read_config import search_and_substitute_config

    is_top_level = _visited is None
    if _visited is None:
        _visited = {self}

    if refresh:
        _ = self._invalidate_cache(hard=True)
        ### Drop any memoized resolution so a later non-refresh call recomputes from fresh state.
        _ = self.__dict__.pop('_resolved_parameters_raw', None)
        _ = self.__dict__.pop('_resolved_parameters', None)
        _ = self.__dict__.pop('_resolved_parameters_symlinks', None)

    raw_parameters = self.attributes.get('parameters', {})
    if not apply_symlinks:
        return raw_parameters

    ### Resolving references + `{{ Pipe() }}` / `MRSM{}` symlinks is pure-Python but expensive
    ### (it walks reference pipes and may build connectors), and `get_parameters` is a hot path
    ### hit by `.dtypes`, `.columns`, `.precision`, etc. Memoize the resolved result, keyed on the
    ### identity of the raw parameters dict: every mutation path (`update_parameters`, the setter,
    ### `edit`) reassigns `_attributes['parameters']` to a new object, so identity changing is a
    ### reliable invalidation signal. Schema is *not* part of this — dynamic-schema freshness is
    ### handled separately by `get_columns_types`' TTL cache, so this is safe for dynamic pipes.
    ### Only memoize the top-level entry (not nested reference resolution, which threads `_visited`
    ### for cycle detection) and only the default symlink-resolving, non-refreshing call.
    can_memoize = is_top_level and not refresh
    if can_memoize and self.__dict__.get('_resolved_parameters_raw', None) is raw_parameters:
        self._symlinks = self.__dict__.get('_resolved_parameters_symlinks', {})
        ### Return a copy so callers that mutate the result (e.g. `infer_dtypes(persist=True)`)
        ### don't corrupt the memo.
        return deepcopy(self.__dict__['_resolved_parameters'])

    parameters = {}
    for ref_pipe in self.references:
        try:
            if ref_pipe in _visited:
                warn(f"Circular reference detected in {self}: chain involves {ref_pipe}.")
                return search_and_substitute_config(raw_parameters)

            _visited.add(ref_pipe)
            if refresh:
                _ = _cached_base_params.pop(ref_pipe, None)
            base_params = _cached_base_params.get(ref_pipe, None)
            if base_params is None:
                base_params = ref_pipe.get_parameters(
                    apply_symlinks=apply_symlinks,
                    _visited=_visited,
                    debug=debug,
                )
                _cached_base_params[ref_pipe] = base_params
                if debug:
                    dprint(f"base_params from {ref_pipe} for {self}:")
                    mrsm.pprint(base_params)
            else:
                if debug:
                    dprint(f"Using cached base_params from {ref_pipe} for {self}")
        except Exception as e:
            warn(f"Failed to resolve reference pipe for {self}: {e}")
            base_params = {}

        parameters = apply_patch_to_config(parameters, base_params)

    parameters = apply_patch_to_config(parameters, raw_parameters)

    from meerschaum.utils.pipes import replace_pipes_syntax
    self._symlinks = {}

    def recursive_replace(obj: Any, path: tuple) -> Any:
        if isinstance(obj, dict):
            return {k: recursive_replace(v, path + (k,)) for k, v in obj.items()}
        if isinstance(obj, list):
            return [recursive_replace(elem, path + (i,)) for i, elem in enumerate(obj)]
        if isinstance(obj, str):
            substituted_val = replace_pipes_syntax(obj, _pipe=self)
            if substituted_val != obj:
                self._symlinks[path] = {
                    'original': obj,
                    'substituted': substituted_val,
                }
            return substituted_val
        return obj

    resolved_parameters = search_and_substitute_config(recursive_replace(parameters, tuple()))

    if can_memoize:
        ### Hold a reference to the raw dict so its identity can't be reused by a freed object,
        ### and stash the symlinks captured above alongside the resolved result.
        self.__dict__['_resolved_parameters_raw'] = raw_parameters
        self.__dict__['_resolved_parameters'] = resolved_parameters
        self.__dict__['_resolved_parameters_symlinks'] = self._symlinks
        return deepcopy(resolved_parameters)

    return resolved_parameters

Return the parameters dictionary of the pipe.

Parameters
  • apply_symlinks (bool, default True): If True, resolve references to parameters from other pipes.
  • refresh (bool, default False): If True, pull the latest attributes for the pipe.
Returns
  • The pipe's parameters dictionary.
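Reference resolution patches each referenced pipe's resolved parameters in order and then the pipe's own raw parameters on top, while a shared `_visited` set stops circular chains. The sketch below models that over a dict registry keyed by name; `deep_patch` is a hypothetical stand-in for `apply_patch_to_config` (nested dicts cascade, other values overwrite), and `resolve_parameters` and the registry layout are assumptions for illustration.

```python
from copy import deepcopy
from typing import Any, Dict, List, Optional, Set


def deep_patch(base: Dict[str, Any], patch: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge `patch` into a copy of `base`."""
    merged = deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_patch(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve_parameters(
    name: str,
    registry: Dict[str, Dict[str, Any]],
    _visited: Optional[Set[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical resolver: patch referenced parameters in order, then the pipe's own."""
    if _visited is None:
        _visited = {name}
    raw = registry[name]
    refs: List[str] = raw.get('references', [])
    resolved: Dict[str, Any] = {}
    for ref in refs:
        if ref in _visited:
            # Circular chain: fall back to the raw parameters, as the method above does.
            return deepcopy(raw)
        _visited.add(ref)
        resolved = deep_patch(resolved, resolve_parameters(ref, registry, _visited))
    return deep_patch(resolved, raw)


registry = {
    'base': {'columns': {'datetime': 'ts'}, 'dtypes': {'vl': 'float'}},
    'child': {'references': ['base'], 'dtypes': {'vl': 'int'}},
    'a': {'references': ['b'], 'tags': ['a']},
    'b': {'references': ['a'], 'tags': ['b']},
}
print(resolve_parameters('child', registry))
print(resolve_parameters('a', registry))  # terminates despite the a -> b -> a cycle
```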
def get_dtypes( self, infer: bool = True, refresh: bool = False, debug: bool = False) -> Dict[str, Any]:
def get_dtypes(
    self,
    infer: bool = True,
    refresh: bool = False,
    debug: bool = False,
) -> Dict[str, Any]:
    """
    If defined, return the `dtypes` dictionary defined in `meerschaum.Pipe.parameters`.

    Parameters
    ----------
    infer: bool, default True
        If `True`, include the implicit existing dtypes.
        Else only return the explicitly configured dtypes (e.g. `Pipe.parameters['dtypes']`).

    refresh: bool, default False
        If `True`, invalidate any cache and return the latest known dtypes.

    Returns
    -------
    A dictionary mapping column names to dtypes.
    """
    from meerschaum.config._patch import apply_patch_to_config
    from meerschaum.utils.dtypes import MRSM_ALIAS_DTYPES
    parameters = self.get_parameters(refresh=refresh, debug=debug)
    configured_dtypes = parameters.get('dtypes', {})
    if debug:
        dprint(f"Configured dtypes for {self}:")
        mrsm.pprint(configured_dtypes)

    remote_dtypes = (
        self.infer_dtypes(persist=False, refresh=refresh, debug=debug)
        if infer
        else {}
    )
    if debug and infer:
        dprint(f"Remote dtypes for {self}:")
        mrsm.pprint(remote_dtypes)

    patched_dtypes = apply_patch_to_config((remote_dtypes or {}), (configured_dtypes or {}))

    dt_col = parameters.get('columns', {}).get('datetime', None)
    primary_col = parameters.get('columns', {}).get('primary', None)
    _dtypes = {
        col: MRSM_ALIAS_DTYPES.get(typ, typ)
        for col, typ in patched_dtypes.items()
        if col and typ
    }
    if dt_col and dt_col not in configured_dtypes:
        _dtypes[dt_col] = 'datetime'
    if primary_col and parameters.get('autoincrement', False) and primary_col not in _dtypes:
        _dtypes[primary_col] = 'int'

    return _dtypes

If defined, return the dtypes dictionary defined in meerschaum.Pipe.parameters.

Parameters
  • infer (bool, default True): If True, include the implicit existing dtypes. Else only return the explicitly configured dtypes (e.g. Pipe.parameters['dtypes']).
  • refresh (bool, default False): If True, invalidate any cache and return the latest known dtypes.
Returns
  • A dictionary mapping column names to dtypes.
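The precedence rules in `get_dtypes()` are: configured dtypes override inferred ones, aliases are mapped, the datetime index becomes `'datetime'` unless explicitly configured, and an autoincrementing primary key without a known dtype becomes `'int'`. This sketch restates those rules on plain dicts. `merge_dtypes` is hypothetical, a flat merge stands in for `apply_patch_to_config` (adequate here because dtypes are one level deep), and the alias table in the example is made up, not `MRSM_ALIAS_DTYPES`.

```python
from typing import Any, Dict, Optional


def merge_dtypes(
    remote: Optional[Dict[str, Any]],
    configured: Optional[Dict[str, Any]],
    datetime_col: Optional[str] = None,
    primary_col: Optional[str] = None,
    autoincrement: bool = False,
    aliases: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    remote = remote or {}
    configured = configured or {}
    aliases = aliases or {}
    # Configured dtypes take precedence over the inferred (remote) ones.
    patched = {**remote, **configured}
    dtypes = {
        col: aliases.get(typ, typ)
        for col, typ in patched.items()
        if col and typ
    }
    # The datetime index is treated as 'datetime' unless the user configured it explicitly.
    if datetime_col and datetime_col not in configured:
        dtypes[datetime_col] = 'datetime'
    # An autoincrementing primary key falls back to 'int' if no dtype is known.
    if primary_col and autoincrement and primary_col not in dtypes:
        dtypes[primary_col] = 'int'
    return dtypes


remote = {'ts': 'datetime64[ns, UTC]', 'id': 'int64', 'vl': 'float64'}
configured = {'vl': 'decimal'}
aliases = {'decimal': 'numeric'}  # hypothetical alias table
print(merge_dtypes(remote, configured, datetime_col='ts', aliases=aliases))
```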
def update_parameters( self, parameters_patch: Dict[str, Any], persist: bool = True, debug: bool = False) -> Tuple[bool, str]:
def update_parameters(
    self,
    parameters_patch: Dict[str, Any],
    persist: bool = True,
    debug: bool = False,
) -> mrsm.SuccessTuple:
    """
    Apply a patch to a pipe's `parameters` dictionary.

    Parameters
    ----------
    parameters_patch: Dict[str, Any]
        The patch to be applied to `Pipe.parameters`.

    persist: bool, default True
        If `True`, call `Pipe.edit()` to persist the new parameters.
    """
    from meerschaum.config import apply_patch_to_config
    if 'parameters' not in self._attributes:
        self._attributes['parameters'] = {}

    self._attributes['parameters'] = apply_patch_to_config(
        self._attributes['parameters'],
        parameters_patch,
    )

    if self.temporary:
        persist = False

    if not persist:
        return True, "Success"

    return self.edit(debug=debug)

Apply a patch to a pipe's parameters dictionary.

Parameters
  • parameters_patch (Dict[str, Any]): The patch to be applied to Pipe.parameters.
  • persist (bool, default True): If True, call Pipe.edit() to persist the new parameters.
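`update_parameters()` assigns a new dict to `_attributes['parameters']` rather than mutating it in place, and `get_parameters()` uses the identity of that dict to decide whether its memoized result is still valid. The sketch below isolates that interaction. `ParamsHolder` is a hypothetical class, the flat `{**old, **patch}` merge stands in for `apply_patch_to_config`, and `deepcopy` stands in for the expensive resolution step.

```python
from copy import deepcopy
from typing import Any, Dict, Optional, Tuple


class ParamsHolder:
    """Hypothetical object showing identity-keyed memoization of derived parameters."""

    def __init__(self, parameters: Dict[str, Any]):
        self._attributes = {'parameters': parameters}
        self._memo_raw: Optional[Dict[str, Any]] = None
        self._memo_value: Optional[Dict[str, Any]] = None
        self.resolve_calls = 0

    def get_parameters(self) -> Dict[str, Any]:
        raw = self._attributes['parameters']
        # Same object as last time: reuse the memo, returning a copy so callers can't corrupt it.
        if self._memo_raw is raw:
            return deepcopy(self._memo_value)
        self.resolve_calls += 1
        resolved = deepcopy(raw)  # stand-in for the expensive resolution step
        self._memo_raw, self._memo_value = raw, resolved
        return deepcopy(resolved)

    def update_parameters(self, patch: Dict[str, Any]) -> Tuple[bool, str]:
        # Assign a *new* dict so the identity check in `get_parameters()` notices the change.
        self._attributes['parameters'] = {**self._attributes['parameters'], **patch}
        return True, "Success"


holder = ParamsHolder({'tags': ['production']})
holder.get_parameters()
holder.get_parameters()
print(holder.resolve_calls)  # 1
holder.update_parameters({'tags': ['staging']})
print(holder.get_parameters()['tags'], holder.resolve_calls)  # ['staging'] 2
```

The trade-off is that an in-place edit such as `holder._attributes['parameters']['tags'] = [...]` would not be detected, which is why every mutation path has to reassign the dict.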
tags: Optional[List[str]]
@property
def tags(self) -> Union[List[str], None]:
    """
    If defined, return the `tags` list defined in `meerschaum.Pipe.parameters`.
    """
    return self.parameters.get('tags', [])

If defined, return the tags list defined in meerschaum.Pipe.parameters.

def get_id(self, **kw: Any) -> Union[int, str, NoneType]:
def get_id(self, **kw: Any) -> Union[int, str, None]:
    """
    Fetch a pipe's ID from its instance connector.
    If the pipe is not registered, return `None`.
    """
    if self.temporary:
        return None

    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    with Venv(get_connector_plugin(self.instance_connector)):
        if hasattr(self.instance_connector, 'get_pipe_id'):
            return self.instance_connector.get_pipe_id(self, **kw)

    return None

Fetch a pipe's ID from its instance connector. If the pipe is not registered, return None.

id: Union[int, str, uuid.UUID, NoneType]
@property
def id(self) -> Union[int, str, uuid.UUID, None]:
    """
    Fetch and cache a pipe's ID.
    """
    _id = self._get_cached_value('_id', debug=self.debug)
    if _id is None:
        _id = self.get_id(debug=self.debug)
        if _id is not None:
            self._cache_value('_id', _id, debug=self.debug)
    return _id

Fetch and cache a pipe's ID.
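The `id` property caches the ID only once it is known: a `None` result (an unregistered pipe) is not cached, so the next access queries the instance connector again. The hypothetical `IdCache` class below reproduces that pattern with a counting lookup function standing in for `get_id()`.

```python
from typing import Callable, Optional, Union


class IdCache:
    """Hypothetical sketch of the `Pipe.id` pattern: cache the ID only once it is known."""

    def __init__(self, lookup: Callable[[], Optional[Union[int, str]]]):
        self._lookup = lookup
        self._id: Optional[Union[int, str]] = None

    @property
    def id(self) -> Optional[Union[int, str]]:
        if self._id is None:
            _id = self._lookup()
            # `None` means "not registered yet", so it is not cached and the next access retries.
            if _id is not None:
                self._id = _id
        return self._id


registry = {}
lookups = []

def lookup() -> Optional[int]:
    lookups.append(1)
    return registry.get('demo')

pipe_id = IdCache(lookup)
print(pipe_id.id)    # None (not registered yet)
registry['demo'] = 42
print(pipe_id.id)    # 42
print(pipe_id.id)    # 42, served from the cache
print(len(lookups))  # 2
```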

def get_val_column(self, debug: bool = False) -> Optional[str]:
def get_val_column(self, debug: bool = False) -> Union[str, None]:
    """
    Return the name of the value column if it's defined, otherwise make an educated guess.
    If not set in the `columns` dictionary, return the first numeric column that is not
    an ID or datetime column.
    If none can be found, return `None`.

    Parameters
    ----------
    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    Either a string or `None`.
    """
    if debug:
        dprint('Attempting to determine the value column...')
    try:
        val_name = self.get_columns('value')
    except Exception:
        val_name = None
    if val_name is not None:
        if debug:
            dprint(f"Value column: {val_name}")
        return val_name

    cols = self.columns
    if cols is None:
        if debug:
            dprint('No columns could be determined. Returning...')
        return None
    try:
        dt_name = self.get_columns('datetime', error=False)
    except Exception:
        dt_name = None
    try:
        id_name = self.get_columns('id', error=False)
    except Exception:
        id_name = None

    if debug:
        dprint(f"dt_name: {dt_name}")
        dprint(f"id_name: {id_name}")

    cols_types = self.get_columns_types(debug=debug)
    if cols_types is None:
        return None
    ### Copy so that popping columns below doesn't mutate the cached `columns_types`.
    cols_types = dict(cols_types)
    if debug:
        dprint(f"cols_types: {cols_types}")
    if dt_name is not None:
        cols_types.pop(dt_name, None)
    if id_name is not None:
        cols_types.pop(id_name, None)

    ### Return the first remaining column (in column order) whose type looks numeric.
    candidate_keywords = ('float', 'double', 'precision', 'int', 'numeric')
    for col, typ in cols_types.items():
        if any(search_term in typ.lower() for search_term in candidate_keywords):
            return col

    if debug:
        dprint("No value column could be determined.")
    return None

Return the name of the value column if it's defined, otherwise make an educated guess. If not set in the columns dictionary, return the first numeric column that is not an ID or datetime column. If none can be found, return None.

Parameters
  • debug (bool, default False): Verbosity toggle.
Returns
  • Either a string or None.
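The fallback heuristic can be tried in isolation: skip the datetime and ID columns, then return the first column (in column order) whose type name contains a numeric keyword. `guess_value_column` is a hypothetical helper; the keyword list is taken from the source above. Because the match is a plain substring test, a type such as `POINT` would also match `'int'`.

```python
from typing import Dict, Iterable, Optional

NUMERIC_KEYWORDS = ('float', 'double', 'precision', 'int', 'numeric')


def guess_value_column(
    columns_types: Dict[str, str],
    exclude: Iterable[Optional[str]] = (),
) -> Optional[str]:
    # Ignore unset (None) exclusions such as a missing ID column.
    excluded = {col for col in exclude if col}
    for col, typ in columns_types.items():
        if col in excluded:
            continue
        if any(keyword in typ.lower() for keyword in NUMERIC_KEYWORDS):
            return col
    return None


columns_types = {
    'dt': 'TIMESTAMP WITH TIME ZONE',
    'id': 'BIGINT',
    'label': 'TEXT',
    'val': 'DOUBLE PRECISION',
}
print(guess_value_column(columns_types, exclude=('dt', 'id')))  # val
```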
parents: List[Pipe]
@property
def parents(self) -> List[mrsm.Pipe]:
    """
    Return a list of `meerschaum.Pipe` objects to be designated as parents.
    """
    _cached_parents = self.__dict__.get('_parents', None)
    if _cached_parents is not None:
        return _cached_parents

    from meerschaum.utils.pipes import get_pipe_from_value
    base_params = self.get_parameters()
    key = 'parents' if 'parents' in base_params else 'parent'
    parents_refs = base_params.get(key, None) or []
    if isinstance(parents_refs, str) or isinstance(parents_refs, dict):
        parents_refs = [parents_refs]

    if not parents_refs:
        return []

    self._parents = [get_pipe_from_value(val, _pipe=self) for val in parents_refs]
    return self._parents

Return a list of meerschaum.Pipe objects to be designated as parents.
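`parents`, `children`, and `references` share one normalization step: read the plural key if present, otherwise the singular one, and wrap a lone string or dict in a list. The hypothetical `get_refs` helper below shows only that step; converting each entry into a pipe (done by `get_pipe_from_value` above) is out of scope, and the dict keys used in the example are assumed for illustration.

```python
from typing import Any, Dict, List


def get_refs(params: Dict[str, Any], plural: str, singular: str) -> List[Any]:
    """Hypothetical helper: read a plural key (or its singular fallback) and always return a list."""
    key = plural if plural in params else singular
    refs = params.get(key, None) or []
    # A single string or dict is shorthand for a one-element list.
    if isinstance(refs, (str, dict)):
        refs = [refs]
    return list(refs)


single = {'parent': {'connector_keys': 'sql:main', 'metric_key': 'weather'}}  # assumed keys
print(get_refs(single, 'parents', 'parent'))
print(get_refs({'parents': ['a', 'b']}, 'parents', 'parent'))  # ['a', 'b']
print(get_refs({}, 'parents', 'parent'))                       # []
```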

parent: Optional[Pipe]
@property
def parent(self) -> Union[mrsm.Pipe, None]:
    """
    Return the first pipe in `self.parents` or `None`.
    """
    _parents = self.parents
    if not _parents:
        return None

    return _parents[0]

Return the first pipe in self.parents or None.

children: List[Pipe]
@property
def children(self) -> List[mrsm.Pipe]:
    """
    Return a list of `meerschaum.Pipe` objects to be designated as children.
    """
    _cached_children = self.__dict__.get('_children', None)
    if _cached_children is not None:
        return _cached_children

    from meerschaum.utils.pipes import get_pipe_from_value
    base_params = self.get_parameters()
    key = 'children' if 'children' in base_params else 'child'
    children_refs = base_params.get(key, None) or []
    if isinstance(children_refs, str) or isinstance(children_refs, dict):
        children_refs = [children_refs]

    if not children_refs:
        return []

    self._children = [get_pipe_from_value(val, _pipe=self) for val in children_refs]
    return self._children

Return a list of meerschaum.Pipe objects to be designated as children.

child: Pipe | None
@property
def child(self) -> mrsm.Pipe | None:
    """
    Return the first pipe in `self.children` or None.
    """
    _children = self.children
    if not _children:
        return None

    return _children[0]

Return the first pipe in self.children or None.

reference: Pipe | None
@property
def reference(self) -> mrsm.Pipe | None:
    """
    Return the first pipe in `self.references` or None.
    """
    _references = self.references
    if not _references:
        return None

    return _references[0]

Return the first pipe in self.references or None.

references: List[Pipe]
@property
def references(self) -> List[mrsm.Pipe]:
    """
    Return a list of `meerschaum.Pipe` objects to be designated as references.
    """
    _cached_references = self.__dict__.get('_references', None)
    if _cached_references is not None:
        return _cached_references

    from meerschaum.utils.pipes import get_pipe_from_value
    base_params = self.get_parameters(apply_symlinks=False)
    key = 'references' if 'references' in base_params else 'reference'
    refs = base_params.get(key, None) or []
    if isinstance(refs, str) or isinstance(refs, dict):
        refs = [refs]

    if not refs:
        return []

    self._references = [get_pipe_from_value(val, _pipe=self) for val in refs]
    return self._references

Return a list of meerschaum.Pipe objects to be designated as references.

target: str
@property
def target(self) -> str:
    """
    The target table name.
    You can set the target name under one of the following keys
    (checked in this order):
      - `target`
      - `target_name`
      - `target_table`
      - `target_table_name`
    """
    cached_target = self.__dict__.get('_target', None)
    if cached_target:
        return cached_target

    params = self.parameters
    target_val = params.get('target', None)
    if target_val:
        self.__dict__['_target'] = target_val
        return target_val

    default_target = self._target_legacy()
    default_targets = {default_target}
    potential_keys = ('target_name', 'target_table', 'target_table_name')
    _target = None
    for k in potential_keys:
        if k in params:
            _target = params[k]
            break

    _target = _target or default_target

    if self.instance_connector.type == 'sql':
        from meerschaum.utils.sql import truncate_item_name
        truncated_target = truncate_item_name(_target, self.instance_connector.flavor)
        default_targets.add(truncated_target)
        warned_target = self.__dict__.get('_warned_target', False)
        if truncated_target != _target and not warned_target:
            if self.instance_connector.flavor not in ('oracle', 'mysql', 'mariadb'):
                warn(
                    f"The target '{_target}' is too long for '{self.instance_connector.flavor}', "
                    + f"will use {truncated_target} instead."
                )
            self.__dict__['_warned_target'] = True
            _target = truncated_target

    if _target not in default_targets:
        self.target = _target

    self.__dict__['_target'] = _target
    return _target

The target table name. You can set the target name under one of the following keys (checked in this order):

  • target
  • target_name
  • target_table
  • target_table_name
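The lookup order above can be sketched as a pure function: an explicit `target` wins and is returned as-is; otherwise the first present alternative key is used, falling back to the default name; then the name may be shortened for the SQL flavor. `resolve_target` is hypothetical, and its plain slice to `max_length` is only a stand-in for `truncate_item_name`, whose actual strategy is not shown here (63 is PostgreSQL's default identifier length limit).

```python
from typing import Any, Dict, Optional

ALTERNATIVE_TARGET_KEYS = ('target_name', 'target_table', 'target_table_name')


def resolve_target(
    params: Dict[str, Any],
    default_target: str,
    max_length: Optional[int] = None,
) -> str:
    # An explicit, non-empty `target` is returned without truncation, as in the property above.
    explicit = params.get('target')
    if explicit:
        return explicit
    target = None
    for key in ALTERNATIVE_TARGET_KEYS:
        if key in params:
            target = params[key]
            break
    target = target or default_target
    # Hypothetical truncation: a plain slice stands in for `truncate_item_name`.
    if max_length is not None and len(target) > max_length:
        target = target[:max_length]
    return target


print(resolve_target({'target_table': 'weather'}, 'sql_main_weather'))  # weather
print(len(resolve_target({}, 'x' * 70, max_length=63)))                 # 63
```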
def guess_datetime(self) -> Optional[str]:
def guess_datetime(self) -> Union[str, None]:
    """
    Try to determine a pipe's datetime column.
    """
    _dtypes = self.dtypes

    ### Abort if the user explicitly disallows a datetime index.
    if 'datetime' in _dtypes:
        if _dtypes['datetime'] is None:
            return None

    from meerschaum.utils.dtypes import are_dtypes_equal
    dt_cols = [
        col
        for col, typ in _dtypes.items()
        if are_dtypes_equal(typ, 'datetime')
    ]
    if not dt_cols:
        return None
    return dt_cols[0]

Try to determine a pipe's datetime column.
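The guess reduces to two rules: a `'datetime': None` entry in the dtypes opts out, otherwise the first column with a datetime-like dtype wins. In this sketch, `guess_datetime_column` is hypothetical and a simple prefix test replaces `are_dtypes_equal(typ, 'datetime')`, which may accept more spellings than this.

```python
from typing import Dict, Optional


def guess_datetime_column(dtypes: Dict[str, Optional[str]]) -> Optional[str]:
    # An explicit `'datetime': None` entry opts out of a datetime index.
    if 'datetime' in dtypes and dtypes['datetime'] is None:
        return None
    for col, typ in dtypes.items():
        # Simplified stand-in for `are_dtypes_equal(typ, 'datetime')`.
        if typ and str(typ).lower().startswith('datetime'):
            return col
    return None


print(guess_datetime_column({'id': 'int', 'ts': 'datetime64[ns, UTC]'}))  # ts
```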

precision: Dict[str, Union[str, int]]
@property
def precision(self) -> Dict[str, Union[str, int]]:
    """
    Return the configured or detected precision.
    """
    return self.get_precision(debug=self.debug)

Return the configured or detected precision.

def get_precision(self, debug: bool = False) -> Dict[str, Union[str, int]]:
def get_precision(self, debug: bool = False) -> Dict[str, Union[str, int]]:
    """
    Return the timestamp precision unit and interval for the `datetime` axis.
    """
    from meerschaum.utils.dtypes import (
        MRSM_PRECISION_UNITS_SCALARS,
        MRSM_PRECISION_UNITS_ALIASES,
        MRSM_PD_DTYPES,
        are_dtypes_equal,
    )
    from meerschaum._internal.static import STATIC_CONFIG

    _precision = self._get_cached_value('precision', debug=debug)
    if _precision:
        if debug:
            dprint(f"Returning cached precision: {_precision}")
        return _precision

    parameters = self.parameters
    _precision = parameters.get('precision', {})
    if isinstance(_precision, str):
        _precision = {'unit': _precision}
    default_precision_unit = STATIC_CONFIG['dtypes']['datetime']['default_precision_unit']

    if not _precision:

        dt_col = parameters.get('columns', {}).get('datetime', None)
        if not dt_col and self.autotime:
            dt_col = mrsm.get_config('pipes', 'autotime', 'column_name_if_datetime_missing')
        if not dt_col:
            if debug:
                dprint(f"No datetime axis, returning default precision '{default_precision_unit}'.")
            return {'unit': default_precision_unit}

        dt_typ = self.dtypes.get(dt_col, 'datetime')
        if are_dtypes_equal(dt_typ, 'datetime'):
            if dt_typ == 'datetime':
                dt_typ = MRSM_PD_DTYPES['datetime']
                if debug:
                    dprint(f"Datetime type is `datetime`, assuming {dt_typ} precision.")

            _precision = {
                'unit': (
                    dt_typ
                    .split('[', maxsplit=1)[-1]
                    .split(',', maxsplit=1)[0]
                    .split(' ', maxsplit=1)[0]
                ).rstrip(']')
            }

            if debug:
                dprint(f"Extracted precision '{_precision['unit']}' from type '{dt_typ}'.")

        elif are_dtypes_equal(dt_typ, 'int'):
            _precision = {
                'unit': (
                    'second'
                    if '32' in dt_typ
                    else default_precision_unit
                )
            }
        elif are_dtypes_equal(dt_typ, 'date'):
            if debug:
                dprint("Datetime axis is 'date', falling back to 'day' precision.")
            _precision = {'unit': 'day'}

    precision_unit = _precision.get('unit', default_precision_unit)
    precision_interval = _precision.get('interval', None)
    true_precision_unit = MRSM_PRECISION_UNITS_ALIASES.get(precision_unit, precision_unit)
    if true_precision_unit is None:
        if debug:
            dprint(f"No precision could be determined, falling back to '{default_precision_unit}'.")
        true_precision_unit = default_precision_unit

    if true_precision_unit not in MRSM_PRECISION_UNITS_SCALARS:
        from meerschaum.utils.misc import items_str
        raise ValueError(
            f"Invalid precision unit '{true_precision_unit}'.\n"
            "Accepted values are "
            f"{items_str(list(MRSM_PRECISION_UNITS_SCALARS) + list(MRSM_PRECISION_UNITS_ALIASES))}."
        )

    _precision = {'unit': true_precision_unit}
    if precision_interval:
        _precision['interval'] = precision_interval
    self._cache_value('precision', _precision, debug=debug)
    return _precision

Return the timestamp precision unit and interval for the datetime axis.
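When no precision is configured, the unit is derived from the datetime column's dtype: the bracketed part of a pandas-style type such as `datetime64[ns, UTC]`, `'second'` for 32-bit integers, `'day'` for dates, and a default otherwise. The hypothetical `extract_precision_unit` below reproduces only that string handling; the real method then maps the result through `MRSM_PRECISION_UNITS_ALIASES` and validates it, which is not reproduced here, and `'microsecond'` in the example is just a placeholder default.

```python
def extract_precision_unit(dtype: str, default_unit: str) -> str:
    lowered = dtype.lower()
    if lowered.startswith('datetime') and '[' in dtype:
        # 'datetime64[ns, UTC]' -> 'ns', 'datetime64[us]' -> 'us'
        return (
            dtype.split('[', maxsplit=1)[-1]
            .split(',', maxsplit=1)[0]
            .split(' ', maxsplit=1)[0]
        ).rstrip(']')
    if lowered.startswith('int'):
        # 32-bit integer timestamps are assumed to count seconds.
        return 'second' if '32' in dtype else default_unit
    if lowered == 'date':
        return 'day'
    return default_unit


print(extract_precision_unit('datetime64[ns, UTC]', 'microsecond'))  # ns
print(extract_precision_unit('int32', 'microsecond'))                # second
```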

def show( self, nopretty: bool = False, debug: bool = False, **kw) -> Tuple[bool, str]:
def show(
    self,
    nopretty: bool = False,
    debug: bool = False,
    **kw
) -> SuccessTuple:
    """
    Show attributes of a Pipe.

    Parameters
    ----------
    nopretty: bool, default False
        If `True`, simply print the JSON of the pipe's attributes.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.

    """
    import json
    from meerschaum.utils.formatting import (
        pprint, make_header, ANSI, highlight_pipes, fill_ansi, get_console,
    )
    from meerschaum.utils.packages import import_rich, attempt_import
    from meerschaum.utils.warnings import info
    attributes_json = json.dumps(self.attributes)
    if not nopretty:
        _to_print = f"Attributes for {self}:"
        if ANSI:
            _to_print = fill_ansi(highlight_pipes(make_header(_to_print)), 'magenta')
            print(_to_print)
            rich = import_rich()
            rich_json = attempt_import('rich.json')
            get_console().print(rich_json.JSON(attributes_json))
        else:
            print(_to_print)
            ### Without ANSI support, still print the attributes (not only the header).
            pprint(self.attributes)
    else:
        print(attributes_json)

    return True, "Success"

Show attributes of a Pipe.

Parameters
  • nopretty (bool, default False): If True, simply print the JSON of the pipe's attributes.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A SuccessTuple of success, message.
def edit( self, patch: bool = False, interactive: bool = False, debug: bool = False, **kw: Any) -> Tuple[bool, str]:
def edit(
    self,
    patch: bool = False,
    interactive: bool = False,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Edit a Pipe's configuration.

    Parameters
    ----------
    patch: bool, default False
        If `patch` is True, update parameters by cascading rather than overwriting.
    interactive: bool, default False
        If `True`, open an editor for the user to make changes to the pipe's YAML file.
    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.

    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    if self.temporary:
        return False, "Cannot edit pipes created with `temporary=True` (read-only)."

    self._invalidate_cache(hard=True, debug=debug)

    if hasattr(self, '_symlinks'):
        from meerschaum.utils.misc import get_val_from_dict_path, set_val_in_dict_path
        for path, vals in self._symlinks.items():
            current_val = get_val_from_dict_path(self.parameters, path)
            if current_val == vals['substituted']:
                set_val_in_dict_path(self.parameters, path, vals['original'])

    if not interactive:
        with Venv(get_connector_plugin(self.instance_connector)):
            return self.instance_connector.edit_pipe(self, patch=patch, debug=debug, **kw)

    import meerschaum.config.paths as paths
    from meerschaum.utils.misc import edit_file
    parameters_filename = str(self) + '.yaml'
    parameters_path = paths.PIPES_CACHE_RESOURCES_PATH / parameters_filename

    from meerschaum.utils.yaml import yaml

    edit_text = f"Edit the parameters for {self}"
    edit_top = '#' * (len(edit_text) + 4)
    edit_header = edit_top + f'\n# {edit_text} #\n' + edit_top + '\n\n'

    from meerschaum.config import get_config
    parameters = dict(get_config('pipes', 'parameters', patch=True))
    from meerschaum.config._patch import apply_patch_to_config
    raw_parameters = self.attributes.get('parameters', {})
    parameters = apply_patch_to_config(parameters, raw_parameters)

    ### write parameters to yaml file
    with open(parameters_path, 'w+') as f:
        f.write(edit_header)
        yaml.dump(parameters, stream=f, sort_keys=False)

    ### only quit editing if yaml is valid
    editing = True
    while editing:
        edit_file(parameters_path)
        try:
            with open(parameters_path, 'r') as f:
                file_parameters = yaml.load(f.read())
        except Exception as e:
            from meerschaum.utils.warnings import warn
            warn(f"Invalid format defined for '{self}':\n\n{e}")
            input(f"Press [Enter] to correct the configuration for '{self}': ")
        else:
            editing = False

    self.parameters = file_parameters

    if debug:
        from meerschaum.utils.formatting import pprint
        pprint(self.parameters)

    with Venv(get_connector_plugin(self.instance_connector)):
        return self.instance_connector.edit_pipe(self, patch=patch, debug=debug, **kw)

Edit a Pipe's configuration.

Parameters
  • patch (bool, default False): If patch is True, update parameters by cascading rather than overwriting.
  • interactive (bool, default False): If True, open an editor for the user to make changes to the pipe's YAML file.
  • debug (bool, default False): Verbosity toggle.
Returns
  A SuccessTuple of success, message.
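The `patch` flag decides whether the new parameters cascade into the existing ones or overwrite them. The sketch below shows the difference under one assumption: that "cascading" means a recursive dictionary merge, where nested dicts are merged key by key and everything else is replaced. `cascade_patch` is a hypothetical helper written for illustration. It is not Meerschaum's `apply_patch_to_config`.

```python
from copy import deepcopy
from typing import Any, Dict


def cascade_patch(base: Dict[str, Any], patch: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper (assumed semantics): recursively layer `patch` onto a copy of `base`."""
    merged = deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge nested keys instead of replacing the dict.
            merged[key] = cascade_patch(merged[key], value)
        else:
            # Scalars, lists, and new keys replace whatever was there.
            merged[key] = deepcopy(value)
    return merged


existing = {'columns': {'datetime': 'ts', 'id': 'id'}, 'tags': ['production']}
new = {'columns': {'value': 'vl'}}

# patch=False: the new parameters replace the old ones outright.
overwritten = deepcopy(new)

# patch=True: nested keys cascade into the existing parameters.
cascaded = cascade_patch(existing, new)
print(cascaded)
# {'columns': {'datetime': 'ts', 'id': 'id', 'value': 'vl'}, 'tags': ['production']}
```

Because the helper copies its input before merging, the caller's original parameters dict is never mutated.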
def edit_definition( self, yes: bool = False, noask: bool = False, force: bool = False, debug: bool = False, **kw: Any) -> Tuple[bool, str]:
def edit_definition(
    self,
    yes: bool = False,
    noask: bool = False,
    force: bool = False,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Edit a pipe's definition file and update its configuration.
    **NOTE:** This function is interactive and should not be used in automated scripts!

    Returns
    -------
    A `SuccessTuple` of success, message.

    """
    if self.temporary:
        return False, "Cannot edit pipes created with `temporary=True` (read-only)."

    from meerschaum.connectors import instance_types
    if (self.connector is None or isinstance(self.connector, str)) or self.connector.type not in instance_types:
        return self.edit(interactive=True, debug=debug, **kw)

    import json
    from meerschaum.utils.warnings import info, warn
    from meerschaum.utils.debug import dprint
    from meerschaum.config._patch import apply_patch_to_config
    from meerschaum.utils.misc import edit_file

    _parameters = self.parameters
    if 'fetch' not in _parameters:
        _parameters['fetch'] = {}

    def _edit_api():
        from meerschaum.utils.prompt import prompt, yes_no
        info(
            f"Please enter the keys of the source pipe from '{self.connector}'.\n" +
            "Type 'None' for None, or empty when there is no default. Press [CTRL+C] to skip."
        )

        _keys = {'connector_keys': None, 'metric_key': None, 'location_key': None}
        for k in _keys:
            _keys[k] = _parameters['fetch'].get(k, None)

        for k, v in _keys.items():
            try:
                _keys[k] = prompt(k.capitalize().replace('_', ' ') + ':', icon=True, default=v)
            except KeyboardInterrupt:
                continue
            if _keys[k] in ('', 'None', '\'None\'', '[None]'):
                _keys[k] = None

        _parameters['fetch'] = apply_patch_to_config(_parameters['fetch'], _keys)

        info("You may optionally specify additional filter parameters as JSON.")
        print("  Parameters are translated into a 'WHERE x AND y' clause, and lists are IN clauses.")
        print("  For example, the following JSON would correspond to 'WHERE x = 1 AND y IN (2, 3)':")
        print(json.dumps({'x': 1, 'y': [2, 3]}, indent=2, separators=(',', ': ')))
        if force or yes_no(
            "Would you like to add additional filter parameters?",
            yes=yes, noask=noask
        ):
            import meerschaum.config.paths as paths
            definition_filename = str(self) + '.json'
            definition_path = paths.PIPES_CACHE_RESOURCES_PATH / definition_filename
            try:
                definition_path.touch()
                with open(definition_path, 'w+') as f:
                    json.dump(_parameters.get('fetch', {}).get('params', {}), f, indent=2)
            except Exception as e:
                return False, f"Failed writing file '{definition_path}':\n" + str(e)

            _params = None
            while True:
                edit_file(definition_path)
                try:
                    with open(definition_path, 'r') as f:
                        _params = json.load(f)
                except Exception as e:
                    warn(f'Failed to read parameters JSON:\n{e}', stack=False)
                    if force or yes_no(
                        "Would you like to try again?\n  "
                        + "If not, the parameters JSON file will be ignored.",
                        noask=noask, yes=yes
                    ):
                        continue
                    _params = None
                break
            if _params is not None:
                if 'fetch' not in _parameters:
                    _parameters['fetch'] = {}
                _parameters['fetch']['params'] = _params

        self.parameters = _parameters
        return True, "Success"

    def _edit_sql():
        import textwrap
        import meerschaum.config.paths as paths
        from meerschaum.utils.misc import edit_file
        definition_filename = str(self) + '.sql'
        definition_path = paths.PIPES_CACHE_RESOURCES_PATH / definition_filename

        sql_definition = _parameters['fetch'].get('definition', None)
        if sql_definition is None:
            sql_definition = ''
        sql_definition = textwrap.dedent(sql_definition).lstrip()

        try:
            definition_path.touch()
            with open(definition_path, 'w+') as f:
                f.write(sql_definition)
        except Exception as e:
            return False, f"Failed writing file '{definition_path}':\n" + str(e)

        edit_file(definition_path)
        try:
            with open(definition_path, 'r', encoding='utf-8') as f:
                file_definition = f.read()
        except Exception as e:
            return False, f"Failed reading file '{definition_path}':\n" + str(e)

        if sql_definition == file_definition:
            return False, f"No changes made to definition for {self}."

        if ' ' not in file_definition:
            return False, f"Invalid SQL definition for {self}."

        if debug:
            dprint("Read SQL definition:\n\n" + file_definition)
        _parameters['fetch']['definition'] = file_definition
        self.parameters = _parameters
        return True, "Success"

    locals()['_edit_' + str(self.connector.type)]()
    return self.edit(interactive=False, debug=debug, **kw)

Edit a pipe's definition file and update its configuration. NOTE: This function is interactive and should not be used in automated scripts!

Returns
  A SuccessTuple of success, message.
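For `api` connectors, `edit_definition()` prompts for filter parameters and states how they are interpreted: they become a `WHERE x AND y` clause, and lists become `IN` clauses. The sketch below reproduces that translation using the prompt's own example JSON. `params_to_where` is a hypothetical helper. It renders values with `repr()` and does no escaping, so it only illustrates the mapping and must not be used to build real queries. Meerschaum's own helper for this is `meerschaum.utils.sql.build_where`.

```python
from typing import Any, Dict


def params_to_where(params: Dict[str, Any]) -> str:
    """Hypothetical sketch: AND together equality and IN conditions.

    Values are rendered with repr(), so this is NOT safe for untrusted input.
    """
    conditions = []
    for column, value in params.items():
        if isinstance(value, (list, tuple)):
            # Lists become IN clauses.
            values = ', '.join(repr(v) for v in value)
            conditions.append(f"{column} IN ({values})")
        else:
            # Scalars become equality checks.
            conditions.append(f"{column} = {value!r}")
    return ('WHERE ' + ' AND '.join(conditions)) if conditions else ''


print(params_to_where({'x': 1, 'y': [2, 3]}))
# WHERE x = 1 AND y IN (2, 3)
```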
def update(self, *args, **kw) -> Tuple[bool, str]:
def update(self, *args, **kw) -> SuccessTuple:
    """
    Update a pipe's parameters in its instance.
    """
    kw['interactive'] = False
    return self.edit(*args, **kw)

Update a pipe's parameters in its instance.

def sync( self, df: Union[pandas.DataFrame, Dict[str, List[Any]], List[Dict[str, Any]], str, meerschaum.core.Pipe._sync.InferFetch] = <class 'meerschaum.core.Pipe._sync.InferFetch'>, begin: Union[datetime.datetime, int, str, NoneType] = '', end: Union[datetime.datetime, int, NoneType] = None, force: bool = False, retries: int = 10, min_seconds: int = 1, check_existing: bool = True, enforce_dtypes: bool = True, blocking: bool = True, workers: Optional[int] = None, callback: Optional[Callable[[Tuple[bool, str]], Any]] = None, error_callback: Optional[Callable[[Exception], Any]] = None, chunksize: Optional[int] = -1, sync_chunks: bool = True, debug: bool = False, _inplace: bool = True, **kw: Any) -> Tuple[bool, str]:
def sync(
    self,
    df: Union[
        pd.DataFrame,
        Dict[str, List[Any]],
        List[Dict[str, Any]],
        str,
        InferFetch
    ] = InferFetch,
    begin: Union[datetime, int, str, None] = '',
    end: Union[datetime, int, None] = None,
    force: bool = False,
    retries: int = 10,
    min_seconds: int = 1,
    check_existing: bool = True,
    enforce_dtypes: bool = True,
    blocking: bool = True,
    workers: Optional[int] = None,
    callback: Optional[Callable[[Tuple[bool, str]], Any]] = None,
    error_callback: Optional[Callable[[Exception], Any]] = None,
    chunksize: Optional[int] = -1,
    sync_chunks: bool = True,
    debug: bool = False,
    _inplace: bool = True,
    **kw: Any
) -> SuccessTuple:
    """
    Fetch new data from the source and update the pipe's table with new data.

    Fetch new remote data, retrieve existing data from the same time period,
    and merge the two, keeping only the unseen data.

    Parameters
    ----------
    df: Union[pd.DataFrame, Dict[str, List[Any]], List[Dict[str, Any]], str], default InferFetch
        An optional DataFrame to sync into the pipe.
        If omitted, data are fetched from the pipe's connector;
        passing `None` explicitly is treated as an error.
        If `df` is a string, it will be parsed via `meerschaum.utils.dataframe.parse_simple_lines()`.

    begin: Union[datetime, int, str, None], default ''
        Optionally specify the earliest datetime to search for data.

    end: Union[datetime, int, str, None], default None
        Optionally specify the latest datetime to search for data.

    force: bool, default False
        If `True`, keep retrying a failed sync, up to `retries` attempts.

    retries: int, default 10
        If `force`, how many attempts to try syncing before declaring failure.

    min_seconds: Union[int, float], default 1
        If `force`, how many seconds to sleep between retries.

    check_existing: bool, default True
        If `True`, pull and diff with existing data from the pipe.

    enforce_dtypes: bool, default True
        If `True`, enforce dtypes on incoming data.
        Set this to `False` if the incoming rows are expected to be of the correct dtypes.

    blocking: bool, default True
        If `True`, wait for sync to finish and return its result; otherwise,
        sync asynchronously in a background thread and return success immediately.
        Only intended for specific scenarios.

    workers: Optional[int], default None
        If provided and the instance connector is thread-safe
        (`pipe.instance_connector.IS_THREAD_SAFE is True`),
        limit concurrent sync to this many threads.

    callback: Optional[Callable[[Tuple[bool, str]], Any]], default None
        Callback function which expects a SuccessTuple as input.
        Only applies when `blocking=False`.

    error_callback: Optional[Callable[[Exception], Any]], default None
        Callback function which expects an Exception as input.
        Only applies when `blocking=False`.

    chunksize: int, default -1
        Specify the number of rows to sync per chunk.
        If `-1`, resort to system configuration (default is `900`).
        A `chunksize` of `None` will sync all rows in one transaction.
        A `chunksize` of `0` is treated as `None` and also disables `sync_chunks`.

    sync_chunks: bool, default True
        If possible, sync chunks while fetching them into memory.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success (`bool`) and message (`str`).
    """
    from meerschaum.utils.debug import dprint, _checkpoint
    from meerschaum.utils.formatting import get_console
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.misc import df_is_chunk_generator, filter_keywords, filter_arguments
    from meerschaum.utils.pool import get_pool
    from meerschaum.config import get_config
    from meerschaum.utils.dtypes import are_dtypes_equal, get_current_timestamp

    if (callback is not None or error_callback is not None) and blocking:
        warn("Callback functions are only executed when blocking = False. Ignoring...")

    _checkpoint(_total=2, **kw)

    if chunksize == 0:
        chunksize = None
        sync_chunks = False

    begin, end = self.parse_date_bounds(begin, end)
    kw.update({
        'begin': begin,
        'end': end,
        'force': force,
        'retries': retries,
        'min_seconds': min_seconds,
        'check_existing': check_existing,
        'blocking': blocking,
        'workers': workers,
        'callback': callback,
        'error_callback': error_callback,
        'sync_chunks': sync_chunks,
        'chunksize': chunksize,
        'safe_copy': True,
    })

    self._invalidate_cache(debug=debug)
    self._cache_value('sync_ts', get_current_timestamp('ms'), debug=debug)

    def _sync(
        p: mrsm.Pipe,
        df: Union[
            'pd.DataFrame',
            Dict[str, List[Any]],
            List[Dict[str, Any]],
            str,
            InferFetch
        ] = InferFetch,
    ) -> SuccessTuple:
        if df is None:
            p._invalidate_cache(debug=debug)
            return (
                False,
                f"You passed `None` instead of data into `sync()` for {p}.\n"
                + "Omit the DataFrame to infer fetching.",
            )
        ### Ensure that Pipe is registered.
        if not p.temporary and p.id is None:
            ### NOTE: This may trigger an interactive session for plugins!
            register_success, register_msg = p.register(debug=debug)
            if not register_success:
                if 'already' not in register_msg:
                    p._invalidate_cache(debug=debug)
                    return register_success, register_msg

        if isinstance(df, str):
            from meerschaum.utils.dataframe import parse_simple_lines
            df = parse_simple_lines(df)

        ### If connector is a plugin with a `sync()` method, return that instead.
        ### If the plugin does not have a `sync()` method but does have a `fetch()` method,
        ### use that instead.
        ### NOTE: The DataFrame must be omitted for the plugin sync method to apply.
        ### If a DataFrame is provided, continue as expected.
        if hasattr(df, 'MRSM_INFER_FETCH'):
            try:
                if isinstance(p.connector, str):
                    if ':' not in p.connector_keys:
                        return True, f"{p} does not support fetching; nothing to do."

                    msg = f"{p} does not have a valid connector."
                    if p.connector_keys.startswith('plugin:'):
                        msg += f"\n    Perhaps {p.connector_keys} has a syntax error?"
                    p._invalidate_cache(debug=debug)
                    return False, msg
            except Exception:
                p._invalidate_cache(debug=debug)
                return False, f"Unable to create the connector for {p}."

            ### Sync in place if possible.
            if (
                str(self.connector) == str(self.instance_connector)
                and
                hasattr(self.instance_connector, 'sync_pipe_inplace')
                and
                _inplace
                and
                get_config('system', 'experimental', 'inplace_sync')
            ):
                with Venv(get_connector_plugin(self.instance_connector)):
                    p._invalidate_cache(debug=debug)
                    _args, _kwargs = filter_arguments(
                        p.instance_connector.sync_pipe_inplace,
                        p,
                        debug=debug,
                        **kw
                    )
                    return self.instance_connector.sync_pipe_inplace(
                        *_args,
                        **_kwargs
                    )

            ### Activate and invoke `sync(pipe)` for plugin connectors with `sync` methods.
            try:
                if getattr(p.connector, 'sync', None) is not None:
                    with Venv(get_connector_plugin(p.connector), debug=debug):
                        _args, _kwargs = filter_arguments(
                            p.connector.sync,
                            p,
                            debug=debug,
                            **kw
                        )
                        return_tuple = p.connector.sync(*_args, **_kwargs)
                    p._invalidate_cache(debug=debug)
                    if not isinstance(return_tuple, tuple):
                        return_tuple = (
                            False,
                            f"Plugin '{p.connector.label}' returned non-tuple value: {return_tuple}"
                        )
                    return return_tuple

            except Exception as e:
                get_console().print_exception()
                msg = f"Failed to sync {p} with exception: '" + str(e) + "'"
                if debug:
                    error(msg, silent=False)
                p._invalidate_cache(debug=debug)
                return False, msg

            ### Fetch the dataframe from the connector's `fetch()` method.
            try:
                with Venv(get_connector_plugin(p.connector), debug=debug):
                    df = p.fetch(
                        **filter_keywords(
                            p.fetch,
                            debug=debug,
                            **kw
                        )
                    )
                    kw['safe_copy'] = False
            except Exception as e:
                get_console().print_exception(
                    suppress=[
                        'meerschaum/core/Pipe/_sync.py',
                        'meerschaum/core/Pipe/_fetch.py',
                    ]
                )
                msg = f"Failed to fetch data from {p.connector}:\n    {e}"
                df = None

            if df is None:
                p._invalidate_cache(debug=debug)
                return False, f"No data were fetched for {p}."

            if isinstance(df, list):
                if len(df) == 0:
                    return True, f"No new rows were returned for {p}."

                ### May be a chunk hook results list.
                if isinstance(df[0], tuple):
                    success = all([_success for _success, _ in df])
                    message = '\n'.join([_message for _, _message in df])
                    return success, message

            if df is True:
                p._invalidate_cache(debug=debug)
                return True, f"{p} is being synced in parallel."

        ### CHECKPOINT: Retrieved the DataFrame.
        _checkpoint(**kw)

        ### Allow for dataframe generators or iterables.
        if df_is_chunk_generator(df):
            kw['workers'] = p.get_num_workers(kw.get('workers', None))
            dt_col = p.columns.get('datetime', None)
            pool = get_pool(workers=kw.get('workers', 1))
            if debug:
                dprint(f"Received {type(df)}. Attempting to sync first chunk...")

            try:
                chunk = next(df)
            except StopIteration:
                return True, "Received an empty generator; nothing to do."

            chunk_success, chunk_msg = _sync(p, chunk)
            chunk_msg = '\n' + self._get_chunk_label(chunk, dt_col) + '\n' + chunk_msg
            if not chunk_success:
                return chunk_success, f"Unable to sync initial chunk for {p}:\n{chunk_msg}"
            if debug:
                dprint("Successfully synced the first chunk, attempting the rest...")

            def _process_chunk(_chunk):
                _chunk_attempts = 0
                _max_chunk_attempts = 3
                while _chunk_attempts < _max_chunk_attempts:
                    try:
                        _chunk_success, _chunk_msg = _sync(p, _chunk)
                    except Exception as e:
                        _chunk_success, _chunk_msg = False, str(e)
                    if _chunk_success:
                        break
                    _chunk_attempts += 1
                    _sleep_seconds = _chunk_attempts ** 2
                    warn(
                        (
                            f"Failed to sync chunk to {self} "
                            + f"(attempt {_chunk_attempts} / {_max_chunk_attempts}).\n"
                            + f"Sleeping for {_sleep_seconds} second"
                            + ('s' if _sleep_seconds != 1 else '')
                            + f":\n{_chunk_msg}"
                        ),
                        stack=False,
                    )
                    time.sleep(_sleep_seconds)

                num_rows_str = (
                    f"{num_rows:,} rows"
                    if (num_rows := len(_chunk)) != 1
                    else f"{num_rows} row"
                )
                _chunk_msg = (
                    (
                        "Synced"
                        if _chunk_success
                        else "Failed to sync"
                    ) + f" a chunk ({num_rows_str}) to {p}:\n"
                    + self._get_chunk_label(_chunk, dt_col)
                    + '\n'
                    + _chunk_msg
                )

                mrsm.pprint((_chunk_success, _chunk_msg), calm=True)
                return _chunk_success, _chunk_msg

            results = sorted(
                [(chunk_success, chunk_msg)] + (
                    list(pool.imap(_process_chunk, df))
                    if (
                        not df_is_chunk_generator(chunk)  # Handle nested generators.
                        and kw.get('workers', 1) != 1
                    )
                    else list(
                        _process_chunk(_child_chunks)
                        for _child_chunks in df
                    )
                )
            )
            chunk_messages = [chunk_msg for _, chunk_msg in results]
            success_bools = [chunk_success for chunk_success, _ in results]
            num_successes = len([chunk_success for chunk_success, _ in results if chunk_success])
            num_failures = len([chunk_success for chunk_success, _ in results if not chunk_success])
            success = all(success_bools)
            msg = (
                'Synced '
                + f'{len(chunk_messages):,} chunk'
                + ('s' if len(chunk_messages) != 1 else '')
                + f' to {p}\n({num_successes} succeeded, {num_failures} failed):\n\n'
                + '\n\n'.join(chunk_messages).lstrip().rstrip()
            ).lstrip().rstrip()
            return success, msg

        ### Cast to a dataframe and ensure datatypes are what we expect.
        dtypes = p.get_dtypes(debug=debug)
        df = p.enforce_dtypes(
            df,
            chunksize=chunksize,
            enforce=enforce_dtypes,
            dtypes=dtypes,
            debug=debug,
        )
        if p.autotime:
            dt_col = p.columns.get('datetime', None)
            ts_col = dt_col or mrsm.get_config(
                'pipes', 'autotime', 'column_name_if_datetime_missing'
            )
            ts_typ = dtypes.get(ts_col, 'datetime') if ts_col else 'datetime'
            if ts_col and hasattr(df, 'columns') and ts_col not in df.columns:
                precision = p.get_precision(debug=debug)
                now = get_current_timestamp(
                    precision_unit=precision.get(
                        'unit',
                        STATIC_CONFIG['dtypes']['datetime']['default_precision_unit']
                    ),
                    precision_interval=precision.get('interval', 1),
                    round_to=(precision.get('round_to', 'down')),
                    as_int=(are_dtypes_equal(ts_typ, 'int')),
                )
                if debug:
                    dprint(f"Adding current timestamp to dataframe synced to {p}: {now}")

                df[ts_col] = now
                kw['check_existing'] = dt_col is not None

        ### Capture special columns.
        capture_success, capture_msg = self._persist_new_special_columns(
            df,
            dtypes=dtypes,
            debug=debug,
        )
        if not capture_success:
            warn(f"Failed to capture new special columns for {self}:\n{capture_msg}")

        if debug:
            dprint(
                "DataFrame to sync:\n"
                + (
                    str(df)[:255]
                    + '...'
                    if len(str(df)) >= 256
                    else str(df)
                ),
                **kw
            )

        ### if force, continue to sync until success
        return_tuple = False, f"Did not sync {p}."
        run = True
        _retries = 1
        while run:
            with Venv(get_connector_plugin(self.instance_connector)):
                return_tuple = p.instance_connector.sync_pipe(
                    pipe=p,
                    df=df,
                    debug=debug,
                    **kw
                )
            _retries += 1
            run = (not return_tuple[0]) and force and _retries <= retries
            if run and debug:
                dprint(f"Syncing failed for {p}. Attempt ( {_retries} / {retries} )", **kw)
                dprint(f"Sleeping for {min_seconds} seconds...", **kw)
                time.sleep(min_seconds)
            if _retries > retries:
                warn(
                    f"Unable to sync {p} within {retries} attempt" +
                        ("s" if retries != 1 else "") + "!"
                )

        ### CHECKPOINT: Finished syncing.
        _checkpoint(**kw)
        p._invalidate_cache(debug=debug)

        ### Automatically apply a compression policy if the pipe is configured for compression.
        if return_tuple[0] and p.parameters.get('compress', False):
            if hasattr(p.instance_connector, 'apply_compression_policy'):
                try:
                    with Venv(get_connector_plugin(p.instance_connector)):
                        compress_success, compress_msg = (
                            p.instance_connector.apply_compression_policy(p, debug=debug)
                        )
                    if not compress_success and debug:
                        dprint(f"Could not apply compression policy to {p}:\n{compress_msg}")
                except Exception as compress_e:
                    warn(
                        f"Failed to apply compression policy to {p}:\n{compress_e}",
                        stack=False,
                    )

        return return_tuple

    if blocking:
        return _sync(self, df=df)

    from meerschaum.utils.threading import Thread
    def default_callback(result_tuple: SuccessTuple):
        dprint(f"Asynchronous result from {self}: {result_tuple}", **kw)

    def default_error_callback(x: Exception):
        dprint(f"Error received for {self}: {x}", **kw)

    if callback is None and debug:
        callback = default_callback
    if error_callback is None and debug:
        error_callback = default_error_callback
    try:
        thread = Thread(
            target=_sync,
            args=(self,),
            kwargs={'df': df},
            daemon=False,
            callback=callback,
            error_callback=error_callback,
        )
        thread.start()
    except Exception as e:
        self._invalidate_cache(debug=debug)
        return False, str(e)

    self._invalidate_cache(debug=debug)
    return True, f"Spawned asynchronous sync for {self}."

Fetch new data from the source and update the pipe's table with new data.

Fetch new remote data, retrieve existing data from the same time period, and merge the two, keeping only the unseen data.
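The fetch-and-merge step described above amounts to a diff against rows already in the pipe. The sketch below rests on three assumptions: rows are plain dicts, they are identified by the pipe's datetime and ID columns (`ts` and `id`, as in the example at the top of this page), and a row counts as unseen when its index is new or any of its values differ. `filter_unseen` is hypothetical. It skips dtype enforcement and chunking, and it leaves the actual insert or update work to the instance connector.

```python
from datetime import datetime
from typing import Any, Dict, List, Sequence


def filter_unseen(
    fetched: List[Dict[str, Any]],
    existing: List[Dict[str, Any]],
    index_cols: Sequence[str] = ('ts', 'id'),
) -> List[Dict[str, Any]]:
    """Hypothetical sketch: keep fetched rows that are new or changed."""
    # Map each existing row's index values to the full row.
    existing_by_key = {tuple(row[c] for c in index_cols): row for row in existing}
    unseen = []
    for row in fetched:
        key = tuple(row[c] for c in index_cols)
        # Unseen (assumed definition): the index is new, or any value differs.
        if existing_by_key.get(key) != row:
            unseen.append(row)
    return unseen


ts = datetime(2024, 1, 1)
existing = [{'ts': ts, 'id': 1, 'vl': 10}, {'ts': ts, 'id': 2, 'vl': 20}]
fetched = [
    {'ts': ts, 'id': 1, 'vl': 10},  # identical to an existing row: dropped
    {'ts': ts, 'id': 2, 'vl': 25},  # same index, new value: kept
    {'ts': ts, 'id': 3, 'vl': 30},  # new index: kept
]
print([row['id'] for row in filter_unseen(fetched, existing)])
# [2, 3]
```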

Parameters
  • df (Union[pd.DataFrame, Dict[str, List[Any]], List[Dict[str, Any]], str], default InferFetch): An optional DataFrame to sync into the pipe. If omitted, data are fetched from the pipe's connector; passing None explicitly is treated as an error. If df is a string, it will be parsed via meerschaum.utils.dataframe.parse_simple_lines().
  • begin (Union[datetime, int, str, None], default ''): Optionally specify the earliest datetime to search for data.
  • end (Union[datetime, int, str, None], default None): Optionally specify the latest datetime to search for data.
  • force (bool, default False): If True, keep retrying a failed sync, up to retries attempts.
  • retries (int, default 10): If force, how many attempts to try syncing before declaring failure.
  • min_seconds (Union[int, float], default 1): If force, how many seconds to sleep between retries.
  • check_existing (bool, default True): If True, pull and diff with existing data from the pipe.
  • enforce_dtypes (bool, default True): If True, enforce dtypes on incoming data. Set this to False if the incoming rows are expected to be of the correct dtypes.
  • blocking (bool, default True): If True, wait for sync to finish and return its result; otherwise, sync asynchronously in a background thread and return success immediately. Only intended for specific scenarios.
  • workers (Optional[int], default None): If provided and the instance connector is thread-safe (pipe.instance_connector.IS_THREAD_SAFE is True), limit concurrent sync to this many threads.
  • callback (Optional[Callable[[Tuple[bool, str]], Any]], default None): Callback function which expects a SuccessTuple as input. Only applies when blocking=False.
  • error_callback (Optional[Callable[[Exception], Any]], default None): Callback function which expects an Exception as input. Only applies when blocking=False.
  • chunksize (int, default -1): Specify the number of rows to sync per chunk. If -1, resort to system configuration (default is 900). A chunksize of None will sync all rows in one transaction; a chunksize of 0 is treated as None and also disables sync_chunks.
  • sync_chunks (bool, default True): If possible, sync chunks while fetching them into memory.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A SuccessTuple of success (bool) and message (str).
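When blocking=False, the listing above starts a background thread and returns (True, ...) right away; the eventual outcome reaches callback as a SuccessTuple, or error_callback as an Exception. The stdlib-only sketch below illustrates that contract. run_in_background is a hypothetical helper, not part of meerschaum; it wraps a plain threading.Thread in a try/except because the standard Thread does not accept callback arguments (the Thread class used in the listing above does).

```python
import threading
from typing import Any, Callable, Optional, Tuple

SuccessTuple = Tuple[bool, str]


def run_in_background(
    target: Callable[[], SuccessTuple],
    callback: Optional[Callable[[SuccessTuple], Any]] = None,
    error_callback: Optional[Callable[[Exception], Any]] = None,
) -> Tuple[threading.Thread, SuccessTuple]:
    """Hypothetical helper: run `target` in a thread and route its outcome."""

    def _wrapper() -> None:
        try:
            result = target()
        except Exception as e:
            # Errors are handed to error_callback instead of propagating.
            if error_callback is not None:
                error_callback(e)
            return
        if callback is not None:
            callback(result)

    thread = threading.Thread(target=_wrapper, daemon=False)
    thread.start()
    # Report success as soon as the thread is spawned, like the non-blocking branch.
    return thread, (True, "Spawned asynchronous sync.")


results, errors = [], []
thread, spawned = run_in_background(
    lambda: (True, "Synced 3 rows."),
    callback=results.append,
    error_callback=errors.append,
)
thread.join()  # Only needed here so the example can inspect the results.
print(spawned, results)
```

Note that the immediate return value only says the thread started; whether the sync itself succeeded is known only inside the callbacks.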
def get_sync_time( self, params: Optional[Dict[str, Any]] = None, newest: bool = True, apply_backtrack_interval: bool = False, remote: bool = False, round_down: bool = False, debug: bool = False) -> Union[datetime.datetime, int, NoneType]:
535def get_sync_time(
536    self,
537    params: Optional[Dict[str, Any]] = None,
538    newest: bool = True,
539    apply_backtrack_interval: bool = False,
540    remote: bool = False,
541    round_down: bool = False,
542    debug: bool = False
543) -> Union['datetime', int, None]:
544    """
545    Get the most recent datetime value for a Pipe.
546
547    Parameters
548    ----------
549    params: Optional[Dict[str, Any]], default None
550        Dictionary to build a WHERE clause for a specific column.
551        See `meerschaum.utils.sql.build_where`.
552
553    newest: bool, default True
554        If `True`, get the most recent datetime (honoring `params`).
555        If `False`, get the oldest datetime (`ASC` instead of `DESC`).
556
557    apply_backtrack_interval: bool, default False
558        If `True`, subtract the backtrack interval from the sync time.
559
560    remote: bool, default False
561        If `True` and the pipe's source connector supports it, return the sync time
562        for the remote table definition.
563
564    round_down: bool, default False
565        If `True`, round down the datetime value to the nearest minute.
566
567    debug: bool, default False
568        Verbosity toggle.
569
570    Returns
571    -------
572    A `datetime` or int, if the pipe exists, otherwise `None`.
573
574    """
575    from meerschaum.utils.venv import Venv
576    from meerschaum.connectors import get_connector_plugin
577    from meerschaum.utils.misc import filter_keywords
578    from meerschaum.utils.dtypes import round_time
579    from meerschaum.utils.warnings import warn
580
581    if not self.columns.get('datetime', None):
582        return None
583
584    connector = self.instance_connector if not remote else self.connector
585    if isinstance(connector, str) or connector is None:
586        return None
587
588    with Venv(get_connector_plugin(connector)):
589        if not hasattr(connector, 'get_sync_time'):
590            warn(
591                f"Connectors of type '{connector.type}' "
592                "do not implement `get_sync_time()`.",
593                stack=False,
594            )
595            return None
596        sync_time = connector.get_sync_time(
597            self,
598            **filter_keywords(
599                connector.get_sync_time,
600                params=params,
601                newest=newest,
602                remote=remote,
603                debug=debug,
604            )
605        )
606
607    if round_down and isinstance(sync_time, datetime):
608        sync_time = round_time(sync_time, timedelta(minutes=1))
609
610    if apply_backtrack_interval and sync_time is not None:
611        backtrack_interval = self.get_backtrack_interval(debug=debug)
612        try:
613            sync_time -= backtrack_interval
614        except Exception as e:
615            warn(f"Failed to apply backtrack interval:\n{e}")
616
617    return self.parse_date_bounds(sync_time)

Get the most recent datetime value for a Pipe.

Parameters
  • params (Optional[Dict[str, Any]], default None): Dictionary to build a WHERE clause for a specific column. See meerschaum.utils.sql.build_where.
  • newest (bool, default True): If True, get the most recent datetime (honoring params). If False, get the oldest datetime (ASC instead of DESC).
  • apply_backtrack_interval (bool, default False): If True, subtract the backtrack interval from the sync time.
  • remote (bool, default False): If True and the pipe's source connector supports it, return the sync time for the remote table definition.
  • round_down (bool, default False): If True, round down the datetime value to the nearest minute.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A datetime or int, if the pipe exists, otherwise None.
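get_sync_time() post-processes the connector's answer in a fixed order: first round down to the minute (datetimes only), then subtract the backtrack interval. The sketch below shows that order, assuming "round down" means truncating seconds and microseconds as the parameter description states. adjust_sync_time is a hypothetical helper, and the warning the real method emits when the subtraction fails is omitted.

```python
from datetime import datetime, timedelta, timezone
from typing import Union

SyncTime = Union[datetime, int, None]


def adjust_sync_time(
    sync_time: SyncTime,
    round_down: bool = False,
    backtrack_interval: Union[timedelta, int, None] = None,
) -> SyncTime:
    """Hypothetical helper mirroring the post-processing order of get_sync_time()."""
    if sync_time is None:
        return None
    # Rounding only applies to datetimes: truncate to the start of the minute.
    if round_down and isinstance(sync_time, datetime):
        sync_time = sync_time.replace(second=0, microsecond=0)
    # Then shift back by the backtrack interval (timedelta for datetimes, int for ints).
    if backtrack_interval is not None:
        sync_time = sync_time - backtrack_interval
    return sync_time


newest = datetime(2024, 1, 1, 12, 34, 56, tzinfo=timezone.utc)
print(adjust_sync_time(newest, round_down=True, backtrack_interval=timedelta(days=1)))
# 2023-12-31 12:34:00+00:00
print(adjust_sync_time(100, round_down=True, backtrack_interval=10))
# 90
```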
def exists(self, debug: bool = False) -> bool:
620def exists(
621    self,
622    debug: bool = False
623) -> bool:
624    """
625    See if a Pipe's table exists.
626
627    Parameters
628    ----------
629    debug: bool, default False
630        Verbosity toggle.
631
632    Returns
633    -------
634    A `bool` corresponding to whether a pipe's underlying table exists.
635
636    """
637    from meerschaum.utils.venv import Venv
638    from meerschaum.connectors import get_connector_plugin
639    from meerschaum.utils.debug import dprint
640    from meerschaum.utils.dtypes import get_current_timestamp
641    now = get_current_timestamp('ms', as_int=True) / 1000
642    cache_seconds = mrsm.get_config('pipes', 'sync', 'exists_cache_seconds')
643
644    _exists = self._get_cached_value('_exists', debug=debug)
645    if _exists:
646        exists_timestamp = self._get_cached_value('_exists_timestamp', debug=debug)
647        if exists_timestamp is not None:
648            delta = now - exists_timestamp
649            if delta < cache_seconds:
650                if debug:
651                    dprint(f"Returning cached `exists` for {self} ({round(delta, 2)} seconds old).")
652                return _exists
653
654    with Venv(get_connector_plugin(self.instance_connector)):
655        _exists = (
656            self.instance_connector.pipe_exists(pipe=self, debug=debug)
657            if hasattr(self.instance_connector, 'pipe_exists')
658            else False
659        )
660
661    self._cache_value('_exists', _exists, debug=debug)
662    self._cache_value('_exists_timestamp', now, debug=debug)
663    return _exists

See if a Pipe's table exists.

Parameters
  • debug (bool, default False): Verbosity toggle.
Returns
  • A bool corresponding to whether a pipe's underlying table exists.
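exists() caches only a positive answer, together with the time it was fetched, and reuses it while it is younger than the configured exists_cache_seconds value (pipes, sync, exists_cache_seconds); a False result is always re-checked. The sketch below models that policy with a hypothetical PositiveTTLCache class and an injectable clock, so expiry can be shown without sleeping.

```python
import time
from typing import Callable, Optional


class PositiveTTLCache:
    """Hypothetical sketch: remember a truthy `exists` result for a limited time."""

    def __init__(
        self,
        check: Callable[[], bool],
        cache_seconds: float,
        clock: Callable[[], float] = time.time,
    ):
        self.check = check
        self.cache_seconds = cache_seconds
        self.clock = clock
        self._value: Optional[bool] = None
        self._timestamp: Optional[float] = None

    def get(self) -> bool:
        now = self.clock()
        # Only a cached True is reused, and only while it is younger than cache_seconds.
        if self._value and self._timestamp is not None:
            if now - self._timestamp < self.cache_seconds:
                return self._value
        # Otherwise ask the backend again and record when we asked.
        self._value = self.check()
        self._timestamp = now
        return self._value


calls = []
fake_now = [0.0]


def backend_check() -> bool:
    calls.append(fake_now[0])
    return True


cache = PositiveTTLCache(backend_check, cache_seconds=5, clock=lambda: fake_now[0])
cache.get()        # Queries the backend.
fake_now[0] = 3.0
cache.get()        # Served from the cache (3 seconds old).
fake_now[0] = 6.0
cache.get()        # Cache expired, so the backend is queried again.
print(len(calls))  # 2
```

Caching only positive results means a table created by another process is noticed on the very next call, at the cost of repeated lookups for pipes that do not exist yet.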
def filter_existing( self, df: pandas.DataFrame, safe_copy: bool = True, date_bound_only: bool = False, include_unchanged_columns: bool = False, enforce_dtypes: bool = False, chunksize: Optional[int] = -1, debug: bool = False, **kw) -> Tuple[pandas.DataFrame, pandas.DataFrame, pandas.DataFrame]:
 666def filter_existing(
 667    self,
 668    df: 'pd.DataFrame',
 669    safe_copy: bool = True,
 670    date_bound_only: bool = False,
 671    include_unchanged_columns: bool = False,
 672    enforce_dtypes: bool = False,
 673    chunksize: Optional[int] = -1,
 674    debug: bool = False,
 675    **kw
 676) -> Tuple['pd.DataFrame', 'pd.DataFrame', 'pd.DataFrame']:
 677    """
 678    Inspect a dataframe and filter out rows which already exist in the pipe.
 679
 680    Parameters
 681    ----------
 682    df: 'pd.DataFrame'
 683        The dataframe to inspect and filter.
 684
 685    safe_copy: bool, default True
 686        If `True`, create a copy before comparing and modifying the dataframes.
 687        Setting to `False` may mutate the DataFrames.
 688        See `meerschaum.utils.dataframe.filter_unseen_df`.
 689
 690    date_bound_only: bool, default False
 691        If `True`, only use the datetime index to fetch the sample dataframe.
 692
 693    include_unchanged_columns: bool, default False
 694        If `True`, include the backtrack columns which haven't changed in the update dataframe.
 695        This is useful if you can't update individual keys.
 696
 697    enforce_dtypes: bool, default False
 698        If `True`, ensure the given and intermediate dataframes are enforced to the correct dtypes.
 699        Setting `enforce_dtypes=True` may impact performance.
 700
 701    chunksize: Optional[int], default -1
 702        The `chunksize` used when fetching existing data.
 703
 704    debug: bool, default False
 705        Verbosity toggle.
 706
 707    Returns
 708    -------
 709    A tuple of three pandas DataFrames: unseen, update, and delta.
 710    """
 711    from meerschaum.utils.warnings import warn
 712    from meerschaum.utils.debug import dprint
 713    from meerschaum.utils.packages import attempt_import, import_pandas
 714    from meerschaum.utils.dataframe import (
 715        filter_unseen_df,
 716        add_missing_cols_to_df,
 717        get_unhashable_cols,
 718    )
 719    from meerschaum.utils.dtypes import (
 720        to_pandas_dtype,
 721        none_if_null,
 722        to_datetime,
 723        are_dtypes_equal,
 724        value_is_null,
 725        round_time,
 726    )
 727    from meerschaum.config import get_config
 728    pd = import_pandas()
 729    pandas = attempt_import('pandas')
 730    if enforce_dtypes or 'dataframe' not in str(type(df)).lower():
 731        df = self.enforce_dtypes(df, chunksize=chunksize, debug=debug)
 732    is_dask = hasattr(df, '__module__') and 'dask' in df.__module__
 733    if is_dask:
 734        dd = attempt_import('dask.dataframe')
 735        merge = dd.merge
 736        NA = pandas.NA
 737    else:
 738        merge = pd.merge
 739        NA = pd.NA
 740
 741    parameters = self.parameters
 742    pipe_columns = self.columns
 743    primary_key = pipe_columns.get('primary', None)
 744    dt_col = pipe_columns.get('datetime', None)
 745    dt_type = parameters.get('dtypes', {}).get(dt_col, 'datetime') if dt_col else None
 746    autoincrement = parameters.get('autoincrement', False)
 747    autotime = parameters.get('autotime', False)
 748
 749    if primary_key and autoincrement and df is not None and primary_key in df.columns:
 750        if safe_copy:
 751            df = df.copy()
 752            safe_copy = False
 753        if df[primary_key].isnull().all():
 754            del df[primary_key]
 755            _ = self.columns.pop(primary_key, None)
 756
 757    if dt_col and autotime and df is not None and dt_col in df.columns:
 758        if safe_copy:
 759            df = df.copy()
 760            safe_copy = False
 761        if df[dt_col].isnull().all():
 762            del df[dt_col]
 763            _ = self.columns.pop(dt_col, None)
 764
 765    def get_empty_df():
 766        empty_df = pd.DataFrame([])
 767        dtypes = dict(df.dtypes) if df is not None else {}
 768        dtypes.update(self.dtypes) if self.enforce else {}
 769        pd_dtypes = {
 770            col: to_pandas_dtype(str(typ))
 771            for col, typ in dtypes.items()
 772        }
 773        return add_missing_cols_to_df(empty_df, pd_dtypes)
 774
 775    if df is None:
 776        empty_df = get_empty_df()
 777        return empty_df, empty_df, empty_df
 778
 779    if (df.empty if not is_dask else len(df) == 0):
 780        return df, df, df
 781
 782    ### begin is the oldest data in the new dataframe
 783    begin, end = None, None
 784
 785    if autoincrement and primary_key == dt_col and dt_col not in df.columns:
 786        if enforce_dtypes:
 787            df = self.enforce_dtypes(df, chunksize=chunksize, debug=debug)
 788        return df, get_empty_df(), df
 789
 790    if autotime and dt_col and dt_col not in df.columns:
 791        if enforce_dtypes:
 792            df = self.enforce_dtypes(df, chunksize=chunksize, debug=debug)
 793        return df, get_empty_df(), df
 794
 795    try:
 796        min_dt_val = df[dt_col].min(skipna=True) if dt_col and dt_col in df.columns else None
 797        if is_dask and min_dt_val is not None:
 798            min_dt_val = min_dt_val.compute()
 799        min_dt = (
 800            to_datetime(min_dt_val, as_pydatetime=True)
 801            if min_dt_val is not None and are_dtypes_equal(dt_type, 'datetime')
 802            else min_dt_val
 803        )
 804    except Exception:
 805        min_dt = None
 806
 807    if not are_dtypes_equal('datetime', str(type(min_dt))) or value_is_null(min_dt):
 808        if not are_dtypes_equal('int', str(type(min_dt))):
 809            min_dt = None
 810
 811    if isinstance(min_dt, datetime):
 812        rounded_min_dt = round_time(min_dt, to='down')
 813        try:
 814            begin = rounded_min_dt - timedelta(minutes=1)
 815        except OverflowError:
 816            begin = rounded_min_dt
 817    elif dt_type and 'int' in dt_type.lower():
 818        begin = min_dt
 819    elif dt_col is None:
 820        begin = None
 821
 822    ### end is the newest data in the new dataframe
 823    try:
 824        max_dt_val = df[dt_col].max(skipna=True) if dt_col and dt_col in df.columns else None
 825        if is_dask and max_dt_val is not None:
 826            max_dt_val = max_dt_val.compute()
 827        max_dt = (
 828            to_datetime(max_dt_val, as_pydatetime=True)
 829            if max_dt_val is not None and 'datetime' in str(dt_type)
 830            else max_dt_val
 831        )
 832    except Exception:
 833        import traceback
 834        traceback.print_exc()
 835        max_dt = None
 836
 837    if not are_dtypes_equal('datetime', str(type(max_dt))) or value_is_null(max_dt):
 838        if not are_dtypes_equal('int', str(type(max_dt))):
 839            max_dt = None
 840
 841    if isinstance(max_dt, datetime):
 842        end = (
 843            round_time(
 844                max_dt,
 845                to='down'
 846            ) + timedelta(minutes=1)
 847        )
 848    elif dt_type and 'int' in dt_type.lower() and max_dt is not None:
 849        end = max_dt + 1
 850
 851    if max_dt is not None and min_dt is not None and min_dt > max_dt:
 852        warn("Detected minimum datetime greater than maximum datetime.")
 853
 854    if begin is not None and end is not None and begin > end:
 855        if isinstance(begin, datetime):
 856            begin = end - timedelta(minutes=1)
 857        ### We might be using integers for the datetime axis.
 858        else:
 859            begin = end - 1
 860
 861    unique_index_vals = {
 862        col: df[col].unique()
 863        for col in (pipe_columns.values() if not primary_key else [primary_key])
 864        if col in df.columns and col != dt_col
 865    } if not date_bound_only else {}
 866    unique_index_lens = {
 867        col: len(unique_vals)
 868        for col, unique_vals in unique_index_vals.items()
 869    } if not date_bound_only else {}
 870    filter_params_index_limit = get_config('pipes', 'sync', 'filter_params_index_limit')
 871    _ = kw.pop('params', None)
 872    params = {
 873        col: [
 874            none_if_null(val)
 875            for val in unique_vals
 876        ]
 877        for col, unique_vals in unique_index_vals.items()
 878        if unique_index_lens[col] <= filter_params_index_limit
 879    } if not date_bound_only else {}
 880
 881    if debug:
 882        dprint(
 883            (
 884                f"Looking at data between '{begin}' and '{end}' with index value lengths:\n"
 885                f"{json.dumps(unique_index_lens, indent=4)}\n"
 886            ),
 887            **kw
 888        )
 889
 890    backtrack_df = self.get_data(
 891        begin=begin,
 892        end=end,
 893        chunksize=chunksize,
 894        params=params,
 895        debug=debug,
 896        **kw
 897    )
 898    if backtrack_df is None:
 899        if debug:
 900            dprint(f"No backtrack data was found for {self}.")
 901        return df, get_empty_df(), df
 902
 903    if enforce_dtypes:
 904        backtrack_df = self.enforce_dtypes(backtrack_df, chunksize=chunksize, debug=debug)
 905
 906    if debug:
 907        dprint(f"Existing data for {self}:\n" + str(backtrack_df), **kw)
 908        dprint(f"Existing dtypes for {self}:\n" + str(backtrack_df.dtypes))
 909
 910    ### Separate new rows from changed ones.
 911    on_cols = [
 912        col
 913        for col_key, col in pipe_columns.items()
 914        if (
 915            col
 916            and
 917            col_key != 'value'
 918            and col in backtrack_df.columns
 919        )
 920    ] if not primary_key else [primary_key]
 921
 922    self_dtypes = self.get_dtypes(debug=debug) if self.enforce else {}
 923    on_cols_dtypes = {
 924        col: to_pandas_dtype(typ)
 925        for col, typ in self_dtypes.items()
 926        if col in on_cols
 927    }
 928
 929    ### Detect changes between the old target and new source dataframes.
 930    delta_df = add_missing_cols_to_df(
 931        filter_unseen_df(
 932            backtrack_df,
 933            df,
 934            dtypes={
 935                col: to_pandas_dtype(typ)
 936                for col, typ in self_dtypes.items()
 937            },
 938            safe_copy=safe_copy,
 939            coerce_mixed_numerics=(not self.static),
 940            debug=debug
 941        ),
 942        on_cols_dtypes,
 943    )
 944    if enforce_dtypes:
 945        delta_df = self.enforce_dtypes(delta_df, chunksize=chunksize, debug=debug)
 946
 947    ### Cast dicts or lists to strings so we can merge.
 948    serializer = functools.partial(json.dumps, sort_keys=True, separators=(',', ':'), default=str)
 949
 950    def deserializer(x):
 951        return json.loads(x) if isinstance(x, str) else x
 952
 953    unhashable_delta_cols = get_unhashable_cols(delta_df)
 954    unhashable_backtrack_cols = get_unhashable_cols(backtrack_df)
 955    for col in unhashable_delta_cols:
 956        delta_df[col] = delta_df[col].apply(serializer)
 957    for col in unhashable_backtrack_cols:
 958        backtrack_df[col] = backtrack_df[col].apply(serializer)
 959    casted_cols = set(unhashable_delta_cols + unhashable_backtrack_cols)
 960
 961    joined_df = merge(
 962        delta_df.infer_objects().fillna(NA),
 963        backtrack_df.infer_objects().fillna(NA),
 964        how='left',
 965        on=on_cols,
 966        indicator=True,
 967        suffixes=('', '_old'),
 968    ) if on_cols else delta_df
 969    for col in casted_cols:
 970        if col in joined_df.columns:
 971            joined_df[col] = joined_df[col].apply(deserializer)
 972        if col in delta_df.columns:
 973            delta_df[col] = delta_df[col].apply(deserializer)
 974
 975    ### Determine which rows are completely new.
 976    new_rows_mask = (joined_df['_merge'] == 'left_only') if on_cols else None
 977    cols = list(delta_df.columns)
 978
 979    unseen_df = (
 980        joined_df
 981        .where(new_rows_mask)
 982        .dropna(how='all')[cols]
 983        .reset_index(drop=True)
 984    ) if on_cols else delta_df
 985
 986    ### Rows that have already been inserted but values have changed.
 987    update_df = (
 988        joined_df
 989        .where(~new_rows_mask)
 990        .dropna(how='all')[cols]
 991        .reset_index(drop=True)
 992    ) if on_cols else get_empty_df()
 993
 994    if include_unchanged_columns and on_cols:
 995        unchanged_backtrack_cols = [
 996            col
 997            for col in backtrack_df.columns
 998            if col in on_cols or col not in update_df.columns
 999        ]
1000        if enforce_dtypes:
1001            update_df = self.enforce_dtypes(update_df, chunksize=chunksize, debug=debug)
1002        update_df = merge(
1003            backtrack_df[unchanged_backtrack_cols],
1004            update_df,
1005            how='inner',
1006            on=on_cols,
1007        )
1008
1009    return unseen_df, update_df, delta_df

Inspect a dataframe and filter out rows which already exist in the pipe.

Parameters
  • df ('pd.DataFrame'): The dataframe to inspect and filter.
  • safe_copy (bool, default True): If True, create a copy before comparing and modifying the dataframes. Setting to False may mutate the DataFrames. See meerschaum.utils.dataframe.filter_unseen_df.
  • date_bound_only (bool, default False): If True, only use the datetime index to fetch the sample dataframe.
  • include_unchanged_columns (bool, default False): If True, include the backtrack columns which haven't changed in the update dataframe. This is useful if you can't update individual keys.
  • enforce_dtypes (bool, default False): If True, ensure the given and intermediate dataframes are enforced to the correct dtypes. Setting enforce_dtypes=True may impact performance.
  • chunksize (Optional[int], default -1): The chunksize used when fetching existing data.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A tuple of three pandas DataFrames: unseen, update, and delta.
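The three returned frames are related: delta holds incoming rows that differ from what is stored, and the left merge with indicator=True splits delta by whether its index key already exists (left_only rows become unseen, the rest become update). The sketch below reproduces that split on plain lists of dicts instead of DataFrames, using a dictionary lookup in place of the merge. split_rows is hypothetical and ignores dtype coercion, serialization of unhashable columns, and the datetime window used to fetch existing rows.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def split_rows(
    new_rows: List[Row],
    existing_rows: List[Row],
    on_cols: Sequence[str],
) -> Tuple[List[Row], List[Row], List[Row]]:
    """Hypothetical sketch: return (unseen, update, delta) for plain-dict rows."""

    def key(row: Row) -> Tuple[Any, ...]:
        # The index key plays the role of the merge's `on` columns.
        return tuple(row.get(col) for col in on_cols)

    existing_by_key = {key(row): row for row in existing_rows}

    def changed(row: Row) -> bool:
        old = existing_by_key.get(key(row))
        # A row belongs to delta if it is new or any of its values differ.
        return old is None or any(old.get(col) != val for col, val in row.items())

    delta = [row for row in new_rows if changed(row)]
    # unseen: no stored row shares the key (the merge's 'left_only' rows).
    unseen = [row for row in delta if key(row) not in existing_by_key]
    # update: a stored row shares the key but some values changed.
    update = [row for row in delta if key(row) in existing_by_key]
    return unseen, update, delta


existing = [{'id': 1, 'vl': 10}, {'id': 2, 'vl': 20}]
incoming = [{'id': 1, 'vl': 10}, {'id': 2, 'vl': 25}, {'id': 3, 'vl': 30}]
unseen, update, delta = split_rows(incoming, existing, on_cols=['id'])
print(unseen)  # [{'id': 3, 'vl': 30}]
print(update)  # [{'id': 2, 'vl': 25}]
```

The unchanged row with id 1 is dropped entirely, which is what lets a re-sync of overlapping data avoid rewriting rows that are already stored.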
def get_num_workers(self, workers: Optional[int] = None) -> int:
1034def get_num_workers(self, workers: Optional[int] = None) -> int:
1035    """
1036    Get the number of workers to use for concurrent syncs.
1037
1038    Parameters
1039    ----------
1040    The number of workers passed via `--workers`.
1041
1042    Returns
1043    -------
1044    The number of workers, capped for safety.
1045    """
1046    is_thread_safe = getattr(self.instance_connector, 'IS_THREAD_SAFE', False)
1047    if not is_thread_safe:
1048        return 1
1049
1050    engine_pool_size = (
1051        self.instance_connector.engine.pool.size()
1052        if self.instance_connector.type == 'sql'
1053        else None
1054    )
1055    current_num_threads = threading.active_count()
1056    current_num_connections = (
1057        self.instance_connector.engine.pool.checkedout()
1058        if engine_pool_size is not None
1059        else current_num_threads
1060    )
1061    desired_workers = (
1062        min(workers or engine_pool_size, engine_pool_size)
1063        if engine_pool_size is not None
1064        else workers
1065    )
1066    if desired_workers is None:
1067        desired_workers = (multiprocessing.cpu_count() if is_thread_safe else 1)
1068
1069    return max(
1070        (desired_workers - current_num_connections),
1071        1,
1072    )

Get the number of workers to use for concurrent syncs.

Parameters
  • workers (Optional[int], default None): The number of workers passed via --workers.
Returns
  • The number of workers, capped for safety.
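The capping rules in get_num_workers() reduce to a small pure function once the connector state is passed in explicitly. In this sketch, compute_num_workers is hypothetical: pool_size stands for engine.pool.size() on SQL instances, in_use for the checked-out connections (or threading.active_count() when there is no pool), and cpu_count defaults to multiprocessing.cpu_count().

```python
import multiprocessing
from typing import Optional


def compute_num_workers(
    workers: Optional[int],
    is_thread_safe: bool,
    pool_size: Optional[int] = None,
    in_use: int = 0,
    cpu_count: Optional[int] = None,
) -> int:
    """Hypothetical restatement of the worker-capping rules."""
    # Connectors that are not thread-safe always get a single worker.
    if not is_thread_safe:
        return 1
    # With a connection pool, never request more workers than the pool holds.
    if pool_size is not None:
        desired = min(workers or pool_size, pool_size)
    else:
        desired = workers
    # Without an explicit request or pool, fall back to the CPU count.
    if desired is None:
        desired = cpu_count if cpu_count is not None else multiprocessing.cpu_count()
    # Leave room for what is already in use, but always allow at least one worker.
    return max(desired - in_use, 1)


print(compute_num_workers(None, True, pool_size=5, in_use=2))  # 3
```

For example, with a pool of 5 connections and 2 already checked out, a request for the default number of workers yields 3.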
def verify( self, begin: Union[datetime.datetime, int, NoneType] = None, end: Union[datetime.datetime, int, NoneType] = None, params: Optional[Dict[str, Any]] = None, chunk_interval: Union[datetime.timedelta, int, NoneType] = None, bounded: Optional[bool] = None, deduplicate: bool = False, workers: Optional[int] = None, batchsize: Optional[int] = None, skip_chunks_with_greater_rowcounts: bool = False, check_rowcounts_only: bool = False, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
 19def verify(
 20    self,
 21    begin: Union[datetime, int, None] = None,
 22    end: Union[datetime, int, None] = None,
 23    params: Optional[Dict[str, Any]] = None,
 24    chunk_interval: Union[timedelta, int, None] = None,
 25    bounded: Optional[bool] = None,
 26    deduplicate: bool = False,
 27    workers: Optional[int] = None,
 28    batchsize: Optional[int] = None,
 29    skip_chunks_with_greater_rowcounts: bool = False,
 30    check_rowcounts_only: bool = False,
 31    debug: bool = False,
 32    **kwargs: Any
 33) -> SuccessTuple:
 34    """
 35    Verify the contents of the pipe by resyncing its interval.
 36
 37    Parameters
 38    ----------
 39    begin: Union[datetime, int, None], default None
 40        If specified, only verify rows greater than or equal to this value.
 41
 42    end: Union[datetime, int, None], default None
 43        If specified, only verify rows less than this value.
 44
 45    chunk_interval: Union[timedelta, int, None], default None
 46        If provided, use this as the size of the chunk boundaries.
 47        Defaults to the value set in `pipe.parameters['verify']['chunk_minutes']` (43200 minutes, i.e. 30 days).
 48
 49    bounded: Optional[bool], default None
 50        If `True`, do not verify older than the oldest sync time or newer than the newest.
 51        If `False`, verify unbounded syncs outside of the new and old sync times.
 52        The default behavior (`None`) is to bound only if a bound interval is set
 53        (e.g. `pipe.parameters['verify']['bound_days']`).
 54
 55    deduplicate: bool, default False
 56        If `True`, deduplicate the pipe's table after the verification syncs.
 57
 58    workers: Optional[int], default None
 59        If provided, limit the verification to this many threads.
 60        Use a value of `1` to sync chunks in series.
 61
 62    batchsize: Optional[int], default None
 63        If provided, sync this many chunks in parallel.
 64        Defaults to `Pipe.get_num_workers()`.
 65
 66    skip_chunks_with_greater_rowcounts: bool, default False
 67        If `True`, compare the rowcounts for a chunk and skip syncing if the pipe's
 68        chunk rowcount equals or exceeds the remote's rowcount.
 69
 70    check_rowcounts_only: bool, default False
 71        If `True`, only compare rowcounts and print chunks which are out-of-sync.
 72
 73    debug: bool, default False
 74        Verbosity toggle.
 75
 76    kwargs: Any
 77        All keyword arguments are passed to `pipe.sync()`.
 78
 79    Returns
 80    -------
 81    A SuccessTuple indicating whether the pipe was successfully resynced.
 82    """
 83    from meerschaum.utils.pool import get_pool
 84    from meerschaum.utils.formatting import make_header
 85    from meerschaum.utils.misc import interval_str
 86    workers = self.get_num_workers(workers)
 87    check_rowcounts = skip_chunks_with_greater_rowcounts or check_rowcounts_only
 88
 89    ### Skip configured bounding in parameters
 90    ### if `bounded` is explicitly `False`.
 91    bound_time = (
 92        self.get_bound_time(debug=debug)
 93        if bounded is not False
 94        else None
 95    )
 96    if bounded is None:
 97        bounded = bound_time is not None
 98
 99    if bounded and begin is None:
100        begin = (
101            bound_time
102            if bound_time is not None
103            else self.get_sync_time(newest=False, debug=debug)
104        )
105        if begin is None:
106            remote_oldest_sync_time = self.get_sync_time(newest=False, remote=True, debug=debug)
107            begin = remote_oldest_sync_time
108    if bounded and end is None:
109        end = self.get_sync_time(newest=True, debug=debug)
110        if end is None:
111            remote_newest_sync_time = self.get_sync_time(newest=True, remote=True, debug=debug)
112            end = remote_newest_sync_time
113        if end is not None:
114            end += (
115                timedelta(minutes=1)
116                if hasattr(end, 'tzinfo')
117                else 1
118            )
119
120    begin, end = self.parse_date_bounds(begin, end)
121    cannot_determine_bounds = bounded and begin is None and end is None
122
123    if cannot_determine_bounds and not check_rowcounts_only:
124        warn(f"Cannot determine sync bounds for {self}. Syncing instead...", stack=False)
125        sync_success, sync_msg = self.sync(
126            begin=begin,
127            end=end,
128            params=params,
129            workers=workers,
130            debug=debug,
131            **kwargs
132        )
133        if not sync_success:
134            return sync_success, sync_msg
135
136        if deduplicate:
137            return self.deduplicate(
138                begin=begin,
139                end=end,
140                params=params,
141                workers=workers,
142                debug=debug,
143                **kwargs
144            )
145        return sync_success, sync_msg
146
147    chunk_interval = self.get_chunk_interval(chunk_interval, debug=debug)
148    chunk_bounds = self.get_chunk_bounds(
149        begin=begin,
150        end=end,
151        chunk_interval=chunk_interval,
152        bounded=bounded,
153        align=True,
154        debug=debug,
155    )
156
157    ### Consider it a success if no chunks need to be verified.
158    if not chunk_bounds:
159        if deduplicate:
160            return self.deduplicate(
161                begin=begin,
162                end=end,
163                params=params,
164                workers=workers,
165                debug=debug,
166                **kwargs
167            )
168        return True, f"Could not determine chunks between '{begin}' and '{end}'; nothing to do."
169
170    begin_to_print = (
171        begin
172        if begin is not None
173        else (
174            chunk_bounds[0][0]
175            if bounded
176            else chunk_bounds[0][1]
177        )
178    )
179    end_to_print = (
180        end
181        if end is not None
182        else (
183            chunk_bounds[-1][1]
184            if bounded
185            else chunk_bounds[-1][0]
186        )
187    )
188    message_header = f"{begin_to_print} - {end_to_print}"
189    max_chunks_syncs = mrsm.get_config('pipes', 'verify', 'max_chunks_syncs')
190
191    info(
192        f"Verifying {self}:\n    "
193        + ("Syncing" if not check_rowcounts_only else "Checking")
194        + f" {len(chunk_bounds)} chunk"
195        + ('s' if len(chunk_bounds) != 1 else '')
196        + f" ({'un' if not bounded else ''}bounded)"
197        + f" of size '{interval_str(chunk_interval)}'"
198        + f" between '{begin_to_print}' and '{end_to_print}'.\n"
199    )
200
201    ### Dictionary of the form bounds -> success_tuple, e.g.:
202    ### {
203    ###    (2023-01-01, 2023-01-02): (True, "Success")
204    ### }
205    bounds_success_tuples = {}
206    def process_chunk_bounds(
207        chunk_begin_and_end: Tuple[
208            Union[int, datetime],
209            Union[int, datetime]
210        ],
211        _workers: Optional[int] = 1,
212    ):
213        if chunk_begin_and_end in bounds_success_tuples:
214            return chunk_begin_and_end, bounds_success_tuples[chunk_begin_and_end]
215
216        chunk_begin, chunk_end = chunk_begin_and_end
217        do_sync = True
218        chunk_success, chunk_msg = False, "Did not sync chunk."
219        if check_rowcounts:
220            existing_rowcount = self.get_rowcount(begin=chunk_begin, end=chunk_end, debug=debug)
221            remote_rowcount = self.get_rowcount(
222                begin=chunk_begin,
223                end=chunk_end,
224                remote=True,
225                debug=debug,
226            )
227            checked_rows_str = (
228                f"checked {(existing_rowcount or 0):,} row"
229                + ("s" if existing_rowcount != 1 else '')
230                + f" vs {(remote_rowcount or 0):,} remote"
231            )
            if (
                existing_rowcount is not None
                and remote_rowcount is not None
                and existing_rowcount >= remote_rowcount
            ):
                do_sync = False
                chunk_success, chunk_msg = True, (
                    "Row-count is up-to-date "
                    f"({checked_rows_str})."
                )
            elif check_rowcounts_only:
                do_sync = False
                chunk_success, chunk_msg = True, (
                    f"Row-counts are out-of-sync ({checked_rows_str})."
                )

        num_syncs = 0
        while num_syncs < max_chunks_syncs:
            chunk_success, chunk_msg = self.sync(
                begin=chunk_begin,
                end=chunk_end,
                params=params,
                workers=_workers,
                debug=debug,
                **kwargs
            ) if do_sync else (chunk_success, chunk_msg)
            if chunk_success:
                break
            num_syncs += 1
            time.sleep(num_syncs**2)
        chunk_msg = chunk_msg.strip()
        if ' - ' not in chunk_msg:
            chunk_label = f"{chunk_begin} - {chunk_end}"
            chunk_msg = f'Verified chunk for {self}:\n{chunk_label}\n{chunk_msg}'
        mrsm.pprint((chunk_success, chunk_msg))

        return chunk_begin_and_end, (chunk_success, chunk_msg)

    ### If we have more than one chunk, attempt to sync the first one and return if it fails.
    if len(chunk_bounds) > 1:
        first_chunk_bounds = chunk_bounds[0]
        first_label = f"{first_chunk_bounds[0]} - {first_chunk_bounds[1]}"
        info(f"Verifying first chunk for {self}:\n    {first_label}")
        (
            (first_begin, first_end),
            (first_success, first_msg)
        ) = process_chunk_bounds(first_chunk_bounds, _workers=workers)
        if not first_success:
            return (
                first_success,
                f"\n{first_label}\n"
                + f"Failed to sync first chunk:\n{first_msg}"
            )
        bounds_success_tuples[first_chunk_bounds] = (first_success, first_msg)
        info(f"Completed first chunk for {self}:\n    {first_label}\n")
        chunk_bounds = chunk_bounds[1:]

    pool = get_pool(workers=workers)
    batches = self.get_chunk_bounds_batches(chunk_bounds, batchsize=batchsize, workers=workers)

    def process_batch(
        batch_chunk_bounds: Tuple[
            Tuple[Union[datetime, int, None], Union[datetime, int, None]],
            ...
        ]
    ):
        _batch_begin = batch_chunk_bounds[0][0]
        _batch_end = batch_chunk_bounds[-1][-1]
        batch_message_header = f"{_batch_begin} - {_batch_end}"

        if check_rowcounts_only:
            info(f"Checking row-counts for batch bounds:\n    {batch_message_header}")
            _, (batch_init_success, batch_init_msg) = process_chunk_bounds(
                (_batch_begin, _batch_end)
            )
            mrsm.pprint((batch_init_success, batch_init_msg))
            if batch_init_success and 'up-to-date' in batch_init_msg:
                info("Entire batch is up-to-date.")
                return batch_init_success, batch_init_msg

        batch_bounds_success_tuples = dict(pool.map(process_chunk_bounds, batch_chunk_bounds))
        bounds_success_tuples.update(batch_bounds_success_tuples)
        batch_bounds_success_bools = {
            bounds: tup[0]
            for bounds, tup in batch_bounds_success_tuples.items()
        }

        if all(batch_bounds_success_bools.values()):
            msg = get_chunks_success_message(
                batch_bounds_success_tuples,
                header=batch_message_header,
                check_rowcounts_only=check_rowcounts_only,
            )
            if deduplicate:
                deduplicate_success, deduplicate_msg = self.deduplicate(
                    begin=_batch_begin,
                    end=_batch_end,
                    params=params,
                    workers=workers,
                    debug=debug,
                    **kwargs
                )
                return deduplicate_success, msg + '\n\n' + deduplicate_msg
            return True, msg

        ### Iterate over the (bounds, success) pairs; zipping against the dict would yield its keys.
        batch_chunk_bounds_to_resync = [
            bounds
            for bounds, success in batch_bounds_success_bools.items()
            if not success
        ]
        ### Evict failed chunks from the cache so `process_chunk_bounds()` actually resyncs them.
        for bounds in batch_chunk_bounds_to_resync:
            bounds_success_tuples.pop(bounds, None)
        batch_bounds_to_print = [
            f"{bounds[0]} - {bounds[1]}"
            for bounds in batch_chunk_bounds_to_resync
        ]
        if batch_bounds_to_print:
            warn(
                "Will resync the following failed chunks:\n    "
                + '\n    '.join(batch_bounds_to_print),
                stack=False,
            )

        retry_bounds_success_tuples = dict(pool.map(
            process_chunk_bounds,
            batch_chunk_bounds_to_resync
        ))
        batch_bounds_success_tuples.update(retry_bounds_success_tuples)
        bounds_success_tuples.update(retry_bounds_success_tuples)
        retry_bounds_success_bools = {
            bounds: tup[0]
            for bounds, tup in retry_bounds_success_tuples.items()
        }

        if all(retry_bounds_success_bools.values()):
            chunks_message = (
                get_chunks_success_message(
                    batch_bounds_success_tuples,
                    header=batch_message_header,
                    check_rowcounts_only=check_rowcounts_only,
                ) + f"\nRetried {len(batch_chunk_bounds_to_resync)} chunk" + (
                    's'
                    if len(batch_chunk_bounds_to_resync) != 1
                    else ''
                ) + "."
            )
            if deduplicate:
                deduplicate_success, deduplicate_msg = self.deduplicate(
                    begin=_batch_begin,
                    end=_batch_end,
                    params=params,
                    workers=workers,
                    debug=debug,
                    **kwargs
                )
                return deduplicate_success, chunks_message + '\n\n' + deduplicate_msg
            return True, chunks_message

        batch_chunks_message = get_chunks_success_message(
            batch_bounds_success_tuples,
            header=batch_message_header,
            check_rowcounts_only=check_rowcounts_only,
        )
        if deduplicate:
            _, deduplicate_msg = self.deduplicate(
                begin=_batch_begin,
                end=_batch_end,
                params=params,
                workers=workers,
                debug=debug,
                **kwargs
            )
            ### Some chunks still failed, so report failure regardless of deduplication.
            return False, batch_chunks_message + '\n\n' + deduplicate_msg
        return False, batch_chunks_message

    num_batches = len(batches)
    for batch_i, batch in enumerate(batches):
        batch_begin = batch[0][0]
        batch_end = batch[-1][-1]
        batch_counter_str = f"({(batch_i + 1):,}/{num_batches:,})"
        batch_label = f"batch {batch_counter_str}:\n{batch_begin} - {batch_end}"
        retry_failed_batch = True
        try:
            for_self = 'for ' + str(self)
            batch_label_str = batch_label.replace(':\n', ' ' + for_self + '...\n    ')
            info(f"Verifying {batch_label_str}\n")
            batch_success, batch_msg = process_batch(batch)
        except (KeyboardInterrupt, Exception) as e:
            batch_success = False
            batch_msg = str(e)
            retry_failed_batch = False

        batch_msg_to_print = (
            f"{make_header('Completed batch ' + batch_counter_str + ':', left_pad=0)}\n{batch_msg}"
        )
        mrsm.pprint((batch_success, batch_msg_to_print))

        if not batch_success and retry_failed_batch:
            info(f"Retrying batch {batch_counter_str}...")
            retry_batch_success, retry_batch_msg = process_batch(batch)
            retry_batch_msg_to_print = (
                f"Retried {make_header('batch ' + batch_label, left_pad=0)}\n{retry_batch_msg}"
            )
            mrsm.pprint((retry_batch_success, retry_batch_msg_to_print))

            batch_success = retry_batch_success
            batch_msg = retry_batch_msg

        if not batch_success:
            return False, f"Failed to verify {batch_label}:\n\n{batch_msg}"

    chunks_message = get_chunks_success_message(
        bounds_success_tuples,
        header=message_header,
        check_rowcounts_only=check_rowcounts_only,
    )
    return True, chunks_message
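The per-chunk loop in `process_chunk_bounds()` above retries a failed sync up to `max_chunks_syncs` times, sleeping `num_syncs**2` seconds after each failure. Below is a minimal standalone sketch of that retry-with-quadratic-backoff pattern. `sync_with_retries` and its injectable `sleep` parameter are hypothetical names, not part of meerschaum; injecting `sleep` lets the backoff schedule be observed without actually waiting.

```python
import time
from typing import Callable, Tuple

SuccessTuple = Tuple[bool, str]


def sync_with_retries(
    sync_chunk: Callable[[], SuccessTuple],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> SuccessTuple:
    """Call `sync_chunk()` until it succeeds or `max_attempts` attempts have failed."""
    success, msg = False, "Did not sync chunk."
    num_syncs = 0
    while num_syncs < max_attempts:
        success, msg = sync_chunk()
        if success:
            break
        num_syncs += 1
        # Quadratic backoff: 1 s, 4 s, 9 s, ... (also applied after the final failure).
        sleep(num_syncs ** 2)
    return success, msg
```

Like the source it mirrors, the sketch also sleeps after the last failed attempt; a variant could skip that final wait.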

Verify the contents of the pipe by resyncing its interval.

Parameters
  • begin (Union[datetime, int, None], default None): If specified, only verify rows greater than or equal to this value.
  • end (Union[datetime, int, None], default None): If specified, only verify rows less than this value.
  • chunk_interval (Union[timedelta, int, None], default None): If provided, use this as the size of the chunk boundaries. Defaults to the value set in pipe.parameters['verify']['chunk_minutes'] (43200 minutes, i.e. 30 days).
  • bounded (Optional[bool], default None): If True, do not verify rows older than the oldest sync time or newer than the newest sync time. If False, verify without these bounds, including intervals outside of the oldest and newest sync times. The default (None) is to bound only if a bound interval is set (e.g. pipe.parameters['verify']['bound_days']).
  • deduplicate (bool, default False): If True, deduplicate the pipe's table after the verification syncs.
  • workers (Optional[int], default None): If provided, limit the verification to this many threads. Use a value of 1 to sync chunks in series.
  • batchsize (Optional[int], default None): If provided, sync this many chunks in parallel. Defaults to Pipe.get_num_workers().
  • skip_chunks_with_greater_rowcounts (bool, default False): If True, compare the rowcounts for a chunk and skip syncing if the pipe's chunk rowcount equals or exceeds the remote's rowcount.
  • check_rowcounts_only (bool, default False): If True, only compare rowcounts and print chunks which are out-of-sync.
  • debug (bool, default False): Verbosity toggle.
  • kwargs (Any): All keyword arguments are passed to pipe.sync().
Returns
  • A SuccessTuple indicating whether the pipe was successfully resynced.
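Verification splits `[begin, end)` into chunks of `chunk_interval` and syncs those chunks in batches of `batchsize`. The sketch below illustrates that idea with plain datetimes. `make_chunk_bounds` and `batch_chunk_bounds` are hypothetical helpers, not meerschaum's `get_chunk_bounds_batches()`, whose exact boundary handling may differ.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Bounds = Tuple[datetime, datetime]


def make_chunk_bounds(begin: datetime, end: datetime, interval: timedelta) -> List[Bounds]:
    """Split the half-open interval [begin, end) into consecutive chunks of at most `interval`."""
    bounds: List[Bounds] = []
    chunk_begin = begin
    while chunk_begin < end:
        # The final chunk is clipped to `end`.
        chunk_end = min(chunk_begin + interval, end)
        bounds.append((chunk_begin, chunk_end))
        chunk_begin = chunk_end
    return bounds


def batch_chunk_bounds(bounds: List[Bounds], batchsize: int) -> List[Tuple[Bounds, ...]]:
    """Group chunk bounds into batches of `batchsize` (the last batch may be shorter)."""
    return [tuple(bounds[i:i + batchsize]) for i in range(0, len(bounds), batchsize)]


# Five one-day chunks, synced two at a time.
chunks = make_chunk_bounds(datetime(2024, 1, 1), datetime(2024, 1, 6), timedelta(days=1))
batches = batch_chunk_bounds(chunks, batchsize=2)
for batch in batches:
    # Each batch is labeled by its first `begin` and last `end`, as in `process_batch()`.
    print(batch[0][0].date(), batch[-1][-1].date(), len(batch))
```

In the source above, the first chunk is verified on its own before the remaining chunks are batched, so a misconfigured pipe fails after one chunk rather than after a whole batch.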
def get_bound_interval(self, debug: bool = False) -> Union[datetime.timedelta, int, NoneType]:
def get_bound_interval(self, debug: bool = False) -> Union[timedelta, int, None]:
    """
    Return the interval used to determine the bound time (limit for verification syncs).
    If the datetime axis is an integer, just return its value.

    Below are the supported keys for the bound interval:

        - `pipe.parameters['verify']['bound_minutes']`
        - `pipe.parameters['verify']['bound_hours']`
        - `pipe.parameters['verify']['bound_days']`
        - `pipe.parameters['verify']['bound_weeks']`
        - `pipe.parameters['verify']['bound_years']`
        - `pipe.parameters['verify']['bound_seconds']`

    If multiple keys are present, the first on this priority list will be used.

    Returns
    -------
    A `timedelta` or `int` value to be used to determine the bound time.
    """
    verify_params = self.parameters.get('verify', {})
    prefix = 'bound_'
    suffixes_to_check = ('minutes', 'hours', 'days', 'weeks', 'years', 'seconds')
    keys_to_search = {
        key: val
        for key, val in verify_params.items()
        if key.startswith(prefix)
    }
    bound_time_key, bound_time_value = None, None
    ### Walk the suffixes in priority order so the documented priority list is honored
    ### regardless of the order in which the keys were set.
    for suffix in suffixes_to_check:
        key = prefix + suffix
        if key in keys_to_search:
            bound_time_key = key
            bound_time_value = keys_to_search[key]
            break

    if bound_time_value is None:
        return bound_time_value

    dt_col = self.columns.get('datetime', None)
    if not dt_col:
        return bound_time_value

    dt_typ = self.dtypes.get(dt_col, 'datetime')
    if 'int' in dt_typ.lower():
        return int(bound_time_value)

    interval_type = bound_time_key.replace(prefix, '')
    ### `timedelta` has no `years` argument; approximate one year as 365 days.
    if interval_type == 'years':
        return timedelta(days=(365 * bound_time_value))
    return timedelta(**{interval_type: bound_time_value})

Return the interval used to determine the bound time (limit for verification syncs). If the datetime axis is an integer, just return its value.

Below are the supported keys for the bound interval:

- `pipe.parameters['verify']['bound_minutes']`
- `pipe.parameters['verify']['bound_hours']`
- `pipe.parameters['verify']['bound_days']`
- `pipe.parameters['verify']['bound_weeks']`
- `pipe.parameters['verify']['bound_years']`
- `pipe.parameters['verify']['bound_seconds']`

If multiple keys are present, the first on this priority list will be used.

Returns
  • A timedelta or int value to be used to determine the bound time.
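The lookup described above can be sketched without a pipe as a plain function over the `verify` parameters dictionary. `resolve_bound_interval` and its `integer_axis` flag are hypothetical. The sketch walks the keys in the documented priority order and, as an assumption, treats `bound_years` as 365 days because `datetime.timedelta` has no `years` argument.

```python
from datetime import timedelta
from typing import Any, Dict, Optional, Union

# Priority order as documented for `get_bound_interval()`.
BOUND_SUFFIXES = ('minutes', 'hours', 'days', 'weeks', 'years', 'seconds')


def resolve_bound_interval(
    verify_params: Dict[str, Any],
    integer_axis: bool = False,
) -> Optional[Union[timedelta, int]]:
    """Return the bound interval from e.g. `{'bound_days': 366}`, or None if unset."""
    for suffix in BOUND_SUFFIXES:
        key = 'bound_' + suffix
        if key not in verify_params:
            continue
        value = verify_params[key]
        if value is None:
            return None
        if integer_axis:
            # Integer datetime axes use the raw value as the interval.
            return int(value)
        if suffix == 'years':
            # Assumption: `timedelta` has no `years` keyword, so approximate 1 year as 365 days.
            return timedelta(days=365 * value)
        return timedelta(**{suffix: value})
    return None
```

For example, `{'bound_seconds': 10, 'bound_hours': 2}` resolves to two hours because `hours` ranks above `seconds` on the list.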
def get_bound_time(self, debug: bool = False) -> Union[datetime.datetime, int, NoneType]:
def get_bound_time(self, debug: bool = False) -> Union[datetime, int, None]:
    """
    The bound time is the limit at which long-running verification syncs should stop.
    A value of `None` means verification syncs should be unbounded.

    Like deriving a backtrack time from `pipe.get_sync_time()`,
    the bound time is the sync time minus a large window (e.g. 366 days).

    Verification syncs are unbounded (i.e. `bound_time is None`) if the span between
    the oldest and newest sync times does not exceed the bound interval
    (for datetime axes, a span of at least `max_bound_time_days` is always bounded).

    Returns
    -------
    A `datetime` or `int` corresponding to the
    `begin` bound for verification and deduplication syncs.
    """
    bound_interval = self.get_bound_interval(debug=debug)
    if bound_interval is None:
        return None

    sync_time = self.get_sync_time(debug=debug)
    if sync_time is None:
        return None

    bound_time = sync_time - bound_interval
    oldest_sync_time = self.get_sync_time(newest=False, debug=debug)
    max_bound_time_days = STATIC_CONFIG['pipes']['max_bound_time_days']

    extreme_sync_times_delta = (
        hasattr(oldest_sync_time, 'tzinfo')
        and (sync_time - oldest_sync_time) >= timedelta(days=max_bound_time_days)
    )

    return (
        bound_time
        if bound_time > oldest_sync_time or extreme_sync_times_delta
        else None
    )

The bound time is the limit at which long-running verification syncs should stop. A value of None means verification syncs should be unbounded.

Like deriving a backtrack time from pipe.get_sync_time(), the bound time is the sync time minus a large window (e.g. 366 days).

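Verification syncs are unbounded (i.e. bound_time is None) if the span between the oldest and newest sync times does not exceed the bound interval (for datetime axes, a span of at least max_bound_time_days is always bounded).

Returns
  • A datetime or int corresponding to the begin bound for verification and deduplication syncs.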
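The decision above reduces to a comparison between the bound time and the oldest sync time. Below is a standalone sketch that takes the sync times and bound interval as arguments instead of querying a pipe. `compute_bound_time` is a hypothetical name, and its `max_bound_time_days` argument stands in for `STATIC_CONFIG['pipes']['max_bound_time_days']`, whose actual value is not shown here.

```python
from datetime import datetime, timedelta
from typing import Optional, Union

Time = Union[datetime, int]


def compute_bound_time(
    newest_sync_time: Optional[Time],
    oldest_sync_time: Optional[Time],
    bound_interval: Optional[Union[timedelta, int]],
    max_bound_time_days: int,
) -> Optional[Time]:
    """Return the `begin` bound for verification syncs, or None for unbounded syncs."""
    if bound_interval is None or newest_sync_time is None or oldest_sync_time is None:
        return None

    bound_time = newest_sync_time - bound_interval

    # Datetime axes spanning at least `max_bound_time_days` are always bounded.
    extreme_span = (
        isinstance(oldest_sync_time, datetime)
        and (newest_sync_time - oldest_sync_time) >= timedelta(days=max_bound_time_days)
    )

    # Otherwise, only bound when the data extends further back than the bound time.
    return bound_time if bound_time > oldest_sync_time or extreme_span else None
```

With a 366-day interval, a pipe whose data starts in mid-2024 and ends on 2024-12-31 stays unbounded, while one reaching back to 2020 is bounded at 2023-12-31.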
def delete(self, drop: bool = True, debug: bool = False, **kw) -> Tuple[bool, str]:
def delete(
    self,
    drop: bool = True,
    debug: bool = False,
    **kw
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `delete_pipe()` method.

    Parameters
    ----------
    drop: bool, default True
        If `True`, drop the pipe's target table.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success (`bool`), message (`str`).

    """
    from meerschaum.utils.warnings import warn
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    if self.temporary:
        if self.cache:
            invalidate_success, invalidate_msg = self._invalidate_cache(hard=True, debug=debug)
            if not invalidate_success:
                return invalidate_success, invalidate_msg

        return (
            False,
            "Cannot delete pipes created with `temporary=True` (read-only). "
            + "You may want to call `pipe.drop()` instead."
        )

    if drop:
        drop_success, drop_msg = self.drop(debug=debug)
        if not drop_success:
            warn(f"Failed to drop {self}:\n{drop_msg}")

    with Venv(get_connector_plugin(self.instance_connector)):
        result = self.instance_connector.delete_pipe(self, debug=debug, **kw)

    if not isinstance(result, tuple):
        return False, f"Received an unexpected result from '{self.instance_connector}': {result}"

    if result[0]:
        self._invalidate_cache(hard=True, debug=debug)
        self._clear_cache_key('_id', debug=debug)

    return result

Call the Pipe's instance connector's delete_pipe() method.

Parameters
  • drop (bool, default True): If True, drop the pipe's target table.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A SuccessTuple of success (bool), message (str).
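The difference between deleting and dropping can be seen in a toy model: dropping removes the target table, while deleting removes the pipe's registration (and, with `drop=True`, drops the table first). `ToyInstance` and `delete_pipe_sketch` are hypothetical stand-ins, not meerschaum classes. They mirror the flow of the source above, including the detail that a failed drop only warns and deletion continues.

```python
import warnings
from typing import Dict, List, Set, Tuple

SuccessTuple = Tuple[bool, str]


class ToyInstance:
    """Hypothetical in-memory instance: registered pipe names plus their tables."""

    def __init__(self) -> None:
        self.registered: Set[str] = set()
        self.tables: Dict[str, List[dict]] = {}

    def drop_pipe(self, name: str) -> SuccessTuple:
        if name not in self.tables:
            return False, f"Table '{name}' does not exist."
        del self.tables[name]
        return True, "Success"

    def delete_pipe(self, name: str) -> SuccessTuple:
        if name not in self.registered:
            return False, f"Pipe '{name}' is not registered."
        self.registered.remove(name)
        return True, "Success"


def delete_pipe_sketch(
    instance: ToyInstance,
    name: str,
    drop: bool = True,
    temporary: bool = False,
) -> SuccessTuple:
    if temporary:
        # Temporary pipes are read-only, so refuse to delete their registration.
        return False, "Cannot delete pipes created with `temporary=True` (read-only)."
    if drop:
        drop_success, drop_msg = instance.drop_pipe(name)
        if not drop_success:
            # A failed drop is only a warning; the registration is still deleted.
            warnings.warn(f"Failed to drop {name}:\n{drop_msg}")
    return instance.delete_pipe(name)
```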
def drop(self, debug: bool = False, **kw: Any) -> Tuple[bool, str]:
def drop(
    self,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `drop_pipe()` method.

    Parameters
    ----------
    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.

    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    self._clear_cache_key('_exists', debug=debug)

    with Venv(get_connector_plugin(self.instance_connector)):
        if hasattr(self.instance_connector, 'drop_pipe'):
            result = self.instance_connector.drop_pipe(self, debug=debug, **kw)
        else:
            result = (
                False,
                (
                    "Cannot drop pipes for instance connectors of type "
                    f"'{self.instance_connector.type}'."
                )
            )

    self._clear_cache_key('_exists', debug=debug)
    self._clear_cache_key('_exists_timestamp', debug=debug)

    return result

Call the Pipe's instance connector's drop_pipe() method.

Parameters
  • debug (bool, default False): Verbosity toggle.
Returns
  • A SuccessTuple of success, message.
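`drop()` only forwards to the instance connector when the connector defines `drop_pipe()`. Otherwise it returns a failed `SuccessTuple` instead of raising. The sketch below shows that optional-method dispatch with a hypothetical `call_if_supported` helper and toy connector classes (none of these are meerschaum's).

```python
from typing import Any, Tuple

SuccessTuple = Tuple[bool, str]


def call_if_supported(
    connector: Any,
    method_name: str,
    action: str,
    *args: Any,
    **kwargs: Any,
) -> SuccessTuple:
    """Call `connector.<method_name>()` if it exists, else report that `action` is unsupported."""
    method = getattr(connector, method_name, None)
    if not callable(method):
        return False, f"Cannot {action} for instance connectors of type '{connector.type}'."
    return method(*args, **kwargs)


class DroppableConnector:
    """Hypothetical connector that implements `drop_pipe()`."""
    type = 'toy'

    def drop_pipe(self, pipe: str, debug: bool = False) -> SuccessTuple:
        return True, f"Dropped '{pipe}'."


class ReadOnlyConnector:
    """Hypothetical connector without `drop_pipe()`."""
    type = 'readonly'
```

Returning a tuple rather than raising lets callers such as `delete()` treat an unsupported drop as a warning.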
def drop_indices(self, columns: Optional[List[str]] = None, debug: bool = False, **kw: Any) -> Tuple[bool, str]:
def drop_indices(
    self,
    columns: Optional[List[str]] = None,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `drop_pipe_indices()` method.

    Parameters
    ----------
    columns: Optional[List[str]], default None
        If provided, only drop the indices for the given columns.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.

    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    self._clear_cache_key('_columns_indices', debug=debug)
    self._clear_cache_key('_columns_indices_timestamp', debug=debug)
    self._clear_cache_key('_columns_types', debug=debug)
    self._clear_cache_key('_columns_types_timestamp', debug=debug)

    with Venv(get_connector_plugin(self.instance_connector)):
        if hasattr(self.instance_connector, 'drop_pipe_indices'):
            result = self.instance_connector.drop_pipe_indices(
                self,
                columns=columns,
                debug=debug,
                **kw
            )
        else:
            result = (
                False,
                (
                    "Cannot drop indices for instance connectors of type "
                    f"'{self.instance_connector.type}'."
                )
            )

    self._clear_cache_key('_columns_indices', debug=debug)
    self._clear_cache_key('_columns_indices_timestamp', debug=debug)
    self._clear_cache_key('_columns_types', debug=debug)
    self._clear_cache_key('_columns_types_timestamp', debug=debug)

    return result

Call the Pipe's instance connector's drop_pipe_indices() method.

Parameters
  • columns (Optional[List[str]], default None): If provided, only drop the indices for the given columns.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A SuccessTuple of success, message.
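The effect of passing `columns` can be illustrated with SQLite from the standard library: only indices that cover one of the listed columns are dropped. This sketch uses `PRAGMA index_list` and `PRAGMA index_info` to find them; it is not meerschaum's `drop_pipe_indices()` implementation, and the helper `drop_column_indices` and the index names are hypothetical.

```python
import sqlite3
from typing import List, Optional


def drop_column_indices(
    conn: sqlite3.Connection,
    table: str,
    columns: Optional[List[str]] = None,
) -> List[str]:
    """Drop user-created indices on `table` (only those covering `columns`, if given)."""
    dropped = []
    # Fetch the full list first because the loop modifies the schema.
    index_rows = conn.execute(f'PRAGMA index_list("{table}")').fetchall()
    for row in index_rows:
        index_name, origin = row[1], row[3]
        if origin != 'c':
            # Skip indices created implicitly by PRIMARY KEY or UNIQUE constraints.
            continue
        indexed_cols = [info[2] for info in conn.execute(f'PRAGMA index_info("{index_name}")')]
        if columns is None or any(col in columns for col in indexed_cols):
            conn.execute(f'DROP INDEX "{index_name}"')
            dropped.append(index_name)
    return sorted(dropped)
```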
def compress(self, debug: bool = False, **kw: Any) -> Tuple[bool, str]:
def compress(
    self,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `compress_pipe()` method.

    For TimescaleDB hypertables this enables and applies native compression.
    Other flavors fall back to their respective compression mechanisms where supported.

    Parameters
    ----------
    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.
    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    try:
        with Venv(get_connector_plugin(self.instance_connector)):
            if hasattr(self.instance_connector, 'compress_pipe'):
                result = self.instance_connector.compress_pipe(self, debug=debug, **kw)
            else:
                result = (
                    False,
                    (
                        "Cannot compress pipes for instance connectors of type "
                        f"'{self.instance_connector.type}'."
                    )
                )
    except NotImplementedError:
        result = (
            False,
            (
                "Compression is not implemented for instance connectors of type "
                f"'{self.instance_connector.type}'."
            )
        )

    self._clear_cache_key('_exists', debug=debug)
    return result

Call the Pipe's instance connector's compress_pipe() method.

For TimescaleDB hypertables this enables and applies native compression. Other flavors fall back to their respective compression mechanisms where supported.

Parameters
  • debug (bool, default False): Verbosity toggle.
Returns
  • A SuccessTuple of success, message.
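Unlike `drop()`, `compress()` also catches `NotImplementedError`, which lets a connector define `compress_pipe()` yet raise for cases it cannot handle. Below is a sketch of that two-level fallback. `compress_with_fallback`, the toy connector classes, and their `flavor` attribute are hypothetical, chosen only to exercise both failure paths.

```python
from typing import Any, Tuple

SuccessTuple = Tuple[bool, str]


def compress_with_fallback(connector: Any, pipe: Any, **kwargs: Any) -> SuccessTuple:
    """Forward to `compress_pipe()`, turning missing or unimplemented support into a failed tuple."""
    try:
        if not hasattr(connector, 'compress_pipe'):
            return False, f"Cannot compress pipes for instance connectors of type '{connector.type}'."
        return connector.compress_pipe(pipe, **kwargs)
    except NotImplementedError:
        return False, f"Compression is not implemented for instance connectors of type '{connector.type}'."


class ToySQLConnector:
    """Hypothetical connector that only supports compression for one flavor."""
    type = 'sql'

    def __init__(self, flavor: str) -> None:
        self.flavor = flavor

    def compress_pipe(self, pipe: Any, **kwargs: Any) -> SuccessTuple:
        if self.flavor != 'timescaledb':
            raise NotImplementedError(self.flavor)
        return True, f"Compressed {pipe}."


class ToyAPIConnector:
    """Hypothetical connector without `compress_pipe()`."""
    type = 'api'
```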
def decompress(self, debug: bool = False, **kw: Any) -> Tuple[bool, str]:
def decompress(
    self,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `decompress_pipe()` method, the inverse of `compress()`.

    For TimescaleDB hypertables this removes the compression policy and converts compressed
    chunks back to row-store. Other flavors fall back to their respective mechanisms where
    supported.

    Parameters
    ----------
    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.
    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    try:
        with Venv(get_connector_plugin(self.instance_connector)):
            if hasattr(self.instance_connector, 'decompress_pipe'):
                result = self.instance_connector.decompress_pipe(self, debug=debug, **kw)
            else:
                result = (
                    False,
                    (
                        "Cannot decompress pipes for instance connectors of type "
                        f"'{self.instance_connector.type}'."
                    )
                )
    except NotImplementedError:
        result = (
            False,
            (
                "Decompression is not implemented for instance connectors of type "
                f"'{self.instance_connector.type}'."
            )
        )

    self._clear_cache_key('_exists', debug=debug)
    return result

Call the Pipe's instance connector's decompress_pipe() method, the inverse of compress().

For TimescaleDB hypertables this removes the compression policy and converts compressed chunks back to row-store. Other flavors fall back to their respective mechanisms where supported.

Parameters
  • debug (bool, default False): Verbosity toggle.
Returns
  • A SuccessTuple of success, message.
def vacuum(self, full: bool = False, debug: bool = False, **kw: Any) -> Tuple[bool, str]:
def vacuum(
    self,
    full: bool = False,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `vacuum_pipe()` method to reclaim disk space.

    For PostgreSQL-family tables this runs `VACUUM` (optionally `VACUUM FULL`); other flavors
    fall back to their respective space-reclaiming mechanisms where supported.

    Parameters
    ----------
    full: bool, default False
        If `True` (PostgreSQL family only), run `VACUUM FULL` to return freed space to the OS.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.
    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    try:
        with Venv(get_connector_plugin(self.instance_connector)):
            if hasattr(self.instance_connector, 'vacuum_pipe'):
                result = self.instance_connector.vacuum_pipe(self, full=full, debug=debug, **kw)
            else:
                result = (
                    False,
                    (
                        "Cannot vacuum pipes for instance connectors of type "
                        f"'{self.instance_connector.type}'."
                    )
                )
    except NotImplementedError:
        result = (
            False,
            (
                "Vacuuming is not implemented for instance connectors of type "
                f"'{self.instance_connector.type}'."
            )
        )

    self._clear_cache_key('_exists', debug=debug)
    return result

Call the Pipe's instance connector's vacuum_pipe() method to reclaim disk space.

For PostgreSQL-family tables this runs VACUUM (optionally VACUUM FULL); other flavors fall back to their respective space-reclaiming mechanisms where supported.

Parameters
  • full (bool, default False): If True (PostgreSQL family only), run VACUUM FULL to return freed space to the OS.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A SuccessTuple of success, message.
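To see what reclaiming disk space means in practice, the standard-library `sqlite3` module can show the analogous SQLite behavior: deleted rows leave free pages in the database file until `VACUUM` rebuilds it. This illustrates the concept only and is not how meerschaum's `vacuum_pipe()` works for any flavor; `vacuum_demo` is a hypothetical helper. Note that SQLite's `VACUUM` rebuilds the whole database file rather than a single table.

```python
import os
import sqlite3
import tempfile
from typing import Tuple


def vacuum_demo(num_rows: int = 5000) -> Tuple[int, int]:
    """Return the database page count before and after VACUUM, once all rows are deleted."""
    with tempfile.TemporaryDirectory() as tmpdir:
        conn = sqlite3.connect(os.path.join(tmpdir, 'demo.db'))
        try:
            conn.execute('CREATE TABLE demo (id INTEGER, payload TEXT)')
            conn.executemany(
                'INSERT INTO demo VALUES (?, ?)',
                ((i, 'x' * 200) for i in range(num_rows)),
            )
            conn.commit()
            conn.execute('DELETE FROM demo')
            # VACUUM cannot run inside a transaction, so commit the DELETE first.
            conn.commit()
            # Freed pages sit on the freelist; the file has not shrunk yet.
            pages_before = conn.execute('PRAGMA page_count').fetchone()[0]
            conn.execute('VACUUM')
            pages_after = conn.execute('PRAGMA page_count').fetchone()[0]
        finally:
            conn.close()
    return pages_before, pages_after
```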
def analyze(self, debug: bool = False, **kw: Any) -> Tuple[bool, str]:
def analyze(
    self,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `analyze_pipe()` method to refresh planner statistics.

    Parameters
    ----------
    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.
    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    try:
        with Venv(get_connector_plugin(self.instance_connector)):
            if hasattr(self.instance_connector, 'analyze_pipe'):
                result = self.instance_connector.analyze_pipe(self, debug=debug, **kw)
            else:
                result = (
                    False,
                    (
                        "Cannot analyze pipes for instance connectors of type "
                        f"'{self.instance_connector.type}'."
                    )
                )
    except NotImplementedError:
        result = (
            False,
            (
                "Analyzing is not implemented for instance connectors of type "
                f"'{self.instance_connector.type}'."
            )
        )

    return result

Call the Pipe's instance connector's analyze_pipe() method to refresh planner statistics.

Parameters
  • debug (bool, default False): Verbosity toggle.
Returns
  • A SuccessTuple of success, message.
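Planner statistics are what an `ANALYZE` pass collects. In SQLite, shown here with the standard-library `sqlite3` module rather than meerschaum, `ANALYZE` writes per-index row-count estimates into the `sqlite_stat1` table. The sketch below runs it on a small indexed table and returns those rows; `analyze_demo` is a hypothetical helper.

```python
import sqlite3
from typing import List, Tuple


def analyze_demo() -> List[Tuple[str, str, str]]:
    """Populate a small indexed table, run ANALYZE, and return the rows of `sqlite_stat1`."""
    conn = sqlite3.connect(':memory:')
    try:
        conn.execute('CREATE TABLE demo (id INTEGER, vl REAL)')
        conn.execute('CREATE INDEX ix_demo_id ON demo (id)')
        # 100 rows with 10 distinct ids, i.e. about 10 rows per id.
        conn.executemany(
            'INSERT INTO demo VALUES (?, ?)',
            ((i % 10, float(i)) for i in range(100)),
        )
        conn.commit()
        conn.execute('ANALYZE demo')
        return conn.execute('SELECT tbl, idx, stat FROM sqlite_stat1').fetchall()
    finally:
        conn.close()
```

The `stat` column begins with the table's row count, which the query planner uses to choose between a scan and an index lookup.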
def repartition(self, chunk_minutes: Optional[int] = None, debug: bool = False, **kw: Any) -> Tuple[bool, str]:
def repartition(
    self,
    chunk_minutes: Optional[int] = None,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `partition_pipe()` method to rebuild the target table
    to a new partition (chunk) width.

    On TimescaleDB this changes the chunk interval for future chunks. On PostgreSQL / PostGIS,
    MySQL / MariaDB, and MSSQL it rebuilds the natively range-partitioned table at the new width.

    Parameters
    ----------
    chunk_minutes: Optional[int], default None
        The new partition width in minutes. Defaults to the pipe's `verify.chunk_minutes`.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.
    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    try:
        with Venv(get_connector_plugin(self.instance_connector)):
            if hasattr(self.instance_connector, 'partition_pipe'):
                result = self.instance_connector.partition_pipe(
                    self, chunk_minutes=chunk_minutes, debug=debug, **kw
                )
            else:
                result = (
                    False,
                    (
                        "Cannot repartition pipes for instance connectors of type "
                        f"'{self.instance_connector.type}'."
                    )
                )
    except NotImplementedError:
        result = (
            False,
            (
                "Repartitioning is not implemented for instance connectors of type "
                f"'{self.instance_connector.type}'."
            )
        )

    self._clear_cache_key('_exists', debug=debug)
    return result

Call the Pipe's instance connector's partition_pipe() method to rebuild the target table to a new partition (chunk) width.

On TimescaleDB this changes the chunk interval for future chunks. On PostgreSQL / PostGIS, MySQL / MariaDB, and MSSQL it rebuilds the natively range-partitioned table at the new width.

Parameters
  • chunk_minutes (Optional[int], default None): The new partition width in minutes. Defaults to the pipe's verify.chunk_minutes.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A SuccessTuple of success, message.
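A partition width in minutes determines which time range each row falls into. Assuming boundaries aligned to the Unix epoch (a common convention, though each backend's alignment may differ), the partition containing a timestamp can be computed as below. `partition_bounds` is a hypothetical helper; 43200 minutes is the 30-day `verify.chunk_minutes` default mentioned for `verify()`.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Assumption: partition boundaries are aligned to the Unix epoch.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def partition_bounds(ts: datetime, chunk_minutes: int) -> Tuple[datetime, datetime]:
    """Return the half-open [start, end) partition of width `chunk_minutes` containing `ts`."""
    width = timedelta(minutes=chunk_minutes)
    # timedelta // timedelta yields an int: the number of whole partitions since the epoch.
    num_partitions = (ts - EPOCH) // width
    start = EPOCH + num_partitions * width
    return start, start + width


start, end = partition_bounds(datetime(2024, 1, 15, 12, tzinfo=timezone.utc), 43200)
```

Changing `chunk_minutes` moves every boundary, not only those of new data.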
def create_indices(self, columns: Optional[List[str]] = None, debug: bool = False, **kw: Any) -> Tuple[bool, str]:
def create_indices(
    self,
    columns: Optional[List[str]] = None,
    debug: bool = False,
    **kw: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `create_pipe_indices()` method.

    Parameters
    ----------
    columns: Optional[List[str]], default None
        If provided, only create indices for the given columns.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` of success, message.

    """
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    self._clear_cache_key('_columns_indices', debug=debug)
    self._clear_cache_key('_columns_indices_timestamp', debug=debug)
    self._clear_cache_key('_columns_types', debug=debug)
    self._clear_cache_key('_columns_types_timestamp', debug=debug)

    with Venv(get_connector_plugin(self.instance_connector)):
        if hasattr(self.instance_connector, 'create_pipe_indices'):
            result = self.instance_connector.create_pipe_indices(
                self,
                columns=columns,
                debug=debug,
                **kw
            )
        else:
            result = (
                False,
                (
                    "Cannot create indices for instance connectors of type "
                    f"'{self.instance_connector.type}'."
                )
            )

    self._clear_cache_key('_columns_indices', debug=debug)
    self._clear_cache_key('_columns_indices_timestamp', debug=debug)
    self._clear_cache_key('_columns_types', debug=debug)
    self._clear_cache_key('_columns_types_timestamp', debug=debug)

    return result

Call the Pipe's instance connector's create_pipe_indices() method.

Parameters
  • debug (bool, default False): Verbosity toggle.
Returns
  • A SuccessTuple of success, message.
def clear( self, begin: Optional[datetime.datetime] = None, end: Optional[datetime.datetime] = None, params: Optional[Dict[str, Any]] = None, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
def clear(
    self,
    begin: Optional[datetime] = None,
    end: Optional[datetime] = None,
    params: Optional[Dict[str, Any]] = None,
    debug: bool = False,
    **kwargs: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `clear_pipe` method.

    Parameters
    ----------
    begin: Optional[datetime], default None
        If provided, only remove rows newer than this datetime value.

    end: Optional[datetime], default None
        If provided, only remove rows older than this datetime value (not including end).

    params: Optional[Dict[str, Any]], default None
        See `meerschaum.utils.sql.build_where`.

    debug: bool, default False
        Verbosity toggle.

    Returns
    -------
    A `SuccessTuple` corresponding to whether this procedure completed successfully.

    Examples
    --------
    >>> pipe = mrsm.Pipe('test', 'test', columns={'datetime': 'dt'}, instance='sql:local')
    >>> pipe.sync({'dt': [datetime(2020, 1, 1, 0, 0)]})
    >>> pipe.sync({'dt': [datetime(2021, 1, 1, 0, 0)]})
    >>> pipe.sync({'dt': [datetime(2022, 1, 1, 0, 0)]})
    >>>
    >>> pipe.clear(begin=datetime(2021, 1, 1, 0, 0))
    >>> pipe.get_data()
              dt
    0 2020-01-01

    """
    from meerschaum.utils.warnings import warn
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    begin, end = self.parse_date_bounds(begin, end)

    with Venv(get_connector_plugin(self.instance_connector)):
        return self.instance_connector.clear_pipe(
            self,
            begin=begin,
            end=end,
            params=params,
            debug=debug,
            **kwargs
        )

Call the Pipe's instance connector's clear_pipe method.

Parameters
  • begin (Optional[datetime], default None): If provided, only remove rows newer than this datetime value.
  • end (Optional[datetime], default None): If provided, only remove rows older than this datetime value (not including end).
  • params (Optional[Dict[str, Any]], default None): See meerschaum.utils.sql.build_where.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A SuccessTuple corresponding to whether this procedure completed successfully.
Examples
>>> pipe = mrsm.Pipe('test', 'test', columns={'datetime': 'dt'}, instance='sql:local')
>>> pipe.sync({'dt': [datetime(2020, 1, 1, 0, 0)]})
>>> pipe.sync({'dt': [datetime(2021, 1, 1, 0, 0)]})
>>> pipe.sync({'dt': [datetime(2022, 1, 1, 0, 0)]})
>>> 
>>> pipe.clear(begin=datetime(2021, 1, 1, 0, 0))
>>> pipe.get_data()
          dt
0 2020-01-01
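The `begin`/`end` semantics can be illustrated without a database. The sketch below is a hypothetical, library-free helper (`clear_rows` is not part of meerschaum) that operates on plain dicts. It assumes `begin` is inclusive, which is what the example above implies (the `2021-01-01` row is removed), and `end` is exclusive, as documented.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional


def clear_rows(
    rows: List[Dict[str, Any]],
    dt_col: str,
    begin: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> List[Dict[str, Any]]:
    """Return the rows that survive a clear over the half-open range [begin, end)."""
    def in_bounds(value: datetime) -> bool:
        # Assumed: begin is inclusive, end is exclusive; a missing bound is open-ended.
        if begin is not None and value < begin:
            return False
        if end is not None and value >= end:
            return False
        return True

    return [row for row in rows if not in_bounds(row[dt_col])]


rows = [{'dt': datetime(year, 1, 1)} for year in (2020, 2021, 2022)]
remaining = clear_rows(rows, 'dt', begin=datetime(2021, 1, 1))
print(remaining)
# [{'dt': datetime.datetime(2020, 1, 1, 0, 0)}]
```

Passing only `end=datetime(2021, 1, 1)` would instead keep the 2021 and 2022 rows, since `end` itself is not cleared.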
def deduplicate( self, begin: Union[datetime.datetime, int, NoneType] = None, end: Union[datetime.datetime, int, NoneType] = None, params: Optional[Dict[str, Any]] = None, chunk_interval: Union[datetime.datetime, int, NoneType] = None, bounded: Optional[bool] = None, workers: Optional[int] = None, debug: bool = False, _use_instance_method: bool = True, **kwargs: Any) -> Tuple[bool, str]:
def deduplicate(
    self,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    chunk_interval: Union[datetime, int, None] = None,
    bounded: Optional[bool] = None,
    workers: Optional[int] = None,
    debug: bool = False,
    _use_instance_method: bool = True,
    **kwargs: Any
) -> SuccessTuple:
    """
    Call the Pipe's instance connector's `deduplicate_pipe` method to delete duplicate rows.
    If the instance connector does not implement `deduplicate_pipe`, deduplicate chunk by chunk
    instead: each chunk that contains duplicates is cleared and re-synced.

    Parameters
    ----------
    begin: Union[datetime, int, None], default None
        If provided, only deduplicate rows newer than this datetime value.

    end: Union[datetime, int, None], default None
        If provided, only deduplicate rows older than this datetime value (not including end).

    params: Optional[Dict[str, Any]], default None
        Restrict deduplication to this filter (for multiplexed data streams).
        See `meerschaum.utils.sql.build_where`.

    chunk_interval: Union[timedelta, int, None], default None
        If provided, use this for the chunk bounds.
        Defaults to the value set in `pipe.parameters['verify']['chunk_minutes']` (43200, i.e. 30 days).

    bounded: Optional[bool], default None
        Only check outside the oldest and newest sync times if bounded is explicitly `False`.

    workers: Optional[int], default None
        If the instance connector is thread-safe, limit concurrent syncs to this many threads.

    debug: bool, default False
        Verbosity toggle.

    kwargs: Any
        All other keyword arguments are passed to
        `pipe.sync()`, `pipe.clear()`, and `pipe.get_data()`.

    Returns
    -------
    A `SuccessTuple` corresponding to whether all of the chunks were successfully deduplicated.
    """
    from meerschaum.utils.warnings import warn, info
    from meerschaum.utils.misc import interval_str, items_str
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin
    from meerschaum.utils.pool import get_pool

    begin, end = self.parse_date_bounds(begin, end)

    workers = self.get_num_workers(workers=workers)
    pool = get_pool(workers=workers)

    if _use_instance_method:
        with Venv(get_connector_plugin(self.instance_connector)):
            if hasattr(self.instance_connector, 'deduplicate_pipe'):
                return self.instance_connector.deduplicate_pipe(
                    self,
                    begin=begin,
                    end=end,
                    params=params,
                    bounded=bounded,
                    debug=debug,
                    **kwargs
                )

    ### Only unbound if explicitly False.
    if bounded is None:
        bounded = True
    chunk_interval = self.get_chunk_interval(chunk_interval, debug=debug)

    bound_time = self.get_bound_time(debug=debug)
    if bounded and begin is None:
        begin = (
            bound_time
            if bound_time is not None
            else self.get_sync_time(newest=False, debug=debug)
        )
    if bounded and end is None:
        end = self.get_sync_time(newest=True, debug=debug)
        if end is not None:
            end += (
                timedelta(minutes=1)
                if hasattr(end, 'tzinfo')
                else 1
            )

    chunk_bounds = self.get_chunk_bounds(
        bounded=bounded,
        begin=begin,
        end=end,
        chunk_interval=chunk_interval,
        debug=debug,
    )

    indices = [col for col in self.columns.values() if col]
    if not indices:
        return False, "Cannot deduplicate without index columns."

    def process_chunk_bounds(bounds) -> Tuple[
        Tuple[
            Union[datetime, int, None],
            Union[datetime, int, None]
        ],
        SuccessTuple
    ]:
        ### Only selecting the index values here to keep bandwidth down.
        chunk_begin, chunk_end = bounds
        chunk_df = self.get_data(
            select_columns=indices,
            begin=chunk_begin,
            end=chunk_end,
            params=params,
            debug=debug,
        )
        if chunk_df is None:
            return bounds, (True, "")
        existing_chunk_len = len(chunk_df)
        deduped_chunk_df = chunk_df.drop_duplicates(keep='last')
        deduped_chunk_len = len(deduped_chunk_df)

        if existing_chunk_len == deduped_chunk_len:
            return bounds, (True, "")

        chunk_msg_header = f"\n{chunk_begin} - {chunk_end}"
        chunk_msg_body = ""

        full_chunk = self.get_data(
            begin=chunk_begin,
            end=chunk_end,
            params=params,
            debug=debug,
        )
        if full_chunk is None or len(full_chunk) == 0:
            return bounds, (True, f"{chunk_msg_header}\nChunk is empty, skipping...")

        chunk_indices = [ix for ix in indices if ix in full_chunk.columns]
        if not chunk_indices:
            return bounds, (False, f"None of {items_str(indices)} were present in chunk.")
        try:
            full_chunk = full_chunk.drop_duplicates(
                subset=chunk_indices,
                keep='last'
            ).reset_index(
                drop=True,
            )
        except Exception as e:
            return (
                bounds,
                (False, f"Failed to deduplicate chunk on {items_str(chunk_indices)}:\n({e})")
            )

        clear_success, clear_msg = self.clear(
            begin=chunk_begin,
            end=chunk_end,
            params=params,
            debug=debug,
        )
        if not clear_success:
            chunk_msg_body += f"Failed to clear chunk while deduplicating:\n{clear_msg}\n"
            warn(chunk_msg_body)

        sync_success, sync_msg = self.sync(full_chunk, debug=debug)
        if not sync_success:
            chunk_msg_body += f"Failed to sync chunk while deduplicating:\n{sync_msg}\n"

        ### Finally check if the deduplication worked.
        chunk_rowcount = self.get_rowcount(
            begin=chunk_begin,
            end=chunk_end,
            params=params,
            debug=debug,
        )
        if chunk_rowcount != deduped_chunk_len:
            return bounds, (
                False, (
                    chunk_msg_header + "\n"
                    + chunk_msg_body + ("\n" if chunk_msg_body else '')
                    + "Chunk rowcounts still differ ("
                    + f"{chunk_rowcount} rowcount vs {deduped_chunk_len} chunk length)."
                )
            )

        return bounds, (
            True, (
                chunk_msg_header + "\n"
                + chunk_msg_body + ("\n" if chunk_msg_body else '')
                + f"Deduplicated chunk from {existing_chunk_len} to {chunk_rowcount} rows."
            )
        )

    info(
        f"Deduplicating {len(chunk_bounds)} chunk"
        + ('s' if len(chunk_bounds) != 1 else '')
        + f" ({'un' if not bounded else ''}bounded)"
        + f" of size '{interval_str(chunk_interval)}'"
        + f" on {self}."
    )
    bounds_success_tuples = dict(pool.map(process_chunk_bounds, chunk_bounds))
    bounds_successes = {
        bounds: success_tuple
        for bounds, success_tuple in bounds_success_tuples.items()
        if success_tuple[0]
    }
    bounds_failures = {
        bounds: success_tuple
        for bounds, success_tuple in bounds_success_tuples.items()
        if not success_tuple[0]
    }

    ### No need to retry if everything failed.
    if len(bounds_failures) > 0 and len(bounds_successes) == 0:
        return (
            False,
            (
                f"Failed to deduplicate {len(bounds_failures)} chunk"
                + ('s' if len(bounds_failures) != 1 else '')
                + ".\n"
                + "\n".join([msg for _, (_, msg) in bounds_failures.items() if msg])
            )
        )

    retry_bounds = [bounds for bounds in bounds_failures]
    if not retry_bounds:
        return (
            True,
            (
                f"Successfully deduplicated {len(bounds_successes)} chunk"
                + ('s' if len(bounds_successes) != 1 else '')
                + ".\n"
                + "\n".join([msg for _, (_, msg) in bounds_successes.items() if msg])
            ).rstrip('\n')
        )

    info(f"Retrying {len(retry_bounds)} chunks for {self}...")
    retry_bounds_success_tuples = dict(pool.map(process_chunk_bounds, retry_bounds))
    retry_bounds_successes = {
        bounds: success_tuple
        for bounds, success_tuple in retry_bounds_success_tuples.items()
        if success_tuple[0]
    }
    retry_bounds_failures = {
        bounds: success_tuple
        for bounds, success_tuple in retry_bounds_success_tuples.items()
        if not success_tuple[0]
    }

    bounds_successes.update(retry_bounds_successes)
    if not retry_bounds_failures:
        return (
            True,
            (
                f"Successfully deduplicated {len(bounds_successes)} chunk"
                + ('s' if len(bounds_successes) != 1 else '')
                + f" ({len(retry_bounds_successes)} retried):\n"
                + "\n".join([msg for _, (_, msg) in bounds_successes.items() if msg])
            ).rstrip('\n')
        )

    return (
        False,
        (
            f"Failed to deduplicate {len(retry_bounds_failures)} chunk"
            + ('s' if len(retry_bounds_failures) != 1 else '')
            + ".\n"
            + "\n".join([msg for _, (_, msg) in retry_bounds_failures.items() if msg])
        ).rstrip('\n')
    )

Call the Pipe's instance connector's deduplicate_pipe method to delete duplicate rows. If the instance connector does not implement deduplicate_pipe, deduplicate chunk by chunk instead: each chunk that contains duplicates is cleared and re-synced.

Parameters
  • begin (Union[datetime, int, None], default None): If provided, only deduplicate rows newer than this datetime value.
  • end (Union[datetime, int, None], default None): If provided, only deduplicate rows older than this datetime value (not including end).
  • params (Optional[Dict[str, Any]], default None): Restrict deduplication to this filter (for multiplexed data streams). See meerschaum.utils.sql.build_where.
  • chunk_interval (Union[timedelta, int, None], default None): If provided, use this for the chunk bounds. Defaults to the value set in pipe.parameters['verify']['chunk_minutes'] (43200, i.e. 30 days).
  • bounded (Optional[bool], default None): Only check outside the oldest and newest sync times if bounded is explicitly False.
  • workers (Optional[int], default None): If the instance connector is thread-safe, limit concurrent syncs to this many threads.
  • debug (bool, default False): Verbosity toggle.
  • kwargs (Any): All other keyword arguments are passed to pipe.sync(), pipe.clear(), and pipe.get_data().
Returns
  • A SuccessTuple corresponding to whether all of the chunks were successfully deduplicated.
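When the instance connector lacks `deduplicate_pipe`, the fallback path splits the bounded time range into chunks and keeps the last row per index combination within each chunk. Below is a minimal in-memory sketch of that idea under stated assumptions: rows are plain dicts, the datetime column is itself one of the index columns (as it is when indices come from `self.columns.values()`), and `split_bounds`, `drop_duplicates_keep_last`, and `deduplicate_rows` are hypothetical names, not meerschaum APIs. The real method additionally clears and re-syncs each affected chunk through the instance connector, processes chunks in a worker pool, and retries failed chunks once.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def split_bounds(begin: datetime, end: datetime, chunk_interval: timedelta) -> List[Tuple[datetime, datetime]]:
    """Split [begin, end) into consecutive windows no wider than chunk_interval."""
    bounds = []
    chunk_begin = begin
    while chunk_begin < end:
        chunk_end = min(chunk_begin + chunk_interval, end)
        bounds.append((chunk_begin, chunk_end))
        chunk_begin = chunk_end
    return bounds


def drop_duplicates_keep_last(rows: List[Row], indices: List[str]) -> List[Row]:
    """Keep only the last row for each combination of index values."""
    latest: Dict[Tuple[Any, ...], Row] = {}
    for row in rows:
        key = tuple(row[col] for col in indices)
        latest.pop(key, None)  # Re-insert so order follows each key's last occurrence.
        latest[key] = row
    return list(latest.values())


def deduplicate_rows(rows: List[Row], dt_col: str, indices: List[str], chunk_interval: timedelta) -> List[Row]:
    """Bounded, chunked deduplication over in-memory rows."""
    if not indices:
        raise ValueError("Cannot deduplicate without index columns.")
    if not rows:
        return []
    # Bounded: span the oldest to newest timestamps, padding the end by one
    # minute so the newest rows fall inside the half-open range.
    begin = min(row[dt_col] for row in rows)
    end = max(row[dt_col] for row in rows) + timedelta(minutes=1)
    deduped: List[Row] = []
    for chunk_begin, chunk_end in split_bounds(begin, end, chunk_interval):
        chunk = [row for row in rows if chunk_begin <= row[dt_col] < chunk_end]
        deduped.extend(drop_duplicates_keep_last(chunk, indices))
    return deduped


rows = [
    {'ts': datetime(2024, 1, 1), 'id': 1, 'vl': 10},
    {'ts': datetime(2024, 1, 1), 'id': 1, 'vl': 20},
    {'ts': datetime(2024, 1, 2), 'id': 1, 'vl': 30},
]
print([row['vl'] for row in deduplicate_rows(rows, 'ts', ['ts', 'id'], timedelta(hours=12))])
# [20, 30]
```

The one-minute padding mirrors the source listing, which bumps `end` past the newest sync time before computing chunk bounds.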
def bootstrap( self, debug: bool = False, yes: bool = False, force: bool = False, noask: bool = False, shell: bool = False, **kw) -> Tuple[bool, str]:
def bootstrap(
    self,
    debug: bool = False,
    yes: bool = False,
    force: bool = False,
    noask: bool = False,
    shell: bool = False,
    **kw
) -> SuccessTuple:
    """
    Prompt the user to create a pipe's requirements all from one method.
    This method shouldn't be used in any automated scripts because it interactively
    prompts the user and therefore may hang.

    Parameters
    ----------
    debug: bool, default False
        Verbosity toggle.

    yes: bool, default False
        Print the questions and automatically agree.

    force: bool, default False
        Skip the questions and agree anyway.

    noask: bool, default False
        Print the questions but go with the default answer.

    shell: bool, default False
        Used to determine if we are in the interactive shell.

    Returns
    -------
    A `SuccessTuple` corresponding to the success of this procedure.

    """

    from meerschaum.utils.warnings import info
    from meerschaum.utils.prompt import prompt, yes_no
    from meerschaum.utils.formatting import pprint
    from meerschaum.config import get_config
    from meerschaum.utils.formatting._shell import clear_screen
    from meerschaum.utils.formatting import print_tuple
    from meerschaum.actions import actions
    from meerschaum.utils.venv import Venv
    from meerschaum.connectors import get_connector_plugin

    _clear = get_config('shell', 'clear_screen', patch=True)

    if self.id is not None:
        delete_tuple = self.delete(debug=debug)
        if not delete_tuple[0]:
            return delete_tuple

    if _clear:
        clear_screen(debug=debug)

    _parameters = _get_parameters(self, debug=debug)
    self.parameters = _parameters
    pprint(self.parameters)
    try:
        prompt(
            f"\n    Press [Enter] to register {self} with the above configuration:",
            icon=False
        )
    except KeyboardInterrupt:
        return False, f"Aborted bootstrapping {self}."

    with Venv(get_connector_plugin(self.instance_connector)):
        register_tuple = self.instance_connector.register_pipe(self, debug=debug)

    if not register_tuple[0]:
        return register_tuple

    if _clear:
        clear_screen(debug=debug)

    try:
        if yes_no(
            f"Would you like to edit the definition for {self}?",
            yes=yes,
            noask=noask,
            default='n',
        ):
            edit_tuple = self.edit_definition(debug=debug)
            if not edit_tuple[0]:
                return edit_tuple

        if yes_no(
            f"Would you like to try syncing {self} now?",
            yes=yes,
            noask=noask,
            default='n',
        ):
            sync_tuple = actions['sync'](
                ['pipes'],
                connector_keys=[self.connector_keys],
                metric_keys=[self.metric_key],
                location_keys=[self.location_key],
                mrsm_instance=str(self.instance_connector),
                debug=debug,
                shell=shell,
            )
            if not sync_tuple[0]:
                return sync_tuple
    except Exception as e:
        return False, f"Failed to bootstrap {self}:\n" + str(e)

    print_tuple((True, f"Finished bootstrapping {self}!"))
    info(
        "You can edit this pipe later with `edit pipes` "
        + "or set the definition with `edit pipes definition`.\n"
        + "    To sync data into your pipe, run `sync pipes`."
    )

    return True, "Success"

Prompt the user to create a pipe's requirements all from one method. This method shouldn't be used in any automated scripts because it interactively prompts the user and therefore may hang.

Parameters
  • debug (bool, default False): Verbosity toggle.
  • yes (bool, default False): Print the questions and automatically agree.
  • force (bool, default False): Skip the questions and agree anyway.
  • noask (bool, default False): Print the questions but go with the default answer.
  • shell (bool, default False): Used to determine if we are in the interactive shell.
Returns
  • A SuccessTuple corresponding to the success of this procedure.
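The three answer flags are easy to confuse. The sketch below is not `meerschaum.utils.prompt.yes_no()`; it is a hypothetical helper (`resolve_answer`) that models the flags only as the docstring describes them, with the `ask` callable injectable so it can run without a terminal. Note that the body above forwards only `yes` and `noask` to `yes_no()`.

```python
from typing import Callable


def resolve_answer(
    question: str,
    default: str = 'n',
    yes: bool = False,
    noask: bool = False,
    force: bool = False,
    ask: Callable[[str], str] = input,
) -> bool:
    """Hypothetical model of the documented yes / noask / force behavior."""
    if force:
        # force: skip the question entirely and agree.
        return True
    if yes:
        # yes: show the question, then agree automatically.
        print(f"{question} [y/n] y")
        return True
    if noask:
        # noask: show the question, then fall back to the default answer.
        print(f"{question} [y/n] {default}")
        return default.lower().startswith('y')
    # Otherwise actually prompt; an empty reply means "use the default".
    reply = ask(f"{question} [y/n] ").strip().lower() or default.lower()
    return reply.startswith('y')


print(resolve_answer("Would you like to try syncing now?", noask=True))
```

Because `bootstrap()` passes `default='n'`, running it with `noask=True` skips both the definition edit and the trial sync.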
def enforce_dtypes( self, df: pandas.DataFrame, chunksize: Optional[int] = -1, enforce: bool = True, safe_copy: bool = True, dtypes: Optional[Dict[str, str]] = None, debug: bool = False) -> pandas.DataFrame:
def enforce_dtypes(
    self,
    df: 'pd.DataFrame',
    chunksize: Optional[int] = -1,
    enforce: bool = True,
    safe_copy: bool = True,
    dtypes: Optional[Dict[str, str]] = None,
    debug: bool = False,
) -> 'pd.DataFrame':
    """
    Cast the input dataframe to the pipe's registered data types.
    If the pipe does not exist and dtypes are not set, return the dataframe.
    """
    import traceback
    from meerschaum.utils.warnings import warn
    from meerschaum.utils.debug import dprint
    from meerschaum.utils.dataframe import (
        parse_df_datetimes,
        enforce_dtypes as _enforce_dtypes,
        parse_simple_lines,
    )
    from meerschaum.utils.dtypes import are_dtypes_equal
    from meerschaum.utils.packages import import_pandas
    pd = import_pandas(debug=debug)
    if df is None:
        if debug:
            dprint(
                "Received None instead of a DataFrame.\n"
                + "    Skipping dtype enforcement..."
            )
        return df

    if not self.enforce:
        enforce = False

    explicit_dtypes = self.get_dtypes(infer=False, debug=debug) if enforce else {}
    pipe_dtypes = self.get_dtypes(infer=True, debug=debug) if not dtypes else dtypes

    try:
        if isinstance(df, str):
            if df.strip() and df.strip()[0] not in ('{', '['):
                df = parse_df_datetimes(
                    parse_simple_lines(df),
                    ignore_cols=[
                        col
                        for col, dtype in pipe_dtypes.items()
                        if (not enforce or not are_dtypes_equal(dtype, 'datetime'))
                    ],
                )
            else:
                df = parse_df_datetimes(
                    pd.read_json(StringIO(df)),
                    ignore_cols=[
                        col
                        for col, dtype in pipe_dtypes.items()
                        if (not enforce or not are_dtypes_equal(dtype, 'datetime'))
                    ],
                    ignore_all=(not enforce),
                    strip_timezone=(self.tzinfo is None),
                    chunksize=chunksize,
                    debug=debug,
                )
        elif isinstance(df, (dict, list, tuple)):
            df = parse_df_datetimes(
                df,
                ignore_cols=[
                    col
                    for col, dtype in pipe_dtypes.items()
                    if (not enforce or not are_dtypes_equal(str(dtype), 'datetime'))
                ],
                strip_timezone=(self.tzinfo is None),
                chunksize=chunksize,
                debug=debug,
            )
    except Exception as e:
        warn(f"Unable to cast incoming data as a DataFrame...:\n{e}\n\n{traceback.format_exc()}")
        return None

    if not pipe_dtypes:
        if debug:
            dprint(
                f"Could not find dtypes for {self}.\n"
                + "Skipping dtype enforcement..."
            )
        return df

    return _enforce_dtypes(
        df,
        pipe_dtypes,
        explicit_dtypes=explicit_dtypes,
        safe_copy=safe_copy,
        strip_timezone=(self.tzinfo is None),
        coerce_numeric=self.mixed_numerics,
        coerce_timezone=enforce,
        debug=debug,
    )

Cast the input dataframe to the pipe's registered data types. If the pipe does not exist and dtypes are not set, return the dataframe.
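The two early exits and the per-column cast can be sketched with plain dicts instead of a DataFrame. This is an illustration only: `enforce_record_dtypes` and the `CASTERS` table are hypothetical, whereas the real method delegates to `meerschaum.utils.dataframe.enforce_dtypes` on pandas objects and also parses JSON strings, dicts, and lists before casting.

```python
from datetime import datetime
from typing import Any, Callable, Dict, List, Optional


def _to_datetime(value: Any) -> datetime:
    # Accept values that are already datetimes; parse ISO-8601 strings otherwise.
    return value if isinstance(value, datetime) else datetime.fromisoformat(value)


# Hypothetical dtype-to-caster table; meerschaum's real handling is much richer.
CASTERS: Dict[str, Callable[[Any], Any]] = {
    'int64': int,
    'float64': float,
    'string': str,
    'datetime64[ns]': _to_datetime,
}


def enforce_record_dtypes(
    records: Optional[List[Dict[str, Any]]],
    dtypes: Dict[str, str],
) -> Optional[List[Dict[str, Any]]]:
    """Cast each record's values to the given dtypes, returning new dicts."""
    if records is None:
        return None  # Nothing to enforce.
    if not dtypes:
        return records  # No dtypes known: return the input unchanged.
    enforced = []
    for record in records:
        new_record = dict(record)  # Copy so the caller's data is untouched (cf. safe_copy).
        for col, dtype in dtypes.items():
            caster = CASTERS.get(dtype)
            if caster is None or new_record.get(col) is None:
                continue  # Unknown dtype or missing/null value: leave as-is.
            new_record[col] = caster(new_record[col])
        enforced.append(new_record)
    return enforced


docs = [{'ts': '2024-01-01T00:00:00', 'id': '1', 'vl': '3.5'}]
dtypes = {'ts': 'datetime64[ns]', 'id': 'int64', 'vl': 'float64'}
print(enforce_record_dtypes(docs, dtypes))
# [{'ts': datetime.datetime(2024, 1, 1, 0, 0), 'id': 1, 'vl': 3.5}]
```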

def infer_dtypes( self, persist: bool = False, refresh: bool = False, debug: bool = False) -> Dict[str, Any]:
def infer_dtypes(
    self,
    persist: bool = False,
    refresh: bool = False,
    debug: bool = False,
) -> Dict[str, Any]:
    """
    If `dtypes` is not set in `meerschaum.Pipe.parameters`,
    infer the data types from the underlying table if it exists.

    Parameters
    ----------
    persist: bool, default False
        If `True`, persist the inferred data types to `meerschaum.Pipe.parameters`.
        NOTE: Use with caution! Generally `dtypes` is meant to be user-configurable only.

    refresh: bool, default False
        If `True`, retrieve the latest columns-types for the pipe.
        See `Pipe.get_columns_types()`.

    Returns
    -------
    A dictionary of strings containing the pandas data types for this Pipe.
    """
    if not self.exists(debug=debug):
        return {}

    from meerschaum.utils.dtypes.sql import get_pd_type_from_db_type
    from meerschaum.utils.dtypes import to_pandas_dtype

    ### NOTE: get_columns_types() may return either the types as
    ###       PostgreSQL- or Pandas-style.
    columns_types = self.get_columns_types(refresh=refresh, debug=debug)

    remote_pd_dtypes = {
        c: (
            get_pd_type_from_db_type(t, allow_custom_dtypes=True)
            if str(t).isupper()
            else to_pandas_dtype(t)
        )
        for c, t in columns_types.items()
    } if columns_types else {}
    if not persist:
        return remote_pd_dtypes

    parameters = self.get_parameters(refresh=refresh, debug=debug)
    dtypes = parameters.get('dtypes', {})
    dtypes.update({
        col: typ
        for col, typ in remote_pd_dtypes.items()
        if col not in dtypes
    })
    self.dtypes = dtypes
    self.edit(interactive=False, debug=debug)
    return remote_pd_dtypes

If dtypes is not set in meerschaum.Pipe.parameters, infer the data types from the underlying table if it exists.

Parameters
  • persist (bool, default False): If True, persist the inferred data types to meerschaum.Pipe.parameters. NOTE: Use with caution! Generally dtypes is meant to be user-configurable only.
  • refresh (bool, default False): If True, retrieve the latest columns-types for the pipe. See Pipe.get_columns_types().
Returns
  • A dictionary of strings containing the pandas data types for this Pipe.
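The core of `infer_dtypes()` is a translation step followed by a merge. Upper-case types are treated as database types, and everything else is treated as pandas-style. When `persist=True`, the merge never overwrites user-configured `dtypes`. The sketch below illustrates both steps under stated assumptions: `DB_TO_PD`, its `'object'` fallback, `infer_pd_dtypes`, and `persist_inferred` are hypothetical stand-ins for `get_pd_type_from_db_type()` and the `parameters` update. Pandas-style types are passed through unchanged here rather than normalized with `to_pandas_dtype()`.

```python
from typing import Dict

# Hypothetical and deliberately tiny: maps upper-case database types to pandas dtypes.
DB_TO_PD: Dict[str, str] = {
    'BIGINT': 'int64',
    'DOUBLE PRECISION': 'float64',
    'TEXT': 'string',
    'TIMESTAMP': 'datetime64[ns]',
}


def infer_pd_dtypes(columns_types: Dict[str, str]) -> Dict[str, str]:
    """Translate database-style (upper-case) types; keep pandas-style types as-is."""
    return {
        col: (DB_TO_PD.get(typ, 'object') if typ.isupper() else typ)
        for col, typ in columns_types.items()
    }


def persist_inferred(existing: Dict[str, str], inferred: Dict[str, str]) -> Dict[str, str]:
    """Add inferred dtypes only for columns the user has not already configured."""
    merged = dict(existing)
    merged.update({col: typ for col, typ in inferred.items() if col not in merged})
    return merged


inferred = infer_pd_dtypes({'ts': 'TIMESTAMP', 'id': 'BIGINT', 'vl': 'float64'})
print(inferred)
# {'ts': 'datetime64[ns]', 'id': 'int64', 'vl': 'float64'}
print(persist_inferred({'id': 'string'}, inferred))
# {'id': 'string', 'ts': 'datetime64[ns]', 'vl': 'float64'}
```

Keeping existing keys first is what makes persisting safe: a user's explicit `'id': 'string'` survives even though the table reports `BIGINT`.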
def copy_to( self, instance_keys: str, sync: bool = True, begin: Union[datetime.datetime, int, NoneType] = None, end: Union[datetime.datetime, int, NoneType] = None, params: Optional[Dict[str, Any]] = None, chunk_interval: Union[datetime.timedelta, int, NoneType] = None, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
def copy_to(
    self,
    instance_keys: str,
    sync: bool = True,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    chunk_interval: Union[timedelta, int, None] = None,
    debug: bool = False,
    **kwargs: Any
) -> SuccessTuple:
    """
    Copy a pipe to another instance.

    Parameters
    ----------
    instance_keys: str
        The instance to which to copy this pipe.

    sync: bool, default True
        If `True`, sync the source pipe's documents to the new instance.

    begin: Union[datetime, int, None], default None
        Beginning datetime value to pass to `Pipe.get_data()`.

    end: Union[datetime, int, None], default None
        End datetime value to pass to `Pipe.get_data()`.

    params: Optional[Dict[str, Any]], default None
        Parameters filter to pass to `Pipe.get_data()`.

    chunk_interval: Union[timedelta, int, None], default None
        The size of chunks to retrieve from `Pipe.get_data()` for syncing.

    kwargs: Any
        Additional flags to pass to `Pipe.get_data()` and `Pipe.sync()`, e.g. `workers`.

    Returns
    -------
    A SuccessTuple indicating success.
    """
    if str(instance_keys) == self.instance_keys:
        return False, f"Cannot copy {self} to instance '{instance_keys}'."

    begin, end = self.parse_date_bounds(begin, end)

    new_pipe = mrsm.Pipe(
        self.connector_keys,
        self.metric_key,
        self.location_key,
        parameters=self.parameters.copy(),
        instance=instance_keys,
    )

    new_pipe_is_registered = new_pipe.id is not None

    metadata_method = new_pipe.edit if new_pipe_is_registered else new_pipe.register
    metadata_success, metadata_msg = metadata_method(debug=debug)
    if not metadata_success:
        return metadata_success, metadata_msg

    if not self.exists(debug=debug):
        return True, f"{self} does not exist; nothing to sync."

    original_as_iterator = kwargs.get('as_iterator', None)
    kwargs['as_iterator'] = True

    chunk_generator = self.get_data(
        begin=begin,
        end=end,
        params=params,
        chunk_interval=chunk_interval,
        debug=debug,
        **kwargs
    )

    if original_as_iterator is None:
        _ = kwargs.pop('as_iterator', None)
    else:
        kwargs['as_iterator'] = original_as_iterator

    sync_success, sync_msg = new_pipe.sync(
        chunk_generator,
        begin=begin,
        end=end,
        params=params,
        debug=debug,
        **kwargs
    )
    msg = (
        f"Successfully synced {new_pipe}:\n{sync_msg}"
        if sync_success
        else f"Failed to sync {new_pipe}:\n{sync_msg}"
    )
    return sync_success, msg

Copy a pipe to another instance.

Parameters
  • instance_keys (str): The instance to which to copy this pipe.
  • sync (bool, default True): If True, sync the source pipe's documents to the new instance.
  • begin (Union[datetime, int, None], default None): Beginning datetime value to pass to Pipe.get_data().
  • end (Union[datetime, int, None], default None): End datetime value to pass to Pipe.get_data().
  • params (Optional[Dict[str, Any]], default None): Parameters filter to pass to Pipe.get_data().
  • chunk_interval (Union[timedelta, int, None], default None): The size of chunks to retrieve from Pipe.get_data() for syncing.
  • kwargs (Any): Additional flags to pass to Pipe.get_data() and Pipe.sync(), e.g. workers.
Returns
  • A SuccessTuple indicating success.
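`copy_to()` registers (or edits) the pipe on the target instance first. It then streams the source pipe in chunks via `get_data(as_iterator=True, chunk_interval=...)` and hands that generator to the new pipe's `sync()`. The sketch below models the flow with in-memory dicts standing in for instances. `iter_chunks`, `copy_rows`, and the instance layout are hypothetical, and appending to a list stands in for a real sync.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, Iterator, List, Tuple

Row = Dict[str, Any]


def iter_chunks(rows: List[Row], dt_col: str, chunk_interval: timedelta) -> Iterator[List[Row]]:
    """Yield rows grouped into consecutive windows of width chunk_interval."""
    if not rows:
        return
    ordered = sorted(rows, key=lambda row: row[dt_col])
    window_begin = ordered[0][dt_col]
    chunk: List[Row] = []
    for row in ordered:
        # Slide the window forward until this row fits, flushing any pending chunk.
        while row[dt_col] >= window_begin + chunk_interval:
            if chunk:
                yield chunk
                chunk = []
            window_begin += chunk_interval
        chunk.append(row)
    if chunk:
        yield chunk


def copy_rows(
    instances: Dict[str, Dict[str, List[Row]]],
    pipe_key: str,
    source: str,
    target: str,
    dt_col: str,
    chunk_interval: timedelta,
) -> Tuple[bool, str]:
    """Copy one pipe's rows between in-memory 'instances', one chunk at a time."""
    if source == target:
        return False, f"Cannot copy {pipe_key} to instance '{target}'."
    # Register the pipe on the target before checking the source, as copy_to() does.
    target_rows = instances[target].setdefault(pipe_key, [])
    source_rows = instances[source].get(pipe_key)
    if source_rows is None:
        return True, f"{pipe_key} does not exist; nothing to sync."
    num_chunks = 0
    for chunk in iter_chunks(source_rows, dt_col, chunk_interval):
        target_rows.extend(chunk)  # Stands in for new_pipe.sync(chunk).
        num_chunks += 1
    return True, f"Synced {num_chunks} chunk(s) into '{target}'."


rows = [
    {'ts': datetime(2024, 1, 1), 'id': 1},
    {'ts': datetime(2024, 1, 2), 'id': 1},
    {'ts': datetime(2024, 2, 9), 'id': 1},
]
instances = {'sql:a': {'demo': rows}, 'sql:b': {}}
print(copy_rows(instances, 'demo', 'sql:a', 'sql:b', 'ts', timedelta(days=30)))
# (True, "Synced 2 chunk(s) into 'sql:b'.")
```

Streaming chunk by chunk keeps memory bounded by `chunk_interval` rather than by the size of the whole pipe, which is the point of forcing `as_iterator=True` in the real method.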
class Plugin:
class Plugin:
    """Handle packaging of Meerschaum plugins."""

    def __init__(
        self,
        name: str,
        version: Optional[str] = None,
        user_id: Optional[int] = None,
        required: Optional[List[str]] = None,
        attributes: Optional[Dict[str, Any]] = None,
        archive_path: Optional[pathlib.Path] = None,
        venv_path: Optional[pathlib.Path] = None,
        repo_connector: Optional['mrsm.connectors.api.APIConnector'] = None,
        repo: Union['mrsm.connectors.api.APIConnector', str, None] = None,
    ):
        import meerschaum.config.paths as paths
        from meerschaum._internal.static import STATIC_CONFIG
        sep = STATIC_CONFIG['plugins']['repo_separator']
        _repo = None
        if sep in name:
            try:
                name, _repo = name.split(sep)
            except Exception as e:
                error(f"Invalid plugin name: '{name}'")
        self._repo_in_name = _repo

        if attributes is None:
            attributes = {}
        self.name = name
        self.attributes = attributes
        self.user_id = user_id
        self._version = version
        if required:
            self._required = required
        self.archive_path = (
            archive_path if archive_path is not None
            else paths.PLUGINS_ARCHIVES_RESOURCES_PATH / f"{self.name}.tar.gz"
        )
        self.venv_path = (
            venv_path if venv_path is not None
            else paths.VIRTENV_RESOURCES_PATH / self.name
        )
        self._repo_connector = repo_connector
        self._repo_keys = repo


    @property
    def repo_connector(self):
        """
        Return the repository connector for this plugin.
        NOTE: This imports the `connectors` module, which imports certain plugin modules.
        """
        if self._repo_connector is None:
            from meerschaum.connectors.parse import parse_repo_keys

            repo_keys = self._repo_keys or self._repo_in_name
            if self._repo_in_name and self._repo_keys and self._repo_keys != self._repo_in_name:
                error(
                    f"Received inconsistent repos: '{self._repo_in_name}' and '{self._repo_keys}'."
                )
            repo_connector = parse_repo_keys(repo_keys)
            self._repo_connector = repo_connector
        return self._repo_connector


    @property
    def version(self):
        """
        Return the plugin module's version (`__version__`) if it's defined.
        """
        if self._version is None:
            try:
                self._version = self.module.__version__
            except Exception:
                self._version = None
        return self._version


    @property
    def module(self):
        """
        Return the Python module of the underlying plugin.
        """
        if '_module' not in self.__dict__ or self.__dict__.get('_module', None) is None:
            if self.__file__ is None:
                return None

            from meerschaum.plugins import import_plugins
            self._module = import_plugins(str(self), warn=False)

        return self._module


    @property
    def __file__(self) -> Union[str, None]:
        """
        Return the file path (str) of the plugin if it exists, otherwise `None`.
        """
        if self.__dict__.get('_module', None) is not None:
            return self.module.__file__

        import meerschaum.config.paths as paths

        potential_dir = paths.PLUGINS_RESOURCES_PATH / self.name
        if (
            potential_dir.exists()
            and potential_dir.is_dir()
            and (potential_dir / '__init__.py').exists()
        ):
            return str((potential_dir / '__init__.py').as_posix())

        potential_file = paths.PLUGINS_RESOURCES_PATH / (self.name + '.py')
        if potential_file.exists() and not potential_file.is_dir():
            return str(potential_file.as_posix())

        return None


    @property
    def requirements_file_path(self) -> Union[pathlib.Path, None]:
        """
        If a file named `requirements.txt` exists, return its path.
        """
        if self.__file__ is None:
            return None
        path = pathlib.Path(self.__file__).parent / 'requirements.txt'
        if not path.exists():
            return None
        return path


    def is_installed(self, **kw) -> bool:
        """
        Check whether a plugin is installed.

        Returns
        -------
        A `bool` indicating whether the plugin's source file can be found.
        """
        return self.__file__ is not None


    def make_tar(self, debug: bool = False) -> pathlib.Path:
        """
        Compress the plugin's source files into a `.tar.gz` archive and return the archive's path.

        Parameters
        ----------
        debug: bool, default False
            Verbosity toggle.

        Returns
        -------
        The `pathlib.Path` of the archive file.

        """
        import tarfile
        import pathlib
        from meerschaum.utils.debug import dprint
        from meerschaum.utils.packages import attempt_import
        pathspec = attempt_import('pathspec', debug=debug)

        if not self.__file__:
            from meerschaum.utils.warnings import error
            error(f"Could not find file for plugin '{self}'.")
        if '__init__.py' in self.__file__ or os.path.isdir(self.__file__):
            path = self.__file__.replace('__init__.py', '')
            is_dir = True
        else:
            path = self.__file__
            is_dir = False

        old_cwd = os.getcwd()
        real_parent_path = pathlib.Path(os.path.realpath(path)).parent
        os.chdir(real_parent_path)

        default_patterns_to_ignore = [
            '.pyc',
            '__pycache__/',
            'eggs/',
            '__pypackages__/',
            '.git',
        ]

        def parse_gitignore() -> 'Set[str]':
            gitignore_path = pathlib.Path(path) / '.gitignore'
            if not gitignore_path.exists():
                return set(default_patterns_to_ignore)
            with open(gitignore_path, 'r', encoding='utf-8') as f:
                gitignore_text = f.read()
            return set(pathspec.PathSpec.from_lines(
                pathspec.patterns.GitWildMatchPattern,
                default_patterns_to_ignore + gitignore_text.splitlines()
            ).match_tree(path))

        try:
            patterns_to_ignore = parse_gitignore() if is_dir else set()

            if debug:
                dprint(f"Patterns to ignore:\n{patterns_to_ignore}")

            with tarfile.open(self.archive_path, 'w:gz') as tarf:
                if not is_dir:
                    tarf.add(f"{self.name}.py")
                else:
                    for root, dirs, files in os.walk(self.name):
                        for f in files:
                            good_file = True
                            fp = os.path.join(root, f)
                            for pattern in patterns_to_ignore:
                                if pattern in str(fp) or f.startswith('.'):
                                    good_file = False
                                    break
                            if good_file:
                                if debug:
                                    dprint(f"Adding '{fp}'...")
                                tarf.add(fp)
        finally:
            ### Change back to the old directory, even if archiving fails.
            os.chdir(old_cwd)

        ### change to 775 to avoid permissions issues with the API in a Docker container
        self.archive_path.chmod(0o775)

        if debug:
            dprint(f"Created archive '{self.archive_path}'.")
        return self.archive_path


    def install(
        self,
        skip_deps: bool = False,
        force: bool = False,
        debug: bool = False,
    ) -> SuccessTuple:
        """
        Extract a plugin's tar archive to the plugins directory.

        If the plugin is already installed, the archive is only installed when its version
        is newer than the existing installation (or when `force` is `True`).

        Parameters
        ----------
        skip_deps: bool, default False
            If `True`, do not install dependencies.

        force: bool, default False
            If `True`, reinstall even if the same or a newer version is already installed,
            and continue with installation even if required packages fail to install.

        debug: bool, default False
            Verbosity toggle.

        Returns
        -------
        A `SuccessTuple` of success (bool) and a message (str).

        """
        if self.full_name in _ongoing_installations:
            return True, f"Already installing plugin '{self}'."
        _ongoing_installations.add(self.full_name)

        import meerschaum.config.paths as paths
        from meerschaum.utils.warnings import warn, error
        if debug:
            from meerschaum.utils.debug import dprint
        import tarfile
        import re
        import ast
        from meerschaum.plugins import sync_plugins_symlinks
        from meerschaum.utils.packages import attempt_import, reload_meerschaum
        from meerschaum.utils.venv import init_venv
        from meerschaum.utils.misc import safely_extract_tar
        old_cwd = os.getcwd()
        old_version = ''
        new_version = ''
        temp_dir = paths.PLUGINS_TEMP_RESOURCES_PATH / self.name
        temp_dir.mkdir(parents=True, exist_ok=True)

        if not self.archive_path.exists():
            ### Clear the in-progress marker before returning early.
            _ongoing_installations.discard(self.full_name)
            return False, f"Missing archive file for plugin '{self}'."
        if self.version is not None:
            old_version = self.version
            if debug:
                dprint(f"Found existing version '{old_version}' for plugin '{self}'.")

        if debug:
            dprint(f"Extracting '{self.archive_path}' to '{temp_dir}'...")

        try:
            with tarfile.open(self.archive_path, 'r:gz') as tarf:
                safely_extract_tar(tarf, temp_dir)
        except Exception as e:
            warn(e)
            _ongoing_installations.discard(self.full_name)
            return False, f"Failed to extract plugin '{self.name}'."

        ### search for version information
        files = os.listdir(temp_dir)

        if str(files[0]) == self.name:
            is_dir = True
        elif str(files[0]) == self.name + '.py':
            is_dir = False
        else:
            _ongoing_installations.discard(self.full_name)
            error(f"Unknown format encountered for plugin '{self}'.")

        fpath = temp_dir / files[0]
        if is_dir:
            fpath = fpath / '__init__.py'

        init_venv(self.name, debug=debug)
        with open(fpath, 'r', encoding='utf-8') as f:
            init_lines = f.readlines()
        new_version = None
        for line in init_lines:
            if '__version__' not in line:
                continue
            version_match = re.search(r'__version__(\s?)=', line.lstrip().rstrip())
            if not version_match:
                continue
            new_version = ast.literal_eval(line.split('=')[1].lstrip().rstrip())
            break
        if not new_version:
            warn(
                f"No `__version__` defined for plugin '{self}'. "
                + "Assuming new version...",
                stack = False,
            )

        packaging_version = attempt_import('packaging.version')
        try:
            is_new_version = (not new_version and not old_version) or (
                packaging_version.parse(old_version) < packaging_version.parse(new_version)
            )
            is_same_version = new_version and old_version and (
                packaging_version.parse(old_version) == packaging_version.parse(new_version)
            )
        except Exception:
            is_new_version, is_same_version = True, False

        ### Determine where to permanently store the new plugin.
        plugin_installation_dir_path = paths.PLUGINS_DIR_PATHS[0]
        for path in paths.PLUGINS_DIR_PATHS:
            if not path.exists():
                warn(f"Plugins path does not exist: {path}", stack=False)
                continue

            files_in_plugins_dir = os.listdir(path)
            if (
                self.name in files_in_plugins_dir
                or
                (self.name + '.py') in files_in_plugins_dir
            ):
                plugin_installation_dir_path = path
                break

        success_msg = (
            f"Successfully installed plugin '{self}'"
            + ("\n    (skipped dependencies)" if skip_deps else "")
            + "."
        )
        success, abort = None, None

        if is_same_version and not force:
            success, msg = True, (
                f"Plugin '{self}' is up-to-date (version {old_version}).\n" +
                "    Install again with `-f` or `--force` to reinstall."
            )
            abort = True
        elif is_new_version or force:
            for src_dir, dirs, files in os.walk(temp_dir):
                if success is not None:
                    break
                dst_dir = str(src_dir).replace(str(temp_dir), str(plugin_installation_dir_path))
                if not os.path.exists(dst_dir):
                    os.mkdir(dst_dir)
                for f in files:
                    src_file = os.path.join(src_dir, f)
                    dst_file = os.path.join(dst_dir, f)
                    if os.path.exists(dst_file):
                        os.remove(dst_file)

                    if debug:
                        dprint(f"Moving '{src_file}' to '{dst_dir}'...")
                    try:
                        shutil.move(src_file, dst_dir)
                    except Exception:
                        success, msg = False, (
                            f"Failed to install plugin '{self}': " +
                            f"Could not move file '{src_file}' to '{dst_dir}'"
                        )
                        print(msg)
                        break
            if success is None:
                success, msg = True, success_msg
        else:
            success, msg = False, (
                f"Your installed version of plugin '{self}' ({old_version}) is higher than "
                + f"attempted version {new_version}."
            )

        shutil.rmtree(temp_dir)
        os.chdir(old_cwd)

        ### Reload the plugin's module.
        sync_plugins_symlinks(debug=debug)
        if '_module' in self.__dict__:
            del self.__dict__['_module']
        init_venv(venv=self.name, force=True, debug=debug)
        reload_meerschaum(debug=debug)

        ### if we've already failed, return here
        if not success or abort:
            _ongoing_installations.remove(self.full_name)
            return success, msg

        ### attempt to install dependencies
        dependencies_installed = skip_deps or self.install_dependencies(force=force, debug=debug)
        if not dependencies_installed:
            _ongoing_installations.remove(self.full_name)
            return False, f"Failed to install dependencies for plugin '{self}'."

        ### handling success tuple, bool, or other (typically None)
        setup_tuple = self.setup(debug=debug)
        if isinstance(setup_tuple, tuple):
            if not setup_tuple[0]:
                success, msg = setup_tuple
        elif isinstance(setup_tuple, bool):
            if not setup_tuple:
                success, msg = False, (
                    f"Failed to run post-install setup for plugin '{self}'." + '\n' +
                    f"Check `setup()` in '{self.__file__}' for more information " +
                    "(no error message provided)."
                )
            else:
                success, msg = True, success_msg
        elif setup_tuple is None:
            success = True
            msg = (
                f"Post-install for plugin '{self}' returned None. " +
                "Assuming plugin successfully installed."
            )
            warn(msg)
        else:
            success = False
            msg = (
                f"Post-install for plugin '{self}' returned unexpected value " +
                f"of type '{type(setup_tuple)}': {setup_tuple}"
            )

        _ongoing_installations.remove(self.full_name)
        _ = self.module
        return success, msg


    def remove_archive(
        self,
        debug: bool = False
    ) -> SuccessTuple:
        """Remove a plugin's archive file."""
        if not self.archive_path.exists():
            return True, f"Archive file for plugin '{self}' does not exist."
        try:
            self.archive_path.unlink()
        except Exception as e:
            return False, f"Failed to remove archive for plugin '{self}':\n{e}"
        return True, "Success"


    def remove_venv(
        self,
        debug: bool = False
    ) -> SuccessTuple:
        """Remove a plugin's virtual environment."""
        if not self.venv_path.exists():
            return True, f"Virtual environment for plugin '{self}' does not exist."
        try:
            shutil.rmtree(self.venv_path)
        except Exception as e:
            return False, f"Failed to remove virtual environment for plugin '{self}':\n{e}"
        return True, "Success"


    def uninstall(self, debug: bool = False) -> SuccessTuple:
        """
        Remove a plugin's source files and its virtual environment.
        The archive file is not removed (see `remove_archive()`).
        """
        from meerschaum.utils.packages import reload_meerschaum
        from meerschaum.plugins import sync_plugins_symlinks
        from meerschaum.utils.warnings import warn, info
        warnings_thrown_count: int = 0
        max_warnings: int = 3

        if not self.is_installed():
            info(
                f"Plugin '{self.name}' doesn't seem to be installed.\n    "
                + "Checking for artifacts...",
                stack = False,
            )
        else:
            real_path = pathlib.Path(os.path.realpath(self.__file__))
            try:
                if real_path.name == '__init__.py':
                    shutil.rmtree(real_path.parent)
                else:
                    real_path.unlink()
            except Exception as e:
                warn(f"Could not remove source files for plugin '{self.name}':\n{e}", stack=False)
                warnings_thrown_count += 1
            else:
                info(f"Removed source files for plugin '{self.name}'.")

        if self.venv_path.exists():
            success, msg = self.remove_venv(debug=debug)
            if not success:
                warn(msg, stack=False)
                warnings_thrown_count += 1
            else:
                info(f"Removed virtual environment from plugin '{self.name}'.")

        success = warnings_thrown_count < max_warnings
        sync_plugins_symlinks(debug=debug)
        self.deactivate_venv(force=True, debug=debug)
        reload_meerschaum(debug=debug)
        return success, (
            f"Successfully uninstalled plugin '{self}'." if success
            else f"Failed to uninstall plugin '{self}'."
        )


    def setup(self, *args: str, debug: bool = False, **kw: Any) -> Union[SuccessTuple, bool]:
        """
        Run the plugin's `setup()` function if it exists.

        Parameters
        ----------
        *args: str
            The positional arguments passed to the `setup()` function.

        debug: bool, default False
            Verbosity toggle.

        **kw: Any
            The keyword arguments passed to the `setup()` function.

        Returns
        -------
        A `SuccessTuple` or `bool` indicating success.

        """
        from meerschaum.utils.debug import dprint
        import inspect
        _setup = None
        for name, fp in inspect.getmembers(self.module):
            if name == 'setup' and inspect.isfunction(fp):
                _setup = fp
                break

        ### assume success if no setup() is found (not necessary)
        if _setup is None:
            return True

        sig = inspect.signature(_setup)
        has_debug = 'debug' in sig.parameters
        ### Only forward `**kw` if `setup()` accepts arbitrary keyword arguments.
        has_kw = any(
            param.kind == inspect.Parameter.VAR_KEYWORD
            for param in sig.parameters.values()
        )

        _kw = {}
        if has_kw:
            _kw.update(kw)
        if has_debug:
            _kw['debug'] = debug

        if debug:
            dprint(f"Running setup for plugin '{self}'...")
        try:
            self.activate_venv(debug=debug)
            return_tuple = _setup(*args, **_kw)
        except Exception as e:
            return False, str(e)
        finally:
            ### Deactivate the virtual environment even if `setup()` raises.
            self.deactivate_venv(debug=debug)

        if isinstance(return_tuple, tuple):
            return return_tuple
        if isinstance(return_tuple, bool):
            return return_tuple, f"Setup for Plugin '{self.name}' did not return a message."
        if return_tuple is None:
            return False, f"Setup for Plugin '{self.name}' returned None."
        return False, f"Unknown return value from setup for Plugin '{self.name}': {return_tuple}"


    def get_dependencies(
        self,
        debug: bool = False,
    ) -> List[str]:
        """
        If the Plugin has specified dependencies in a list called `required`, return the list.

        **NOTE:** Dependencies which start with `'plugin:'` are Meerschaum plugins, not pip packages.
        Meerschaum plugins may also specify connector keys for a repo after `'@'`.

        Parameters
        ----------
        debug: bool, default False
            Verbosity toggle.

        Returns
        -------
        A list of required packages and plugins (str).

        """
        if '_required' in self.__dict__:
            return self._required

        ### If the plugin has not yet been imported,
        ### infer the dependencies from the source text.
        ### This is not super robust, and it doesn't feel right
        ### having multiple versions of the logic.
        ### This is necessary when determining the activation order
        ### without having to import the module.
        ### For consistency's sake, the module-less method does not cache the requirements.
        if self.__dict__.get('_module', None) is None:
            file_path = self.__file__
            if file_path is None:
                return []
            with open(file_path, 'r', encoding='utf-8') as f:
                text = f.read()

            if 'required' not in text:
                return []

            ### This has some limitations:
            ### It relies on `required` being manually declared.
            ### We lose the ability to dynamically alter the `required` list,
            ### which is why we've kept the module-reliant method below.
            import ast, re
            ### NOTE: This technically would break
            ### if `required` was the very first line of the file.
            req_start_match = re.search(r'\nrequired(:\s*)?.*=', text)
            if not req_start_match:
                return []
            req_start = req_start_match.start()
            equals_sign = req_start + text[req_start:].find('=')

            ### Dependencies may have brackets within the strings, so push back the index.
            ### Check the raw `find()` results for -1 before offsetting them.
            opening_brace_offset = text[equals_sign:].find('[')
            if opening_brace_offset == -1:
                return []
            first_opening_brace = equals_sign + 1 + opening_brace_offset

            closing_brace_offset = text[equals_sign:].find(']')
            if closing_brace_offset == -1:
                return []
            next_closing_brace = equals_sign + 1 + closing_brace_offset

            start_ix = first_opening_brace + 1
            end_ix = next_closing_brace

            num_braces = 0
            while True:
                if '[' not in text[start_ix:end_ix]:
                    break
                num_braces += 1
                start_ix = end_ix
                end_ix += text[end_ix + 1:].find(']') + 1

            req_end = end_ix + 1
            ### The slice already starts at the opening bracket, so no '=' is stripped
            ### (removing one would corrupt specifiers such as 'pandas>=2.0').
            req_text = text[(first_opening_brace - 1):req_end].strip()
            try:
                required = ast.literal_eval(req_text)
            except Exception as e:
                warn(
                    f"Unable to determine requirements for plugin '{self.name}' "
                    + "without importing the module.\n"
                    + "    This may be due to dynamically setting the global `required` list.\n"
                    + f"    {e}"
                )
                return []
            return required

        import inspect
        self.activate_venv(dependencies=False, debug=debug)
        required = []
        for name, val in inspect.getmembers(self.module):
            if name == 'required':
                required = val
                break
        self._required = required
        self.deactivate_venv(dependencies=False, debug=debug)
        return required


    def get_required_plugins(self, debug: bool = False) -> List[mrsm.plugins.Plugin]:
        """
        Return a list of required Plugin objects.
        """
        from meerschaum.utils.warnings import warn
        from meerschaum.config import get_config
        from meerschaum._internal.static import STATIC_CONFIG
        from meerschaum.connectors.parse import is_valid_connector_keys
        plugins = []
        _deps = self.get_dependencies(debug=debug)
        sep = STATIC_CONFIG['plugins']['repo_separator']
        plugin_names = [
            _d[len('plugin:'):] for _d in _deps
            if _d.startswith('plugin:') and len(_d) > len('plugin:')
        ]
        default_repo_keys = get_config('meerschaum', 'repository')
        skipped_repo_keys = set()

        for _plugin_name in plugin_names:
            if sep in _plugin_name:
                try:
                    _plugin_name, _repo_keys = _plugin_name.split(sep)
                except Exception:
                    _repo_keys = default_repo_keys
                    warn(
                        f"Invalid repo keys for required plugin '{_plugin_name}'.\n    "
                        + f"Will try to use '{_repo_keys}' instead.",
                        stack = False,
                    )
            else:
                _repo_keys = default_repo_keys

            if _repo_keys in skipped_repo_keys:
                continue

            if not is_valid_connector_keys(_repo_keys):
                warn(
                    f"Invalid connector '{_repo_keys}'.\n"
                    f"    Skipping required plugins from repository '{_repo_keys}'.",
                    stack=False,
                )
                ### Remember the invalid keys so the warning is shown only once.
                skipped_repo_keys.add(_repo_keys)
                continue

            plugins.append(Plugin(_plugin_name, repo=_repo_keys))

        return plugins


    def get_required_packages(self, debug: bool = False) -> List[str]:
        """
        Return the required package names (excluding plugins).
        """
        _deps = self.get_dependencies(debug=debug)
        return [_d for _d in _deps if not _d.startswith('plugin:')]


    def activate_venv(
        self,
        dependencies: bool = True,
        init_if_not_exists: bool = True,
        debug: bool = False,
        **kw
    ) -> bool:
        """
        Activate the virtual environments for the plugin and its dependencies.

        Parameters
        ----------
        dependencies: bool, default True
            If `True`, activate the virtual environments for required plugins.

        init_if_not_exists: bool, default True
            If `True`, initialize the virtual environment if it does not exist yet.

        debug: bool, default False
            Verbosity toggle.

        Returns
        -------
        A bool indicating success.
        """
        import meerschaum.config.paths as paths
        from meerschaum.utils.venv import venv_target_path
        from meerschaum.utils.packages import activate_venv
        from meerschaum.utils.misc import make_symlink, is_symlink

        if dependencies:
            for plugin in self.get_required_plugins(debug=debug):
                plugin.activate_venv(debug=debug, init_if_not_exists=init_if_not_exists, **kw)

        vtp = venv_target_path(self.name, debug=debug, allow_nonexistent=True)
        venv_meerschaum_path = vtp / 'meerschaum'

        try:
            success, msg = True, "Success"
            if is_symlink(venv_meerschaum_path):
                if pathlib.Path(os.path.realpath(venv_meerschaum_path)) != paths.PACKAGE_ROOT_PATH:
                    venv_meerschaum_path.unlink()
                    success, msg = make_symlink(venv_meerschaum_path, paths.PACKAGE_ROOT_PATH)
        except Exception as e:
            success, msg = False, str(e)
        if not success:
            warn(
                f"Unable to create symlink {venv_meerschaum_path} to {paths.PACKAGE_ROOT_PATH}:\n"
                f"{msg}"
            )

        return activate_venv(self.name, init_if_not_exists=init_if_not_exists, debug=debug, **kw)


    def deactivate_venv(self, dependencies: bool = True, debug: bool = False, **kw) -> bool:
        """
        Deactivate the virtual environments for the plugin and its dependencies.

        Parameters
        ----------
        dependencies: bool, default True
            If `True`, deactivate the virtual environments for required plugins.

        Returns
        -------
        A bool indicating success.
        """
        from meerschaum.utils.packages import deactivate_venv
        success = deactivate_venv(self.name, debug=debug, **kw)
        if dependencies:
            for plugin in self.get_required_plugins(debug=debug):
                plugin.deactivate_venv(debug=debug, **kw)
        return success


    def install_dependencies(
        self,
        force: bool = False,
        debug: bool = False,
    ) -> bool:
        """
        Install the plugin's dependencies, if any are specified.

        Packages listed in a `requirements.txt` file next to the plugin's source are installed as well.

        **NOTE:** Dependencies that start with `'plugin:'` will be installed as
        Meerschaum plugins from the default repository set in the Meerschaum configuration.
        To install from a different repository, add the repo keys after `'@'`
        (e.g. `'plugin:foo@api:bar'`).

        Parameters
        ----------
        force: bool, default False
            If `True`, continue with the installation, even if some
            required packages fail to install.

        debug: bool, default False
            Verbosity toggle.

        Returns
        -------
        A bool indicating success.
        """
        from meerschaum.utils.packages import pip_install, venv_contains_package
        from meerschaum.utils.warnings import warn, info
        _deps = self.get_dependencies(debug=debug)
        if not _deps and self.requirements_file_path is None:
            return True

        plugins = self.get_required_plugins(debug=debug)
        for _plugin in plugins:
            if _plugin.name == self.name:
                warn(f"Plugin '{self.name}' cannot depend on itself! Skipping...", stack=False)
                continue
            _success, _msg = _plugin.repo_connector.install_plugin(
                _plugin.name, debug=debug, force=force
            )
            if not _success:
                warn(
                    f"Failed to install required plugin '{_plugin}' from '{_plugin.repo_connector}'"
                    + f" for plugin '{self.name}':\n" + _msg,
                    stack = False,
                )
                if not force:
                    warn(
                        "Try installing with the `--force` flag to continue anyway.",
                        stack = False,
                    )
                    return False
                info(
                    "Continuing with installation despite the failure "
                    + "(careful, things might be broken!)...",
                    icon = False
                )

        ### Next, resolve `requirements.txt` if it exists.
        if self.requirements_file_path is not None:
            if not pip_install(
                requirements_file_path=self.requirements_file_path,
                venv=self.name, debug=debug
            ):
                warn(
                    f"Failed to resolve 'requirements.txt' for plugin '{self.name}'.",
                    stack = False,
                )
                if not force:
                    warn(
                        "Try installing with `--force` to continue anyway.",
                        stack = False,
                    )
                    return False
                info(
                    "Continuing with installation despite the failure "
                    + "(careful, things might be broken!)...",
                    icon = False
                )

        ### Don't reinstall packages that are already included in required plugins.
        packages = []
        _packages = self.get_required_packages(debug=debug)
        accounted_for_packages = set()
        for package_name in _packages:
            for plugin in plugins:
                if venv_contains_package(package_name, plugin.name):
                    accounted_for_packages.add(package_name)
                    break
941        packages = [pkg for pkg in _packages if pkg not in accounted_for_packages]
942
943        ### Attempt pip packages installation.
944        if packages:
945            for package in packages:
946                if not pip_install(package, venv=self.name, debug=debug):
947                    warn(
948                        f"Failed to install required package '{package}'"
949                        + f" for plugin '{self.name}'.",
950                        stack = False,
951                    )
952                    if not force:
953                        warn(
954                            "Try installing with `--force` to continue anyway.",
955                            stack = False,
956                        )
957                        return False
958                    info(
959                        "Continuing with installation despite the failure "
960                        + "(careful, things might be broken!)...",
961                        icon = False
962                    )
963        return True
964
965
966    @property
967    def full_name(self) -> str:
968        """
969        Include the repo keys with the plugin's name.
970        """
971        from meerschaum._internal.static import STATIC_CONFIG
972        sep = STATIC_CONFIG['plugins']['repo_separator']
973        return self.name + sep + str(self.repo_connector)
974
975
976    def __str__(self):
977        return self.name
978
979
980    def __repr__(self):
981        return f"Plugin('{self.name}', repo='{self.repo_connector}')"
982
983
984    def __del__(self):
985        pass

Handle packaging of Meerschaum plugins.

Plugin(name: str, version: Optional[str] = None, user_id: Optional[int] = None, required: Optional[List[str]] = None, attributes: Optional[Dict[str, Any]] = None, archive_path: Optional[pathlib.Path] = None, venv_path: Optional[pathlib.Path] = None, repo_connector: Optional[meerschaum.connectors.APIConnector] = None, repo: Union[meerschaum.connectors.APIConnector, str, NoneType] = None)
    def __init__(
        self,
        name: str,
        version: Optional[str] = None,
        user_id: Optional[int] = None,
        required: Optional[List[str]] = None,
        attributes: Optional[Dict[str, Any]] = None,
        archive_path: Optional[pathlib.Path] = None,
        venv_path: Optional[pathlib.Path] = None,
        repo_connector: Optional['mrsm.connectors.api.APIConnector'] = None,
        repo: Union['mrsm.connectors.api.APIConnector', str, None] = None,
    ):
        import meerschaum.config.paths as paths
        from meerschaum._internal.static import STATIC_CONFIG
        sep = STATIC_CONFIG['plugins']['repo_separator']
        _repo = None
        if sep in name:
            try:
                name, _repo = name.split(sep)
            except Exception:
                error(f"Invalid plugin name: '{name}'")
        self._repo_in_name = _repo

        if attributes is None:
            attributes = {}
        self.name = name
        self.attributes = attributes
        self.user_id = user_id
        self._version = version
        if required:
            self._required = required
        self.archive_path = (
            archive_path if archive_path is not None
            else paths.PLUGINS_ARCHIVES_RESOURCES_PATH / f"{self.name}.tar.gz"
        )
        self.venv_path = (
            venv_path if venv_path is not None
            else paths.VIRTENV_RESOURCES_PATH / self.name
        )
        self._repo_connector = repo_connector
        self._repo_keys = repo
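The constructor above splits a name of the form 'foo@api:bar' into the plugin name and the repository keys. The sketch below reproduces only that step with a hypothetical split_plugin_name helper, assuming the separator is '@' (the form used in the install_dependencies docstring); the library itself reads the separator from STATIC_CONFIG['plugins']['repo_separator'].

```python
from typing import Optional, Tuple

# Assumption: '@' stands in for STATIC_CONFIG['plugins']['repo_separator'],
# matching the 'plugin:foo@api:bar' form described in install_dependencies().
REPO_SEPARATOR = '@'


def split_plugin_name(name: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: split 'name@repo' into (name, repo) like Plugin.__init__."""
    if REPO_SEPARATOR not in name:
        return name, None
    parts = name.split(REPO_SEPARATOR)
    if len(parts) != 2:
        # Plugin.__init__ calls error() here; this sketch raises ValueError instead.
        raise ValueError(f"Invalid plugin name: '{name}'")
    plugin_name, repo = parts
    return plugin_name, repo


print(split_plugin_name('foo@api:bar'))  # ('foo', 'api:bar')
print(split_plugin_name('foo'))          # ('foo', None)
```

The repository part is only stored at this point; resolving it into a connector is deferred to the repo_connector property.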
name
attributes
user_id
archive_path
venv_path
repo_connector
    @property
    def repo_connector(self):
        """
        Return the repository connector for this plugin.
        NOTE: This imports the `connectors` module, which imports certain plugin modules.
        """
        if self._repo_connector is None:
            from meerschaum.connectors.parse import parse_repo_keys

            repo_keys = self._repo_keys or self._repo_in_name
            if self._repo_in_name and self._repo_keys and self._repo_keys != self._repo_in_name:
                error(
                    f"Received inconsistent repos: '{self._repo_in_name}' and '{self._repo_keys}'."
                )
            repo_connector = parse_repo_keys(repo_keys)
            self._repo_connector = repo_connector
        return self._repo_connector

Return the repository connector for this plugin. NOTE: This imports the connectors module, which imports certain plugin modules.
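The property combines two optional sources of repository keys: the suffix parsed from the plugin name and the repo argument. The hypothetical helper below sketches only that resolution step for string keys. The real property also accepts an APIConnector for repo and passes the result to parse_repo_keys(), which this sketch assumes falls back to the default repository when given None.

```python
from typing import Optional


def resolve_repo_keys(repo_in_name: Optional[str], repo_kwarg: Optional[str]) -> Optional[str]:
    """Hypothetical helper mirroring the checks in Plugin.repo_connector."""
    # Two different non-empty sources of repo keys are rejected outright.
    if repo_in_name and repo_kwarg and repo_in_name != repo_kwarg:
        raise ValueError(
            f"Received inconsistent repos: '{repo_in_name}' and '{repo_kwarg}'."
        )
    # Otherwise use whichever source is set (they agree if both are).
    return repo_kwarg or repo_in_name


print(resolve_repo_keys('api:bar', None))       # api:bar
print(resolve_repo_keys(None, 'api:bar'))       # api:bar
print(resolve_repo_keys('api:bar', 'api:bar'))  # api:bar
print(resolve_repo_keys(None, None))            # None
```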

version
    @property
    def version(self):
        """
        Return the plugin module's version (`__version__`), if it's defined.
        """
        if self._version is None:
            try:
                self._version = self.module.__version__
            except Exception:
                self._version = None
        return self._version

Return the plugin module's version (__version__), if it's defined.
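A minimal sketch of the lazy lookup performed by the version property: an explicitly passed version wins, otherwise the module's __version__ attribute is read on first access. VersionedPlugin is a hypothetical stand-in, and getattr with a default replaces the broader try/except used by the real property (which also guards against the module failing to import).

```python
import types
from typing import Optional


class VersionedPlugin:
    """Hypothetical stand-in for the lazy Plugin.version property."""

    def __init__(self, module: Optional[types.ModuleType], version: Optional[str] = None):
        self._module = module
        self._version = version

    @property
    def version(self) -> Optional[str]:
        # Only consult the module when no version was given explicitly.
        if self._version is None:
            self._version = getattr(self._module, '__version__', None)
        return self._version


mod = types.ModuleType('example_plugin')
mod.__version__ = '1.2.3'
print(VersionedPlugin(mod).version)                       # 1.2.3
print(VersionedPlugin(types.ModuleType('bare')).version)  # None
print(VersionedPlugin(mod, version='0.9.0').version)      # 0.9.0
```

Because a missing __version__ leaves the cached value as None, the lookup is repeated on every access until a version is found.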

module
    @property
    def module(self):
        """
        Return the Python module of the underlying plugin.
        """
        if self.__dict__.get('_module', None) is None:
            if self.__file__ is None:
                return None

            from meerschaum.plugins import import_plugins
            self._module = import_plugins(str(self), warn=False)

        return self._module

Return the Python module of the underlying plugin.

requirements_file_path: Optional[pathlib.Path]
    @property
    def requirements_file_path(self) -> Union[pathlib.Path, None]:
        """
        If a file named `requirements.txt` exists, return its path.
        """
        if self.__file__ is None:
            return None
        path = pathlib.Path(self.__file__).parent / 'requirements.txt'
        if not path.exists():
            return None
        return path

If a file named requirements.txt exists, return its path.
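The property looks for requirements.txt in the same directory as the plugin's file. A small standard-library sketch of that lookup, using a hypothetical find_requirements helper and a throwaway package-style plugin directory:

```python
import pathlib
import tempfile
from typing import Optional


def find_requirements(plugin_file: Optional[str]) -> Optional[pathlib.Path]:
    """Hypothetical helper mirroring Plugin.requirements_file_path."""
    if plugin_file is None:
        return None
    # Look for requirements.txt next to the plugin's file.
    path = pathlib.Path(plugin_file).parent / 'requirements.txt'
    return path if path.exists() else None


with tempfile.TemporaryDirectory() as tmp:
    plugin_dir = pathlib.Path(tmp) / 'example'
    plugin_dir.mkdir()
    init_file = plugin_dir / '__init__.py'
    init_file.write_text("__version__ = '0.1.0'\n")

    before = find_requirements(str(init_file))  # None: no requirements.txt yet
    (plugin_dir / 'requirements.txt').write_text('requests\n')
    after = find_requirements(str(init_file))   # .../example/requirements.txt
```

For a package plugin, __file__ is the package's __init__.py, so the file is expected inside the package directory.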

def is_installed(self, **kw) -> bool:
    def is_installed(self, **kw) -> bool:
        """
        Check whether a plugin is correctly installed.

        Returns
        -------
        A `bool` indicating whether a plugin exists and is successfully imported.
        """
        return self.__file__ is not None

Check whether a plugin is correctly installed.

Returns
  • A bool indicating whether a plugin exists and is successfully imported.
def make_tar(self, debug: bool = False) -> pathlib.Path:
    def make_tar(self, debug: bool = False) -> pathlib.Path:
        """
        Compress the plugin's source files into a `.tar.gz` archive and return the archive's path.

        Parameters
        ----------
        debug: bool, default False
            Verbosity toggle.

        Returns
        -------
        The `pathlib.Path` of the archive file.

        """
        import pathlib
        import tarfile
        from meerschaum.utils.debug import dprint
        from meerschaum.utils.packages import attempt_import
        pathspec = attempt_import('pathspec', debug=debug)

        if not self.__file__:
            from meerschaum.utils.warnings import error
            error(f"Could not find file for plugin '{self}'.")
        if '__init__.py' in self.__file__ or os.path.isdir(self.__file__):
            path = self.__file__.replace('__init__.py', '')
            is_dir = True
        else:
            path = self.__file__
            is_dir = False

        old_cwd = os.getcwd()
        real_parent_path = pathlib.Path(os.path.realpath(path)).parent
        os.chdir(real_parent_path)

        default_patterns_to_ignore = [
            '.pyc',
            '__pycache__/',
            'eggs/',
            '__pypackages__/',
            '.git',
        ]

        def parse_gitignore() -> 'Set[str]':
            gitignore_path = pathlib.Path(path) / '.gitignore'
            if not gitignore_path.exists():
                return set(default_patterns_to_ignore)
            with open(gitignore_path, 'r', encoding='utf-8') as f:
                gitignore_text = f.read()
            return set(pathspec.PathSpec.from_lines(
                pathspec.patterns.GitWildMatchPattern,
                default_patterns_to_ignore + gitignore_text.splitlines()
            ).match_tree(path))

        patterns_to_ignore = parse_gitignore() if is_dir else set()

        if debug:
            dprint(f"Patterns to ignore:\n{patterns_to_ignore}")

        with tarfile.open(self.archive_path, 'w:gz') as tarf:
            if not is_dir:
                tarf.add(f"{self.name}.py")
            else:
                for root, dirs, files in os.walk(self.name):
                    for f in files:
                        good_file = True
                        fp = os.path.join(root, f)
                        for pattern in patterns_to_ignore:
                            if pattern in str(fp) or f.startswith('.'):
                                good_file = False
                                break
                        if good_file:
                            if debug:
                                dprint(f"Adding '{fp}'...")
                            tarf.add(fp)

        ### Clean up and change back to the old directory.
        os.chdir(old_cwd)

        ### Change to 775 to avoid permissions issues with the API in a Docker container.
        self.archive_path.chmod(0o775)

        if debug:
            dprint(f"Created archive '{self.archive_path}'.")
        return self.archive_path

Compress the plugin's source files into a .tar.gz archive and return the archive's path.

Parameters
  • debug (bool, default False): Verbosity toggle.
Returns
  • The pathlib.Path of the archive file.
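make_tar() changes into the plugin's parent directory so that archive members are stored as paths like example/..., and it roughly skips hidden files and files matching the ignore patterns. The standard-library sketch below follows the same idea under simplifying assumptions: make_plugin_tar and IGNORE_PATTERNS are hypothetical, it matches names with fnmatch instead of reading .gitignore through pathspec, it sets arcname instead of calling os.chdir(), and it omits the chmod step.

```python
import fnmatch
import os
import pathlib
import tarfile
import tempfile

# Hypothetical, simplified ignore list (the real method also reads .gitignore via pathspec).
IGNORE_PATTERNS = ['*.pyc', '__pycache__', '.git', 'eggs', '__pypackages__']


def _ignored(name: str) -> bool:
    # Hidden files and directories are skipped, as are names matching a pattern.
    return name.startswith('.') or any(fnmatch.fnmatch(name, p) for p in IGNORE_PATTERNS)


def make_plugin_tar(plugin_dir: pathlib.Path, archive_path: pathlib.Path) -> pathlib.Path:
    """Archive plugin_dir so members are stored as '<plugin name>/...'."""
    with tarfile.open(archive_path, 'w:gz') as tarf:
        for root, dirs, files in os.walk(plugin_dir):
            # Prune ignored directories in place so os.walk() never descends into them.
            dirs[:] = [d for d in dirs if not _ignored(d)]
            for f in files:
                if _ignored(f):
                    continue
                fp = pathlib.Path(root) / f
                # arcname plays the role of os.chdir() in make_tar().
                tarf.add(fp, arcname=str(fp.relative_to(plugin_dir.parent)))
    return archive_path


with tempfile.TemporaryDirectory() as tmp:
    plugin_dir = pathlib.Path(tmp) / 'example'
    (plugin_dir / '__pycache__').mkdir(parents=True)
    (plugin_dir / '__init__.py').write_text("__version__ = '0.1.0'\n")
    (plugin_dir / '__pycache__' / 'example.cpython.pyc').write_bytes(b'')
    (plugin_dir / '.env').write_text('SECRET=1\n')
    archive = make_plugin_tar(plugin_dir, pathlib.Path(tmp) / 'example.tar.gz')
    with tarfile.open(archive, 'r:gz') as tarf:
        names = tarf.getnames()

print(names)  # ['example/__init__.py']
```

Pruning dirs in place keeps os.walk() out of ignored directories entirely, rather than filtering each file path against every pattern.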
def install(self, skip_deps: bool = False, force: bool = False, debug: bool = False) -> Tuple[bool, str]:
    def install(
        self,
        skip_deps: bool = False,
        force: bool = False,
        debug: bool = False,
    ) -> SuccessTuple:
        """
        Extract a plugin's tar archive to the plugins directory.

        This function checks whether the plugin is already installed and whether the
        archive's version is greater than or equal to the installed version.

        Parameters
        ----------
        skip_deps: bool, default False
            If `True`, do not install dependencies.

        force: bool, default False
            If `True`, continue with installation, even if required packages fail to install.

        debug: bool, default False
            Verbosity toggle.

        Returns
        -------
        A `SuccessTuple` of success (bool) and a message (str).

        """
        if self.full_name in _ongoing_installations:
            return True, f"Already installing plugin '{self}'."
        _ongoing_installations.add(self.full_name)

        import meerschaum.config.paths as paths
        from meerschaum.utils.warnings import warn, error
        if debug:
            from meerschaum.utils.debug import dprint
        import tarfile
        import re
        import ast
        from meerschaum.plugins import sync_plugins_symlinks
        from meerschaum.utils.packages import attempt_import, reload_meerschaum
        from meerschaum.utils.venv import init_venv
        from meerschaum.utils.misc import safely_extract_tar
        old_cwd = os.getcwd()
        old_version = ''
        new_version = ''
        temp_dir = paths.PLUGINS_TEMP_RESOURCES_PATH / self.name
        temp_dir.mkdir(exist_ok=True)

        if not self.archive_path.exists():
            ### Unregister the installation so that later attempts aren't skipped.
            _ongoing_installations.discard(self.full_name)
            return False, f"Missing archive file for plugin '{self}'."
        if self.version is not None:
            old_version = self.version
            if debug:
                dprint(f"Found existing version '{old_version}' for plugin '{self}'.")

        if debug:
            dprint(f"Extracting '{self.archive_path}' to '{temp_dir}'...")

        try:
            with tarfile.open(self.archive_path, 'r:gz') as tarf:
                safely_extract_tar(tarf, temp_dir)
        except Exception as e:
            warn(e)
            _ongoing_installations.discard(self.full_name)
            return False, f"Failed to extract plugin '{self.name}'."

        ### Search for version information.
        files = os.listdir(temp_dir)

        if str(files[0]) == self.name:
            is_dir = True
        elif str(files[0]) == self.name + '.py':
            is_dir = False
        else:
            error(f"Unknown format encountered for plugin '{self}'.")

        fpath = temp_dir / files[0]
        if is_dir:
            fpath = fpath / '__init__.py'

        init_venv(self.name, debug=debug)
        with open(fpath, 'r', encoding='utf-8') as f:
            init_lines = f.readlines()
        new_version = None
        for line in init_lines:
            if '__version__' not in line:
                continue
            version_match = re.search(r'__version__(\s?)=', line.strip())
            if not version_match:
                continue
            new_version = ast.literal_eval(line.split('=')[1].strip())
            break
        if not new_version:
            warn(
                f"No `__version__` defined for plugin '{self}'. "
                + "Assuming new version...",
                stack=False,
            )

        packaging_version = attempt_import('packaging.version')
        try:
            is_new_version = (not new_version and not old_version) or (
                packaging_version.parse(old_version) < packaging_version.parse(new_version)
            )
            is_same_version = new_version and old_version and (
                packaging_version.parse(old_version) == packaging_version.parse(new_version)
            )
        except Exception:
            is_new_version, is_same_version = True, False

        ### Determine where to permanently store the new plugin.
        plugin_installation_dir_path = paths.PLUGINS_DIR_PATHS[0]
        for path in paths.PLUGINS_DIR_PATHS:
            if not path.exists():
                warn(f"Plugins path does not exist: {path}", stack=False)
                continue

            files_in_plugins_dir = os.listdir(path)
            if (
                self.name in files_in_plugins_dir
                or
                (self.name + '.py') in files_in_plugins_dir
            ):
                plugin_installation_dir_path = path
                break

        success_msg = (
            f"Successfully installed plugin '{self}'"
            + ("\n    (skipped dependencies)" if skip_deps else "")
            + "."
        )
        success, abort = None, None

        if is_same_version and not force:
            success, msg = True, (
                f"Plugin '{self}' is up-to-date (version {old_version}).\n" +
                "    Install again with `-f` or `--force` to reinstall."
            )
            abort = True
        elif is_new_version or force:
            for src_dir, dirs, files in os.walk(temp_dir):
                if success is not None:
                    break
                dst_dir = str(src_dir).replace(str(temp_dir), str(plugin_installation_dir_path))
                if not os.path.exists(dst_dir):
                    os.mkdir(dst_dir)
                for f in files:
                    src_file = os.path.join(src_dir, f)
                    dst_file = os.path.join(dst_dir, f)
                    if os.path.exists(dst_file):
                        os.remove(dst_file)

                    if debug:
                        dprint(f"Moving '{src_file}' to '{dst_dir}'...")
                    try:
                        shutil.move(src_file, dst_dir)
                    except Exception:
                        success, msg = False, (
                            f"Failed to install plugin '{self}': " +
                            f"Could not move file '{src_file}' to '{dst_dir}'"
                        )
                        print(msg)
                        break
            if success is None:
                success, msg = True, success_msg
        else:
            success, msg = False, (
                f"Your installed version of plugin '{self}' ({old_version}) is higher than "
                + f"the attempted version {new_version}."
            )

        shutil.rmtree(temp_dir)
        os.chdir(old_cwd)

        ### Reload the plugin's module.
        sync_plugins_symlinks(debug=debug)
        if '_module' in self.__dict__:
            del self.__dict__['_module']
        init_venv(venv=self.name, force=True, debug=debug)
        reload_meerschaum(debug=debug)

        ### If we've already failed, return here.
        if not success or abort:
            _ongoing_installations.remove(self.full_name)
            return success, msg

        ### Attempt to install dependencies.
        dependencies_installed = skip_deps or self.install_dependencies(force=force, debug=debug)
        if not dependencies_installed:
            _ongoing_installations.remove(self.full_name)
            return False, f"Failed to install dependencies for plugin '{self}'."

        ### Handle a SuccessTuple, a bool, or anything else (typically None).
        setup_tuple = self.setup(debug=debug)
        if isinstance(setup_tuple, tuple):
            if not setup_tuple[0]:
                success, msg = setup_tuple
        elif isinstance(setup_tuple, bool):
            if not setup_tuple:
                success, msg = False, (
                    f"Failed to run post-install setup for plugin '{self}'." + '\n' +
                    f"Check `setup()` in '{self.__file__}' for more information " +
                    "(no error message provided)."
                )
            else:
                success, msg = True, success_msg
        elif setup_tuple is None:
            success = True
            msg = (
                f"Post-install for plugin '{self}' returned None. " +
                "Assuming plugin successfully installed."
            )
            warn(msg)
        else:
            success = False
            msg = (
                f"Post-install for plugin '{self}' returned unexpected value " +
                f"of type '{type(setup_tuple)}': {setup_tuple}"
            )

        _ongoing_installations.remove(self.full_name)
        _ = self.module
        return success, msg

Extract a plugin's tar archive to the plugins directory.

This function checks whether the plugin is already installed and whether the archive's version is greater than or equal to the installed version.

Parameters
  • skip_deps (bool, default False): If True, do not install dependencies.
  • force (bool, default False): If True, continue with installation, even if required packages fail to install.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A SuccessTuple of success (bool) and a message (str).
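install() reads __version__ from the extracted source without importing it, then compares it with the installed version using packaging.version. The sketch below mirrors that decision with hypothetical helpers (read_version, install_action). To stay standard-library only, it swaps packaging.version.parse for a simplified integer-tuple key, which handles only dotted numeric versions such as '1.10.0'.

```python
import ast
import re
from typing import Optional, Tuple


def read_version(source: str) -> Optional[str]:
    """Return the first `__version__ = ...` literal in source, parsed like install()."""
    for line in source.splitlines():
        if not re.search(r'__version__(\s?)=', line.strip()):
            continue
        return ast.literal_eval(line.split('=')[1].strip())
    return None


def version_key(version: str) -> Tuple[int, ...]:
    # Simplified stand-in for packaging.version.parse: dotted integers only.
    return tuple(int(part) for part in version.split('.'))


def install_action(old: Optional[str], new: Optional[str], force: bool = False) -> str:
    """Decide what install() would do for an installed `old` and archived `new` version."""
    try:
        is_new = (not new and not old) or version_key(old) < version_key(new)
        is_same = bool(new and old) and version_key(old) == version_key(new)
    except (AttributeError, TypeError, ValueError):
        # As in install(), missing or unparseable versions count as a new version.
        is_new, is_same = True, False
    if is_same and not force:
        return 'up-to-date'
    if is_new or force:
        return 'install'
    return 'refuse-downgrade'


print(read_version("__version__ = '1.2.0'\n"))  # 1.2.0
print(install_action('1.2.0', '1.10.0'))        # install
print(install_action('1.10.0', '1.2.0'))        # refuse-downgrade
```

Comparing parsed versions rather than raw strings matters here: as strings, '1.10.0' < '1.2.0', which would wrongly refuse the upgrade.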
def remove_archive(self, debug: bool = False) -> Tuple[bool, str]:
    def remove_archive(
        self,
        debug: bool = False
    ) -> SuccessTuple:
        """Remove a plugin's archive file."""
        if not self.archive_path.exists():
            return True, f"Archive file for plugin '{self}' does not exist."
        try:
            self.archive_path.unlink()
        except Exception as e:
            return False, f"Failed to remove archive for plugin '{self}':\n{e}"
        return True, "Success"

Remove a plugin's archive file.
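remove_archive() and remove_venv() treat an already-missing target as success and report failures through a SuccessTuple instead of raising. A minimal sketch of that pattern for a single file, using a hypothetical remove_file helper (it catches OSError where the real methods catch any Exception):

```python
import pathlib
import tempfile
from typing import Tuple

SuccessTuple = Tuple[bool, str]


def remove_file(path: pathlib.Path) -> SuccessTuple:
    """Hypothetical helper following the remove_archive() pattern."""
    # A file that is already gone counts as success, so repeated calls are safe.
    if not path.exists():
        return True, f"'{path.name}' does not exist."
    try:
        path.unlink()
    except OSError as e:
        return False, f"Failed to remove '{path.name}':\n{e}"
    return True, "Success"


with tempfile.TemporaryDirectory() as tmp:
    archive = pathlib.Path(tmp) / 'example.tar.gz'
    archive.write_bytes(b'')
    first = remove_file(archive)   # (True, 'Success')
    second = remove_file(archive)  # (True, "'example.tar.gz' does not exist.")
```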

def remove_venv(self, debug: bool = False) -> Tuple[bool, str]:
    def remove_venv(
        self,
        debug: bool = False
    ) -> SuccessTuple:
        """Remove a plugin's virtual environment."""
        if not self.venv_path.exists():
            return True, f"Virtual environment for plugin '{self}' does not exist."
        try:
            shutil.rmtree(self.venv_path)
        except Exception as e:
            return False, f"Failed to remove virtual environment for plugin '{self}':\n{e}"
        return True, "Success"

Remove a plugin's virtual environment.

def uninstall(self, debug: bool = False) -> Tuple[bool, str]:
    def uninstall(self, debug: bool = False) -> SuccessTuple:
        """
        Remove a plugin, its virtual environment, and archive file.
        """
        from meerschaum.utils.packages import reload_meerschaum
        from meerschaum.plugins import sync_plugins_symlinks
        from meerschaum.utils.warnings import warn, info
        warnings_thrown_count: int = 0
        max_warnings: int = 3

        if not self.is_installed():
            info(
                f"Plugin '{self.name}' doesn't seem to be installed.\n    "
                + "Checking for artifacts...",
                stack=False,
            )
        else:
            real_path = pathlib.Path(os.path.realpath(self.__file__))
            try:
                if real_path.name == '__init__.py':
                    shutil.rmtree(real_path.parent)
                else:
                    real_path.unlink()
            except Exception as e:
                warn(f"Could not remove source files for plugin '{self.name}':\n{e}", stack=False)
                warnings_thrown_count += 1
            else:
                info(f"Removed source files for plugin '{self.name}'.")

        if self.venv_path.exists():
            success, msg = self.remove_venv(debug=debug)
            if not success:
                warn(msg, stack=False)
                warnings_thrown_count += 1
            else:
                info(f"Removed virtual environment from plugin '{self.name}'.")

        success = warnings_thrown_count < max_warnings
        sync_plugins_symlinks(debug=debug)
        self.deactivate_venv(force=True, debug=debug)
        reload_meerschaum(debug=debug)
        return success, (
            f"Successfully uninstalled plugin '{self}'." if success
            else f"Failed to uninstall plugin '{self}'."
        )

Remove a plugin, its virtual environment, and archive file.
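The source-file step of uninstall() resolves symlinks with os.path.realpath and then deletes either the whole package directory (when the real file is __init__.py) or the single module file. A minimal sketch with a hypothetical remove_plugin_source helper; note that the method body shown above removes the source files and the virtual environment but does not call remove_archive().

```python
import os
import pathlib
import shutil
import tempfile


def remove_plugin_source(plugin_file: str) -> None:
    """Hypothetical helper: delete a plugin's source the way uninstall() does."""
    # Resolve symlinks first so the real files are removed, not just a link.
    real_path = pathlib.Path(os.path.realpath(plugin_file))
    if real_path.name == '__init__.py':
        # Package plugin: remove the whole package directory.
        shutil.rmtree(real_path.parent)
    else:
        # Single-file plugin: remove only the module file.
        real_path.unlink()


with tempfile.TemporaryDirectory() as tmp:
    package_dir = pathlib.Path(tmp) / 'example_pkg'
    package_dir.mkdir()
    (package_dir / '__init__.py').write_text('')
    single_file = pathlib.Path(tmp) / 'example_single.py'
    single_file.write_text('')

    remove_plugin_source(str(package_dir / '__init__.py'))
    remove_plugin_source(str(single_file))
    package_removed = not package_dir.exists()
    single_removed = not single_file.exists()
```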

def setup(self, *args: str, debug: bool = False, **kw: Any) -> Union[Tuple[bool, str], bool]:
    def setup(self, *args: str, debug: bool = False, **kw: Any) -> Union[SuccessTuple, bool]:
        """
        If it exists, run the plugin's `setup()` function.

        Parameters
        ----------
        *args: str
            The positional arguments passed to the `setup()` function.

        debug: bool, default False
            Verbosity toggle.

        **kw: Any
            The keyword arguments passed to the `setup()` function.

        Returns
        -------
        A `SuccessTuple` or `bool` indicating success.

        """
        from meerschaum.utils.debug import dprint
        import inspect
        _setup = None
        for name, fp in inspect.getmembers(self.module):
            if name == 'setup' and inspect.isfunction(fp):
                _setup = fp
                break

        ### Assume success if no setup() is found (it's optional).
        if _setup is None:
            return True

        sig = inspect.signature(_setup)
        has_debug = 'debug' in sig.parameters
        has_kw = any(
            param.kind == inspect.Parameter.VAR_KEYWORD
            for param in sig.parameters.values()
        )

        _kw = {}
        if has_kw:
            _kw.update(kw)
        if has_debug:
            _kw['debug'] = debug

        if debug:
            dprint(f"Running setup for plugin '{self}'...")
        try:
            self.activate_venv(debug=debug)
            return_tuple = _setup(*args, **_kw)
        except Exception as e:
            return False, str(e)
        finally:
            ### Deactivate the venv even if setup() raised.
            self.deactivate_venv(debug=debug)

        if isinstance(return_tuple, tuple):
            return return_tuple
        if isinstance(return_tuple, bool):
            return return_tuple, f"Setup for Plugin '{self.name}' did not return a message."
        if return_tuple is None:
            return False, f"Setup for Plugin '{self.name}' returned None."
        return False, f"Unknown return value from setup for Plugin '{self.name}': {return_tuple}"

If it exists, run the plugin's setup() function.

Parameters
  • *args (str): The positional arguments passed to the setup() function.
  • debug (bool, default False): Verbosity toggle.
  • **kw (Any): The keyword arguments passed to the setup() function.
Returns
  • A SuccessTuple or bool indicating success.
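setup() inspects the plugin function's signature so that extra keyword arguments are forwarded only to functions that accept **kwargs and debug only to functions that declare it, then normalizes the return value. A minimal sketch of that logic; call_setup and the sample setup functions are hypothetical, and the real method additionally activates and deactivates the plugin's virtual environment around the call.

```python
import inspect
from typing import Any, Callable, Tuple

SuccessTuple = Tuple[bool, str]


def call_setup(func: Callable[..., Any], *args: Any, debug: bool = False, **kw: Any) -> SuccessTuple:
    """Hypothetical helper: call func like Plugin.setup() calls a plugin's setup()."""
    params = inspect.signature(func).parameters
    accepts_var_kw = any(p.kind == inspect.Parameter.VAR_KEYWORD for p in params.values())
    # Extra keywords go only to functions that take **kwargs;
    # debug goes only to functions that declare a `debug` parameter.
    call_kw = dict(kw) if accepts_var_kw else {}
    if 'debug' in params:
        call_kw['debug'] = debug
    try:
        result = func(*args, **call_kw)
    except Exception as e:
        return False, str(e)
    # Normalize the return value into a SuccessTuple.
    if isinstance(result, tuple):
        return result
    if isinstance(result, bool):
        return result, 'Setup did not return a message.'
    if result is None:
        return False, 'Setup returned None.'
    return False, f'Unknown return value from setup: {result}'


def setup_plain():
    return True


def setup_flexible(debug=False, **kwargs):
    return True, f"debug={debug}, extra={sorted(kwargs)}"


def setup_broken():
    raise RuntimeError('boom')


print(call_setup(setup_plain, debug=True, foo=1))     # (True, 'Setup did not return a message.')
print(call_setup(setup_flexible, debug=True, foo=1))  # (True, "debug=True, extra=['foo']")
print(call_setup(setup_broken))                       # (False, 'boom')
```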
def get_dependencies(self, debug: bool = False) -> List[str]:
    def get_dependencies(
        self,
        debug: bool = False,
    ) -> List[str]:
        """
        If the Plugin has specified dependencies in a list called `required`, return the list.

        **NOTE:** Dependencies which start with `'plugin:'` are Meerschaum plugins, not pip packages.
        Meerschaum plugins may also specify connector keys for a repo after `'@'`.

        Parameters
        ----------
        debug: bool, default False
            Verbosity toggle.

        Returns
        -------
        A list of required packages and plugins (str).

        """
        if '_required' in self.__dict__:
            return self._required

        ### If the plugin has not yet been imported,
        ### infer the dependencies from the source text.
        ### This is not super robust, and it doesn't feel right
        ### having multiple versions of the logic.
        ### This is necessary when determining the activation order
        ### without having to import the module.
        ### For consistency's sake, the module-less method does not cache the requirements.
        if self.__dict__.get('_module', None) is None:
            file_path = self.__file__
            if file_path is None:
                return []
            with open(file_path, 'r', encoding='utf-8') as f:
                text = f.read()

            if 'required' not in text:
                return []

            ### This has some limitations:
            ### It relies on `required` being manually declared.
            ### We lose the ability to dynamically alter the `required` list,
            ### which is why we've kept the module-reliant method below.
            import ast, re
            ### NOTE: This technically would break
            ### if `required` was the very first line of the file.
            req_start_match = re.search(r'\nrequired(:\s*)?.*=', text)
            if not req_start_match:
                return []
            req_start = req_start_match.start()
            equals_sign = req_start + text[req_start:].find('=')

            ### Dependencies may have brackets within the strings, so push back the index.
            ### Check each `find()` result before offsetting it, or -1 would go undetected.
            opening_brace_offset = text[equals_sign:].find('[')
            if opening_brace_offset == -1:
                return []
            first_opening_brace = equals_sign + 1 + opening_brace_offset

            closing_brace_offset = text[equals_sign:].find(']')
            if closing_brace_offset == -1:
                return []
            next_closing_brace = equals_sign + 1 + closing_brace_offset

            start_ix = first_opening_brace + 1
            end_ix = next_closing_brace

            num_braces = 0
            while True:
                if '[' not in text[start_ix:end_ix]:
                    break
                num_braces += 1
                start_ix = end_ix
                end_ix += text[end_ix + 1:].find(']') + 1

            req_end = end_ix + 1
            ### The slice already starts at '[', so don't strip any '=' from it
            ### (the first '=' could belong to a specifier like 'pandas>=1.0').
            req_text = text[(first_opening_brace-1):req_end].strip()
            try:
                required = ast.literal_eval(req_text)
            except Exception as e:
                warn(
                    f"Unable to determine requirements for plugin '{self.name}' "
                    + "without importing the module.\n"
                    + "    This may be due to dynamically setting the global `required` list.\n"
                    + f"    {e}"
                )
                return []
            return required

        import inspect
        self.activate_venv(dependencies=False, debug=debug)
        required = []
        for name, val in inspect.getmembers(self.module):
            if name == 'required':
                required = val
                break
        self._required = required
        self.deactivate_venv(dependencies=False, debug=debug)
        return required

If the plugin specifies its dependencies in a list called required, return that list.

NOTE: Dependencies that start with 'plugin:' are Meerschaum plugins, not pip packages. A plugin dependency may also specify the connector keys of its repository after '@' (e.g. 'plugin:foo@api:bar').

Parameters
  • debug (bool, default False): Verbosity toggle.
Returns
  • A list of required packages and plugins (str).
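The source above recovers the required list without importing the plugin: it slices out the bracketed text after the assignment (taking care that strings such as 'pandas[parquet]' contain brackets of their own) and passes it to ast.literal_eval, warning and returning an empty list if that fails. As an alternative technique, not the library's bracket scanning, the sketch below lets ast.parse locate a top-level required = [...] assignment and evaluates only that node. The helper name read_required_from_source is hypothetical.

```python
import ast
from typing import List


def read_required_from_source(source: str) -> List[str]:
    """
    Hypothetical helper: return the module-level `required` list from a
    plugin's source text without importing (executing) the module.
    Returns an empty list if no literal `required` assignment is found.
    """
    tree = ast.parse(source)
    for node in tree.body:  # only top-level statements
        if not isinstance(node, ast.Assign):
            continue
        for target in node.targets:
            if isinstance(target, ast.Name) and target.id == 'required':
                try:
                    value = ast.literal_eval(node.value)
                except ValueError:
                    # The list is built dynamically, e.g. `required = get_deps()`.
                    return []
                return list(value) if isinstance(value, (list, tuple)) else []
    return []


plugin_source = '''
__version__ = '0.1.0'
required = ['requests>=2.0', 'pandas[parquet]', 'plugin:foo@api:bar']

def fetch(pipe, **kwargs):
    return []
'''

print(read_required_from_source(plugin_source))
# ['requests>=2.0', 'pandas[parquet]', 'plugin:foo@api:bar']
```

Walking the syntax tree sidesteps manual bracket counting, at the cost of parsing the whole file.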
def get_required_plugins(self, debug: bool = False) -> List[Plugin]:
    def get_required_plugins(self, debug: bool=False) -> List[mrsm.plugins.Plugin]:
        """
        Return a list of required Plugin objects.
        """
        from meerschaum.utils.warnings import warn
        from meerschaum.config import get_config
        from meerschaum._internal.static import STATIC_CONFIG
        from meerschaum.connectors.parse import is_valid_connector_keys
        plugins = []
        _deps = self.get_dependencies(debug=debug)
        sep = STATIC_CONFIG['plugins']['repo_separator']
        plugin_names = [
            _d[len('plugin:'):] for _d in _deps
            if _d.startswith('plugin:') and len(_d) > len('plugin:')
        ]
        default_repo_keys = get_config('meerschaum', 'repository')
        skipped_repo_keys = set()

        for _plugin_name in plugin_names:
            if sep in _plugin_name:
                try:
                    _plugin_name, _repo_keys = _plugin_name.split(sep)
                except Exception:
                    _repo_keys = default_repo_keys
                    warn(
                        f"Invalid repo keys for required plugin '{_plugin_name}'.\n    "
                        + f"Will try to use '{_repo_keys}' instead.",
                        stack = False,
                    )
            else:
                _repo_keys = default_repo_keys

            if _repo_keys in skipped_repo_keys:
                continue

            if not is_valid_connector_keys(_repo_keys):
                warn(
                    f"Invalid connector '{_repo_keys}'.\n"
                    f"    Skipping required plugins from repository '{_repo_keys}'",
                    stack=False,
                )
                continue

            plugins.append(Plugin(_plugin_name, repo=_repo_keys))

        return plugins

Return a list of required Plugin objects.
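get_required_plugins strips the 'plugin:' prefix from each dependency, splits off optional repository keys at the configured repo separator, and otherwise falls back to the default repository from the configuration. A minimal sketch of that parsing step, assuming the separator is '@' (as in the 'plugin:foo@api:bar' example) and taking the default repository keys as a parameter; parse_plugin_dependency and 'api:default' are hypothetical, and the connector-key validation done by the source is omitted.

```python
from typing import Optional, Tuple

PLUGIN_PREFIX = 'plugin:'
REPO_SEPARATOR = '@'  # assumption: mirrors STATIC_CONFIG['plugins']['repo_separator']


def parse_plugin_dependency(dep: str, default_repo: str) -> Optional[Tuple[str, str]]:
    """
    Hypothetical helper: turn 'plugin:name' or 'plugin:name@repo_keys' into
    (name, repo_keys). Returns None for pip packages and a bare 'plugin:'.
    """
    if not dep.startswith(PLUGIN_PREFIX) or len(dep) == len(PLUGIN_PREFIX):
        return None
    name = dep[len(PLUGIN_PREFIX):]
    if REPO_SEPARATOR not in name:
        return name, default_repo
    parts = name.split(REPO_SEPARATOR)
    if len(parts) != 2:
        # Ambiguous (more than one separator): this sketch keeps the first part
        # and falls back to the default repository.
        return parts[0], default_repo
    return parts[0], parts[1]


deps = ['requests', 'plugin:noaa', 'plugin:foo@api:bar']
print([parse_plugin_dependency(d, 'api:default') for d in deps])
# [None, ('noaa', 'api:default'), ('foo', 'api:bar')]
```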

def get_required_packages(self, debug: bool = False) -> List[str]:
    def get_required_packages(self, debug: bool=False) -> List[str]:
        """
        Return the required package names (excluding plugins).
        """
        _deps = self.get_dependencies(debug=debug)
        return [_d for _d in _deps if not _d.startswith('plugin:')]

Return the required package names (excluding plugins).

def activate_venv(self, dependencies: bool = True, init_if_not_exists: bool = True, debug: bool = False, **kw) -> bool:
    def activate_venv(
        self,
        dependencies: bool = True,
        init_if_not_exists: bool = True,
        debug: bool = False,
        **kw
    ) -> bool:
        """
        Activate the virtual environments for the plugin and its dependencies.

        Parameters
        ----------
        dependencies: bool, default True
            If `True`, activate the virtual environments for required plugins.

        Returns
        -------
        A bool indicating success.
        """
        import meerschaum.config.paths as paths
        from meerschaum.utils.venv import venv_target_path
        from meerschaum.utils.packages import activate_venv
        from meerschaum.utils.misc import make_symlink, is_symlink

        if dependencies:
            for plugin in self.get_required_plugins(debug=debug):
                plugin.activate_venv(debug=debug, init_if_not_exists=init_if_not_exists, **kw)

        vtp = venv_target_path(self.name, debug=debug, allow_nonexistent=True)
        venv_meerschaum_path = vtp / 'meerschaum'

        try:
            success, msg = True, "Success"
            if is_symlink(venv_meerschaum_path):
                if pathlib.Path(os.path.realpath(venv_meerschaum_path)) != paths.PACKAGE_ROOT_PATH:
                    venv_meerschaum_path.unlink()
                    success, msg = make_symlink(venv_meerschaum_path, paths.PACKAGE_ROOT_PATH)
        except Exception as e:
            success, msg = False, str(e)
        if not success:
            warn(
                f"Unable to create symlink {venv_meerschaum_path} to {paths.PACKAGE_ROOT_PATH}:\n"
                f"{msg}"
            )

        return activate_venv(self.name, init_if_not_exists=init_if_not_exists, debug=debug, **kw)

Activate the virtual environments for the plugin and its dependencies.

Parameters
  • dependencies (bool, default True): If True, activate the virtual environments for required plugins.
Returns
  • A bool indicating success.
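Before activating its own environment, activate_venv checks whether the venv's meerschaum entry is a symlink that no longer resolves to the installed package root and, if so, replaces it. The sketch below reproduces that check with the standard library (os.path.islink, os.path.realpath, os.symlink) in place of Meerschaum's is_symlink and make_symlink helpers; refresh_symlink is a hypothetical name. Like the source, it leaves a missing path or a non-symlink entry alone.

```python
import os
import pathlib
import tempfile
from typing import Tuple


def refresh_symlink(link: pathlib.Path, target: pathlib.Path) -> Tuple[bool, str]:
    """
    Hypothetical helper: if `link` is a symlink that does not resolve to
    `target`, replace it with a new symlink to `target`.
    """
    try:
        if os.path.islink(link):
            # Resolve both sides so OS-level symlinks (e.g. /tmp on macOS) compare equal.
            if pathlib.Path(os.path.realpath(link)) != pathlib.Path(os.path.realpath(target)):
                link.unlink()
                os.symlink(target, link, target_is_directory=True)
        return True, "Success"
    except OSError as e:
        return False, str(e)


with tempfile.TemporaryDirectory() as tmp_dir:
    tmp = pathlib.Path(tmp_dir)
    old_root, new_root = tmp / 'old_meerschaum', tmp / 'meerschaum'
    old_root.mkdir()
    new_root.mkdir()

    # Simulate a venv entry left pointing at a previous package location.
    link = tmp / 'venv_meerschaum'
    os.symlink(old_root, link, target_is_directory=True)

    result = refresh_symlink(link, new_root)
    points_to_new = os.path.realpath(link) == os.path.realpath(new_root)

print(result, points_to_new)  # (True, 'Success') True
```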
def deactivate_venv(self, dependencies: bool = True, debug: bool = False, **kw) -> bool:
    def deactivate_venv(self, dependencies: bool=True, debug: bool = False, **kw) -> bool:
        """
        Deactivate the virtual environments for the plugin and its dependencies.

        Parameters
        ----------
        dependencies: bool, default True
            If `True`, deactivate the virtual environments for required plugins.

        Returns
        -------
        A bool indicating success.
        """
        from meerschaum.utils.packages import deactivate_venv
        success = deactivate_venv(self.name, debug=debug, **kw)
        if dependencies:
            for plugin in self.get_required_plugins(debug=debug):
                plugin.deactivate_venv(debug=debug, **kw)
        return success

Deactivate the virtual environments for the plugin and its dependencies.

Parameters
  • dependencies (bool, default True): If True, deactivate the virtual environments for required plugins.
Returns
  • A bool indicating success.
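Read together, activate_venv and deactivate_venv are mirror images: activation recurses into required plugins before activating the plugin's own environment, while deactivation handles the plugin's own environment first and then recurses into its dependencies. The toy model below records that ordering for a hypothetical dependency graph; it calls no Meerschaum APIs. Note that the recursion itself does not track visited plugins, so a shared dependency is reached once per path.

```python
from typing import Dict, List

# Hypothetical plugin graph: plugin name -> names of its required plugins.
REQUIRED: Dict[str, List[str]] = {
    'app': ['noaa', 'utils'],
    'noaa': ['utils'],
    'utils': [],
}


def activate(name: str, log: List[str]) -> None:
    """Mirror Plugin.activate_venv: dependencies first, then this plugin."""
    for dep in REQUIRED[name]:
        activate(dep, log)
    log.append(f"activate {name}")


def deactivate(name: str, log: List[str]) -> None:
    """Mirror Plugin.deactivate_venv: this plugin first, then dependencies."""
    log.append(f"deactivate {name}")
    for dep in REQUIRED[name]:
        deactivate(dep, log)


on: List[str] = []
activate('app', on)
print(on)
# ['activate utils', 'activate noaa', 'activate utils', 'activate app']

off: List[str] = []
deactivate('app', off)
print(off)
# ['deactivate app', 'deactivate noaa', 'deactivate utils', 'deactivate utils']
```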
def install_dependencies(self, force: bool = False, debug: bool = False) -> bool:
    def install_dependencies(
        self,
        force: bool = False,
        debug: bool = False,
    ) -> bool:
        """
        If specified, install dependencies.

        **NOTE:** Dependencies that start with `'plugin:'` will be installed as
        Meerschaum plugins from the same repository as this Plugin.
        To install from a different repository, add the repo keys after `'@'`
        (e.g. `'plugin:foo@api:bar'`).

        Parameters
        ----------
        force: bool, default False
            If `True`, continue with the installation, even if some
            required packages fail to install.

        debug: bool, default False
            Verbosity toggle.

        Returns
        -------
        A bool indicating success.
        """
        from meerschaum.utils.packages import pip_install, venv_contains_package
        from meerschaum.utils.warnings import warn, info
        _deps = self.get_dependencies(debug=debug)
        if not _deps and self.requirements_file_path is None:
            return True

        plugins = self.get_required_plugins(debug=debug)
        for _plugin in plugins:
            if _plugin.name == self.name:
                warn(f"Plugin '{self.name}' cannot depend on itself! Skipping...", stack=False)
                continue
            _success, _msg = _plugin.repo_connector.install_plugin(
                _plugin.name, debug=debug, force=force
            )
            if not _success:
                warn(
                    f"Failed to install required plugin '{_plugin}' from '{_plugin.repo_connector}'"
                    + f" for plugin '{self.name}':\n" + _msg,
                    stack = False,
                )
                if not force:
                    warn(
                        "Try installing with the `--force` flag to continue anyway.",
                        stack = False,
                    )
                    return False
                info(
                    "Continuing with installation despite the failure "
                    + "(careful, things might be broken!)...",
                    icon = False
                )


        ### First step: parse `requirements.txt` if it exists.
        if self.requirements_file_path is not None:
            if not pip_install(
                requirements_file_path=self.requirements_file_path,
                venv=self.name, debug=debug
            ):
                warn(
                    f"Failed to resolve 'requirements.txt' for plugin '{self.name}'.",
                    stack = False,
                )
                if not force:
                    warn(
                        "Try installing with `--force` to continue anyway.",
                        stack = False,
                    )
                    return False
                info(
                    "Continuing with installation despite the failure "
                    + "(careful, things might be broken!)...",
                    icon = False
                )


        ### Don't reinstall packages that are already included in required plugins.
        packages = []
        _packages = self.get_required_packages(debug=debug)
        accounted_for_packages = set()
        for package_name in _packages:
            for plugin in plugins:
                if venv_contains_package(package_name, plugin.name):
                    accounted_for_packages.add(package_name)
                    break
        packages = [pkg for pkg in _packages if pkg not in accounted_for_packages]

        ### Attempt pip packages installation.
        if packages:
            for package in packages:
                if not pip_install(package, venv=self.name, debug=debug):
                    warn(
                        f"Failed to install required package '{package}'"
                        + f" for plugin '{self.name}'.",
                        stack = False,
                    )
                    if not force:
                        warn(
                            "Try installing with `--force` to continue anyway.",
                            stack = False,
                        )
                        return False
                    info(
                        "Continuing with installation despite the failure "
                        + "(careful, things might be broken!)...",
                        icon = False
                    )
        return True

Install the plugin's dependencies, if it specifies any.

NOTE: Dependencies that start with 'plugin:' will be installed as Meerschaum plugins from the same repository as this Plugin. To install from a different repository, add the repo keys after '@' (e.g. 'plugin:foo@api:bar').

Parameters
  • force (bool, default False): If True, continue with the installation, even if some required packages fail to install.
  • debug (bool, default False): Verbosity toggle.
Returns
  • A bool indicating success.
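install_dependencies runs in three stages: required plugins, then requirements.txt (if present), then the remaining pip packages, skipping any package already provided by a required plugin's venv. At every stage a failure returns False unless force is set, in which case installation continues. The sketch below models only that control flow with injected callables; install_all and all of its parameters are hypothetical, and the source's early return and self-dependency check are left out.

```python
from typing import Callable, Iterable, List, Optional


def install_all(
    plugins: Iterable[str],
    requirements_file: Optional[str],
    packages: Iterable[str],
    install_plugin: Callable[[str], bool],
    install_requirements: Callable[[str], bool],
    install_package: Callable[[str], bool],
    venv_contains: Callable[[str, str], bool],
    force: bool = False,
) -> bool:
    """
    Hypothetical model of the three-stage flow. Each installer returns True
    on success; venv_contains(package, plugin) reports whether a required
    plugin's venv already provides the package.
    """
    plugins = list(plugins)

    # Stage 1: required plugins.
    for plugin in plugins:
        if not install_plugin(plugin) and not force:
            return False

    # Stage 2: requirements.txt, if present.
    if requirements_file is not None:
        if not install_requirements(requirements_file) and not force:
            return False

    # Stage 3: pip packages not already provided by a required plugin's venv.
    remaining = [
        pkg for pkg in packages
        if not any(venv_contains(pkg, plugin) for plugin in plugins)
    ]
    for pkg in remaining:
        if not install_package(pkg) and not force:
            return False
    return True


installed: List[str] = []

def fake_pip_install(pkg: str) -> bool:
    installed.append(pkg)
    return True

ok = install_all(
    plugins=['noaa'],
    requirements_file=None,
    packages=['requests', 'pandas'],
    install_plugin=lambda name: True,
    install_requirements=lambda path: True,
    install_package=fake_pip_install,
    venv_contains=lambda pkg, plugin: pkg == 'requests',  # pretend 'noaa' ships requests
)
print(ok, installed)  # True ['pandas']
```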
full_name: str
    @property
    def full_name(self) -> str:
        """
        Include the repo keys with the plugin's name.
        """
        from meerschaum._internal.static import STATIC_CONFIG
        sep = STATIC_CONFIG['plugins']['repo_separator']
        return self.name + sep + str(self.repo_connector)

Include the repo keys with the plugin's name.
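full_name joins the plugin's name and its repository connector with the configured repo separator, producing the same 'name@repo_keys' shape that required lists accept after 'plugin:'. A tiny sketch, assuming the separator is '@', that str() of the repo connector yields keys such as 'api:bar', and that plugin names never contain the separator; both helper names are hypothetical.

```python
from typing import Tuple

REPO_SEPARATOR = '@'  # assumption: STATIC_CONFIG['plugins']['repo_separator']


def full_name(name: str, repo_keys: str) -> str:
    """Hypothetical stand-in for Plugin.full_name."""
    return name + REPO_SEPARATOR + repo_keys


def split_full_name(full: str) -> Tuple[str, str]:
    """Recover (name, repo_keys), assuming the name itself has no separator."""
    name, _, repo_keys = full.partition(REPO_SEPARATOR)
    return name, repo_keys


print(full_name('foo', 'api:bar'))     # foo@api:bar
print(split_full_name('foo@api:bar'))  # ('foo', 'api:bar')
```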

SuccessTuple = typing.Tuple[bool, str]
class Venv:
class Venv:
    """
    Manage a virtual environment's activation status.

    Examples
    --------
    >>> from meerschaum.plugins import Plugin
    >>> with Venv('mrsm') as venv:
    ...     import pandas
    >>> with Venv(Plugin('noaa')) as venv:
    ...     import requests
    >>> venv = Venv('mrsm')
    >>> venv.activate()
    True
    >>> venv.deactivate()
    True
    >>>
    """

    def __init__(
        self,
        venv: Union[str, 'mrsm.core.Plugin', None] = 'mrsm',
        init_if_not_exists: bool = True,
        debug: bool = False,
    ) -> None:
        from meerschaum.utils.venv import activate_venv, deactivate_venv, active_venvs
        ### For some weird threading issue,
        ### we can't use `isinstance` here.
        if '_Plugin' in str(type(venv)):
            self._venv = venv.name
            self._activate = venv.activate_venv
            self._deactivate = venv.deactivate_venv
            self._kwargs = {}
        else:
            self._venv = venv
            self._activate = activate_venv
            self._deactivate = deactivate_venv
            self._kwargs = {'venv': venv}
        self._debug = debug
        self._init_if_not_exists = init_if_not_exists
        ### In case someone calls `deactivate()` before `activate()`.
        self._kwargs['previously_active_venvs'] = copy.deepcopy(active_venvs)


    def activate(self, debug: bool = False) -> bool:
        """
        Activate this virtual environment.
        If a `meerschaum.plugins.Plugin` was provided, its dependent virtual environments
        will also be activated.
        """
        from meerschaum.utils.venv import active_venvs, init_venv
        self._kwargs['previously_active_venvs'] = copy.deepcopy(active_venvs)
        try:
            return self._activate(
                debug=(debug or self._debug),
                init_if_not_exists=self._init_if_not_exists,
                **self._kwargs
            )
        except OSError as e:
            if self._init_if_not_exists:
                if not init_venv(self._venv, force=True):
                    raise e
        return self._activate(
            debug=(debug or self._debug),
            init_if_not_exists=self._init_if_not_exists,
            **self._kwargs
        )


    def deactivate(self, debug: bool = False) -> bool:
        """
        Deactivate this virtual environment.
        If a `meerschaum.plugins.Plugin` was provided, its dependent virtual environments
        will also be deactivated.
        """
        return self._deactivate(debug=(debug or self._debug), **self._kwargs)


    @property
    def target_path(self) -> pathlib.Path:
        """
        Return the target site-packages path for this virtual environment.
        A `meerschaum.utils.venv.Venv` may have one virtual environment per minor Python version
        (e.g. Python 3.10 and Python 3.7).
        """
        from meerschaum.utils.venv import venv_target_path
        return venv_target_path(venv=self._venv, allow_nonexistent=True, debug=self._debug)


    @property
    def root_path(self) -> pathlib.Path:
        """
        Return the top-level path for this virtual environment.
        """
        import meerschaum.config.paths as paths
        if self._venv is None:
            return self.target_path.parent
        return paths.VIRTENV_RESOURCES_PATH / self._venv


    def __enter__(self) -> None:
        self.activate(debug=self._debug)


    def __exit__(self, exc_type, exc_value, exc_traceback) -> None:
        self.deactivate(debug=self._debug)


    def __str__(self) -> str:
        quote = "'" if self._venv is not None else ""
        return "Venv(" + quote + str(self._venv) + quote + ")"


    def __repr__(self) -> str:
        return self.__str__()

Manage a virtual environment's activation status.

Examples
>>> from meerschaum.plugins import Plugin
>>> from meerschaum.utils.venv import Venv
>>> with Venv('mrsm') as venv:
...     import pandas
>>> with Venv(Plugin('noaa')) as venv:
...     import requests
>>> venv = Venv('mrsm')
>>> venv.activate()
True
>>> venv.deactivate()
True
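Venv snapshots the currently active environments on construction and on every activate(), then passes that snapshot (previously_active_venvs) to the deactivation function. Note also that __enter__ returns None, so the as venv target in the examples above is bound to None rather than to the Venv object. The toy class below illustrates the snapshot idea under an assumption about how the snapshot is used: an environment that was already active before the block is left active on exit. ToyVenv and ACTIVE are hypothetical and do not modify sys.path.

```python
import copy
from typing import Set

# Hypothetical registry standing in for meerschaum.utils.venv.active_venvs.
ACTIVE: Set[str] = set()


class ToyVenv:
    """Context-manager sketch: activate on enter, restore the prior state on exit."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Snapshot in case deactivate() is called before activate().
        self._previously_active = copy.deepcopy(ACTIVE)

    def activate(self) -> bool:
        self._previously_active = copy.deepcopy(ACTIVE)
        ACTIVE.add(self.name)
        return True

    def deactivate(self) -> bool:
        # Assumption: only undo an activation that this object performed.
        if self.name not in self._previously_active:
            ACTIVE.discard(self.name)
        return True

    def __enter__(self) -> 'ToyVenv':
        self.activate()
        return self  # unlike the source's __enter__, which returns None

    def __exit__(self, exc_type, exc_value, exc_traceback) -> None:
        self.deactivate()


with ToyVenv('mrsm'):
    with ToyVenv('mrsm'):  # nested: 'mrsm' was already active on entry
        pass
    after_inner = sorted(ACTIVE)  # inner exit left 'mrsm' active
after_outer = sorted(ACTIVE)      # outer exit deactivated it

print(after_inner, after_outer)  # ['mrsm'] []
```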
Venv(venv: Union[str, Plugin, NoneType] = 'mrsm', init_if_not_exists: bool = True, debug: bool = False)
def activate(self, debug: bool = False) -> bool:

Activate this virtual environment. If a meerschaum.plugins.Plugin was provided, its dependent virtual environments will also be activated.
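Venv.activate tries the activation function once. On OSError, when init_if_not_exists is set, it re-creates the environment with init_venv(..., force=True), re-raising the original error if that fails, and then makes one more attempt; when init_if_not_exists is False, it simply tries again. The generic sketch below reproduces that try / re-initialize / retry shape with injected callables; activate_with_retry and flaky_activate are hypothetical.

```python
from typing import Callable


def activate_with_retry(
    activate: Callable[[], bool],
    init: Callable[[], bool],
    init_if_not_exists: bool = True,
) -> bool:
    """Hypothetical sketch of Venv.activate's try / re-init / retry logic."""
    try:
        return activate()
    except OSError as e:
        # init() only runs when re-initialization is allowed.
        if init_if_not_exists and not init():
            raise e
    # Second and final attempt; any error now propagates to the caller.
    return activate()


attempts = {'count': 0}

def flaky_activate() -> bool:
    attempts['count'] += 1
    if attempts['count'] == 1:
        raise OSError("venv is missing")
    return True

print(activate_with_retry(flaky_activate, init=lambda: True))  # True
print(attempts['count'])                                       # 2
```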

def deactivate(self, debug: bool = False) -> bool:

Deactivate this virtual environment. If a meerschaum.plugins.Plugin was provided, its dependent virtual environments will also be deactivated.

target_path: pathlib.Path

Return the target site-packages path for this virtual environment. A meerschaum.utils.venv.Venv may have one virtual environment per minor Python version (e.g. Python 3.10 and Python 3.7).
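Keying the target directory by minor version matters because compiled extension packages are generally built for a specific CPython minor version and cannot be shared across, say, 3.10 and 3.7. The sketch below derives such a path from sys.version_info. The POSIX-style layout (<venv root>/lib/pythonX.Y/site-packages) is an assumption for illustration and is not necessarily what venv_target_path returns; hypothetical_target_path is a hypothetical name.

```python
import pathlib
import sys


def hypothetical_target_path(venv_root: pathlib.Path) -> pathlib.Path:
    """
    Sketch only: a POSIX-style site-packages path keyed by Python minor version.
    The real layout used by meerschaum.utils.venv.venv_target_path may differ.
    """
    version_dir = f"python{sys.version_info.major}.{sys.version_info.minor}"
    return venv_root / 'lib' / version_dir / 'site-packages'


root = pathlib.Path('venvs') / 'mrsm'  # hypothetical venv root
print(hypothetical_target_path(root))
# e.g. venvs/mrsm/lib/python3.12/site-packages on Python 3.12
```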

root_path: pathlib.Path

Return the top-level path for this virtual environment.
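root_path resolves in one of two ways: a named environment lives at VIRTENV_RESOURCES_PATH / <name>, while the unnamed case (venv=None) falls back to the parent of the target site-packages path. A standalone sketch with both inputs passed in explicitly; resolve_root_path and the example paths are hypothetical, and pure POSIX paths keep the output platform-independent.

```python
from pathlib import PurePath, PurePosixPath
from typing import Optional


def resolve_root_path(
    venv: Optional[str],
    virtenv_resources_path: PurePath,
    target_path: PurePath,
) -> PurePath:
    """Mirror Venv.root_path with its inputs made explicit."""
    if venv is None:
        return target_path.parent
    return virtenv_resources_path / venv


resources = PurePosixPath('/srv/mrsm/venvs')            # hypothetical
site = PurePosixPath('/usr/lib/python3/site-packages')  # hypothetical
print(resolve_root_path('noaa', resources, site))  # /srv/mrsm/venvs/noaa
print(resolve_root_path(None, resources, site))    # /usr/lib/python3
```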

class Job:
  70class Job:
  71    """
  72    Manage a `meerschaum.utils.daemon.Daemon`, locally or remotely via the API.
  73    """
  74
  75    def __init__(
  76        self,
  77        name: str,
  78        sysargs: Union[List[str], str, None] = None,
  79        env: Optional[Dict[str, str]] = None,
  80        executor_keys: Optional[str] = None,
  81        delete_after_completion: bool = False,
  82        refresh_seconds: Union[int, float, None] = None,
  83        _properties: Optional[Dict[str, Any]] = None,
  84        _rotating_log=None,
  85        _stdin_file=None,
  86        _status_hook: Optional[Callable[[], str]] = None,
  87        _result_hook: Optional[Callable[[], SuccessTuple]] = None,
  88        _externally_managed: bool = False,
  89    ):
  90        """
  91        Create a new job to manage a `meerschaum.utils.daemon.Daemon`.
  92
  93        Parameters
  94        ----------
  95        name: str
  96            The name of the job to be created.
  97            This will also be used as the Daemon ID.
  98
  99        sysargs: Union[List[str], str, None], default None
 100            The sysargs of the command to be executed, e.g. 'start api'.
 101
 102        env: Optional[Dict[str, str]], default None
 103            If provided, set these environment variables in the job's process.
 104
 105        executor_keys: Optional[str], default None
 106            If provided, execute the job remotely on an API instance, e.g. 'api:main'.
 107
 108        delete_after_completion: bool, default False
 109            If `True`, delete this job when it has finished executing.
 110
 111        refresh_seconds: Union[int, float, None], default None
 112            The number of seconds to sleep between refreshes.
 113            Defaults to the configured value `system.cli.refresh_seconds`.
 114
 115        _properties: Optional[Dict[str, Any]], default None
 116            If provided, use this to patch the daemon's properties.
 117        """
 118        from meerschaum.utils.daemon import Daemon
 119        for char in BANNED_CHARS:
 120            if char in name:
 121                raise ValueError(f"Invalid name: ({char}) is not allowed.")
 122
 123        if isinstance(sysargs, str):
 124            sysargs = shlex.split(sysargs)
 125
 126        and_key = STATIC_CONFIG['system']['arguments']['and_key']
 127        escaped_and_key = STATIC_CONFIG['system']['arguments']['escaped_and_key']
 128        if sysargs:
 129            sysargs = [
 130                (arg if arg != escaped_and_key else and_key)
 131                for arg in sysargs
 132            ]
 133
 134        ### NOTE: 'local' and 'systemd' executors are being coalesced.
 135        if executor_keys is None:
 136            from meerschaum.jobs import get_executor_keys_from_context
 137            executor_keys = get_executor_keys_from_context()
 138
 139        self.executor_keys = executor_keys
 140        self.name = name
 141        self.refresh_seconds = (
 142            refresh_seconds
 143            if refresh_seconds is not None
 144            else mrsm.get_config('system', 'cli', 'refresh_seconds')
 145        )
 146        try:
 147            self._daemon = (
 148                Daemon(daemon_id=name)
 149                if executor_keys == 'local'
 150                else None
 151            )
 152        except Exception:
 153            self._daemon = None
 154
 155        ### Handle any injected dependencies.
 156        if _rotating_log is not None:
 157            self._rotating_log = _rotating_log
 158            if self._daemon is not None:
 159                self._daemon._rotating_log = _rotating_log
 160
 161        if _stdin_file is not None:
 162            self._stdin_file = _stdin_file
 163            if self._daemon is not None:
 164                self._daemon._stdin_file = _stdin_file
 165                self._daemon._blocking_stdin_file_path = _stdin_file.blocking_file_path
 166
 167        if _status_hook is not None:
 168            self._status_hook = _status_hook
 169
 170        if _result_hook is not None:
 171            self._result_hook = _result_hook
 172
 173        self._externally_managed = _externally_managed
 174        self._properties_patch = _properties or {}
 175        if _externally_managed:
 176            self._properties_patch.update({'externally_managed': _externally_managed})
 177
 178        if env:
 179            self._properties_patch.update({'env': env})
 180
 181        if delete_after_completion:
 182            self._properties_patch.update({'delete_after_completion': delete_after_completion})
 183
 184        daemon_sysargs = (
 185            self._daemon.properties.get('target', {}).get('args', [None])[0]
 186            if self._daemon is not None
 187            else None
 188        )
 189
 190        if daemon_sysargs and sysargs and daemon_sysargs != sysargs:
 191            warn("Given sysargs differ from existing sysargs.")
 192
 193        self._sysargs = [
 194            arg
 195            for arg in (daemon_sysargs or sysargs or [])
 196            if arg not in ('-d', '--daemon')
 197        ]
 198        for restart_flag in RESTART_FLAGS:
 199            if restart_flag in self._sysargs:
 200                self._properties_patch.update({'restart': True})
 201                break
 202
 203    @staticmethod
 204    def from_pid(pid: int, executor_keys: Optional[str] = None) -> Job:
 205        """
 206        Build a `Job` from the PID of a running Meerschaum process.
 207
 208        Parameters
 209        ----------
 210        pid: int
 211            The PID of the process.
 212
 213        executor_keys: Optional[str], default None
 214            The executor keys to assign to the job.
 215        """
 216        psutil = mrsm.attempt_import('psutil')
 217        try:
 218            process = psutil.Process(pid)
 219        except psutil.NoSuchProcess as e:
 220            warn(f"Process with PID {pid} does not exist.", stack=False)
 221            raise e
 222
 223        command_args = process.cmdline()
 224        is_daemon = command_args[1] == '-c'
 225
 226        if is_daemon:
 227            daemon_id = command_args[-1].split('daemon_id=')[-1].split(')')[0].replace("'", '')
 228            root_dir = process.environ().get(STATIC_CONFIG['environment']['root'], None)
 229            if root_dir is None:
 230                root_dir = paths.ROOT_DIR_PATH
 231            else:
 232                root_dir = pathlib.Path(root_dir)
 233            jobs_dir = root_dir / paths.DAEMON_RESOURCES_PATH.name
 234            daemon_dir = jobs_dir / daemon_id
 235            pid_file = daemon_dir / 'process.pid'
 236
 237            if pid_file.exists():
 238                with open(pid_file, 'r', encoding='utf-8') as f:
 239                    daemon_pid = int(f.read())
 240
 241                if pid != daemon_pid:
 242                    raise EnvironmentError(f"Differing PIDs: {pid=}, {daemon_pid=}")
 243            else:
 244                raise EnvironmentError(f"Is job '{daemon_id}' running?")
 245
 246            return Job(daemon_id, executor_keys=executor_keys)
 247
 248        from meerschaum._internal.arguments._parse_arguments import parse_arguments
 249        from meerschaum.utils.daemon import get_new_daemon_name
 250
 251        mrsm_ix = 0
 252        for i, arg in enumerate(command_args):
 253            if 'mrsm' in arg or 'meerschaum' in arg.lower():
 254                mrsm_ix = i
 255                break
 256
 257        sysargs = command_args[mrsm_ix+1:]
 258        kwargs = parse_arguments(sysargs)
 259        name = kwargs.get('name', get_new_daemon_name())
 260        return Job(name, sysargs, executor_keys=executor_keys)
 261
 262    def start(self, debug: bool = False) -> SuccessTuple:
 263        """
 264        Start the job's daemon.
 265        """
 266        if self.executor is not None:
 267            if not self.exists(debug=debug):
 268                return self.executor.create_job(
 269                    self.name,
 270                    self.sysargs,
 271                    properties=self.daemon.properties,
 272                    debug=debug,
 273                )
 274            return self.executor.start_job(self.name, debug=debug)
 275
 276        if self.is_running():
 277            return True, f"{self} is already running."
 278
 279        success, msg = self.daemon.run(
 280            keep_daemon_output=(not self.delete_after_completion),
 281            allow_dirty_run=True,
 282        )
 283        if not success:
 284            return success, msg
 285
 286        return success, f"Started {self}."
 287
 288    def stop(
 289        self,
 290        timeout_seconds: Union[int, float, None] = None,
 291        debug: bool = False,
 292    ) -> SuccessTuple:
 293        """
 294        Stop the job's daemon.
 295        """
 296        if self.executor is not None:
 297            return self.executor.stop_job(self.name, debug=debug)
 298
 299        if self.daemon.status == 'stopped':
 300            if not self.restart:
 301                return True, f"{self} is not running."
 302            elif self.stop_time is not None:
 303                return True, f"{self} will not restart until manually started."
 304
 305        quit_success, quit_msg = self.daemon.quit(timeout=timeout_seconds)
 306        if quit_success:
 307            return quit_success, f"Stopped {self}."
 308
 309        warn(
 310            f"Failed to gracefully quit {self}.",
 311            stack=False,
 312        )
 313        kill_success, kill_msg = self.daemon.kill(timeout=timeout_seconds)
 314        if not kill_success:
 315            return kill_success, kill_msg
 316
 317        return kill_success, f"Killed {self}."
 318
 319    def pause(
 320        self,
 321        timeout_seconds: Union[int, float, None] = None,
 322        debug: bool = False,
 323    ) -> SuccessTuple:
 324        """
 325        Pause the job's daemon.
 326        """
 327        if self.executor is not None:
 328            return self.executor.pause_job(self.name, debug=debug)
 329
 330        pause_success, pause_msg = self.daemon.pause(timeout=timeout_seconds)
 331        if not pause_success:
 332            return pause_success, pause_msg
 333
 334        return pause_success, f"Paused {self}."
 335
 336    def delete(self, debug: bool = False) -> SuccessTuple:
 337        """
 338        Delete the job and its daemon.
 339        """
 340        if self.executor is not None:
 341            return self.executor.delete_job(self.name, debug=debug)
 342
 343        if self.is_running():
 344            stop_success, stop_msg = self.stop()
 345            if not stop_success:
 346                return stop_success, stop_msg
 347
 348        cleanup_success, cleanup_msg = self.daemon.cleanup()
 349        if not cleanup_success:
 350            return cleanup_success, cleanup_msg
 351
 352        _ = self.daemon._properties.pop('result', None)
 353        return cleanup_success, f"Deleted {self}."
 354
 355    def is_running(self) -> bool:
 356        """
 357        Determine whether the job's daemon is running.
 358        """
 359        return self.status == 'running'
 360
 361    def exists(self, debug: bool = False) -> bool:
 362        """
 363        Determine whether the job exists.
 364        """
 365        if self.executor is not None:
 366            return self.executor.get_job_exists(self.name, debug=debug)
 367
 368        return self.daemon.path.exists()
 369
 370    def get_logs(self) -> Union[str, None]:
 371        """
 372        Return the output text of the job's daemon.
 373        """
 374        if self.executor is not None:
 375            return self.executor.get_logs(self.name)
 376
 377        return self.daemon.log_text
 378
 379    def monitor_logs(
 380        self,
 381        callback_function: Callable[[str], None] = _default_stdout_callback,
 382        input_callback_function: Optional[Callable[[], str]] = None,
 383        stop_callback_function: Optional[Callable[[SuccessTuple], None]] = None,
 384        stop_event: Optional[asyncio.Event] = None,
 385        stop_on_exit: bool = False,
 386        strip_timestamps: bool = False,
 387        accept_input: bool = True,
 388        debug: bool = False,
 389        _logs_path: Optional[pathlib.Path] = None,
 390        _log=None,
 391        _stdin_file=None,
 392        _wait_if_stopped: bool = True,
 393    ):
 394        """
 395        Monitor the job's log files and execute a callback on new lines.
 396
 397        Parameters
 398        ----------
 399        callback_function: Callable[[str], None], default _default_stdout_callback
 400            The callback to execute as new data comes in.
 401            Defaults to printing the output directly to `stdout`.
 402
 403        input_callback_function: Optional[Callable[[], str]], default None
 404            If provided, execute this callback when the daemon is blocking on stdin.
 405            Defaults to `sys.stdin.readline()`.
 406
 407        stop_callback_function: Optional[Callable[[SuccessTuple], None]], default None
 408            If provided, execute this callback when the daemon stops.
 409            The job's SuccessTuple will be passed to the callback.
 410
 411        stop_event: Optional[asyncio.Event], default None
 412            If provided, stop monitoring when this event is set.
 413            You may instead raise `meerschaum.jobs.StopMonitoringLogs`
 414            from within `callback_function` to stop monitoring.
 415
 416        stop_on_exit: bool, default False
 417            If `True`, stop monitoring when the job stops.
 418
 419        strip_timestamps: bool, default False
 420            If `True`, remove leading timestamps from lines.
 421
 422        accept_input: bool, default True
 423            If `True`, accept input when the daemon blocks on stdin.
 424        """
 425        if self.executor is not None:
 426            self.executor.monitor_logs(
 427                self.name,
 428                callback_function,
 429                input_callback_function=input_callback_function,
 430                stop_callback_function=stop_callback_function,
 431                stop_on_exit=stop_on_exit,
 432                accept_input=accept_input,
 433                strip_timestamps=strip_timestamps,
 434                debug=debug,
 435            )
 436            return
 437
 438        monitor_logs_coroutine = self.monitor_logs_async(
 439            callback_function=callback_function,
 440            input_callback_function=input_callback_function,
 441            stop_callback_function=stop_callback_function,
 442            stop_event=stop_event,
 443            stop_on_exit=stop_on_exit,
 444            strip_timestamps=strip_timestamps,
 445            accept_input=accept_input,
 446            debug=debug,
 447            _logs_path=_logs_path,
 448            _log=_log,
 449            _stdin_file=_stdin_file,
 450            _wait_if_stopped=_wait_if_stopped,
 451        )
 452        return asyncio.run(monitor_logs_coroutine)
 453
 454    async def monitor_logs_async(
 455        self,
 456        callback_function: Callable[[str], None] = _default_stdout_callback,
 457        input_callback_function: Optional[Callable[[], str]] = None,
 458        stop_callback_function: Optional[Callable[[SuccessTuple], None]] = None,
 459        stop_event: Optional[asyncio.Event] = None,
 460        stop_on_exit: bool = False,
 461        strip_timestamps: bool = False,
 462        accept_input: bool = True,
 463        debug: bool = False,
 464        _logs_path: Optional[pathlib.Path] = None,
 465        _log=None,
 466        _stdin_file=None,
 467        _wait_if_stopped: bool = True,
 468    ):
 469        """
 470        Monitor the job's log files and await a callback on new lines.
 471
 472        Parameters
 473        ----------
 474        callback_function: Callable[[str], None], default _default_stdout_callback
 475            The callback to execute as new data comes in.
 476            Defaults to printing the output directly to `stdout`.
 477
 478        input_callback_function: Optional[Callable[[], str]], default None
 479            If provided, execute this callback when the daemon is blocking on stdin.
 480            Defaults to `sys.stdin.readline()`.
 481
 482        stop_callback_function: Optional[Callable[[SuccessTuple], None]], default None
 483            If provided, execute this callback when the daemon stops.
 484            The job's SuccessTuple will be passed to the callback.
 485
 486        stop_event: Optional[asyncio.Event], default None
 487            If provided, stop monitoring when this event is set.
 488            You may instead raise `meerschaum.jobs.StopMonitoringLogs`
 489            from within `callback_function` to stop monitoring.
 490
 491        stop_on_exit: bool, default False
 492            If `True`, stop monitoring when the job stops.
 493
 494        strip_timestamps: bool, default False
 495            If `True`, remove leading timestamps from lines.
 496
 497        accept_input: bool, default True
 498            If `True`, accept input when the daemon blocks on stdin.
 499        """
 500        from meerschaum.utils.prompt import prompt
 501
 502        def default_input_callback_function():
 503            prompt_kwargs = self.get_prompt_kwargs(debug=debug)
 504            if prompt_kwargs:
 505                answer = prompt(**prompt_kwargs)
 506                return answer + '\n'
 507            return sys.stdin.readline()
 508
 509        if input_callback_function is None:
 510            input_callback_function = default_input_callback_function
 511
 512        if self.executor is not None:
 513            await self.executor.monitor_logs_async(
 514                self.name,
 515                callback_function,
 516                input_callback_function=input_callback_function,
 517                stop_callback_function=stop_callback_function,
 518                stop_on_exit=stop_on_exit,
 519                strip_timestamps=strip_timestamps,
 520                accept_input=accept_input,
 521                debug=debug,
 522            )
 523            return
 524
 525        from meerschaum.utils.formatting._jobs import strip_timestamp_from_line
 526
 527        events = {
 528            'user': stop_event,
 529            'stopped': asyncio.Event(),
 530            'stop_token': asyncio.Event(),
 531            'stop_exception': asyncio.Event(),
 532            'stopped_timeout': asyncio.Event(),
 533        }
 534        combined_event = asyncio.Event()
 535        emitted_text = False
 536        stdin_file = _stdin_file if _stdin_file is not None else self.daemon.stdin_file
 537
 538        async def check_job_status():
 539            if not stop_on_exit:
 540                return
 541
 542            nonlocal emitted_text
 543
 544            sleep_time = 0.1
 545            while sleep_time < 0.2:
 546                if self.status == 'stopped':
 547                    if not emitted_text and _wait_if_stopped:
 548                        await asyncio.sleep(sleep_time)
 549                        sleep_time = round(sleep_time * 1.1, 3)
 550                        continue
 551
 552                    if stop_callback_function is not None:
 553                        try:
 554                            if asyncio.iscoroutinefunction(stop_callback_function):
 555                                await stop_callback_function(self.result)
 556                            else:
 557                                stop_callback_function(self.result)
 558                        except asyncio.exceptions.CancelledError:
 559                            break
 560                        except Exception:
 561                            warn(traceback.format_exc())
 562
 563                    if stop_on_exit:
 564                        events['stopped'].set()
 565
 566                    break
 567                await asyncio.sleep(0.1)
 568
 569            events['stopped_timeout'].set()
 570
 571        async def check_blocking_on_input():
 572            while True:
 573                if not emitted_text or not self.is_blocking_on_stdin():
 574                    try:
 575                        await asyncio.sleep(self.refresh_seconds)
 576                    except asyncio.exceptions.CancelledError:
 577                        break
 578                    continue
 579
 580                if not self.is_running():
 581                    break
 582
 583                await emit_latest_lines()
 584
 585                try:
 586                    print('', end='', flush=True)
 587                    if asyncio.iscoroutinefunction(input_callback_function):
 588                        data = await input_callback_function()
 589                    else:
 590                        loop = asyncio.get_running_loop()
 591                        data = await loop.run_in_executor(None, input_callback_function)
 592                except KeyboardInterrupt:
 593                    break
 594                #  if not data.endswith('\n'):
 595                    #  data += '\n'
 596
 597                stdin_file.write(data)
 598                await asyncio.sleep(self.refresh_seconds)
 599
 600        async def combine_events():
 601            event_tasks = [
 602                asyncio.create_task(event.wait())
 603                for event in events.values()
 604                if event is not None
 605            ]
 606            if not event_tasks:
 607                return
 608
 609            try:
 610                done, pending = await asyncio.wait(
 611                    event_tasks,
 612                    return_when=asyncio.FIRST_COMPLETED,
 613                )
 614                for task in pending:
 615                    task.cancel()
 616            except asyncio.exceptions.CancelledError:
 617                pass
 618            finally:
 619                combined_event.set()
 620
 621        check_job_status_task = asyncio.create_task(check_job_status())
 622        check_blocking_on_input_task = asyncio.create_task(check_blocking_on_input())
 623        combine_events_task = asyncio.create_task(combine_events())
 624
 625        log = _log if _log is not None else self.daemon.rotating_log
 626        lines_to_show = (
 627            self.daemon.properties.get(
 628                'logs', {}
 629            ).get(
 630                'lines_to_show', get_config('jobs', 'logs', 'lines_to_show')
 631            )
 632        )
 633
 634        async def emit_latest_lines():
 635            nonlocal emitted_text
 636            nonlocal stop_event
 637            lines = log.readlines()
 638            for line in lines[(-1 * lines_to_show):]:
 639                if stop_event is not None and stop_event.is_set():
 640                    return
 641
 642                line_stripped_extra = strip_timestamp_from_line(line.strip())
 643                line_stripped = strip_timestamp_from_line(line)
 644
 645                if line_stripped_extra == STOP_TOKEN:
 646                    events['stop_token'].set()
 647                    return
 648
 649                if line_stripped_extra == CLEAR_TOKEN:
 650                    clear_screen(debug=debug)
 651                    continue
 652
 653                if line_stripped_extra == FLUSH_TOKEN.strip():
 654                    line_stripped = ''
 655                    line = ''
 656
 657                if strip_timestamps:
 658                    line = line_stripped
 659
 660                try:
 661                    if asyncio.iscoroutinefunction(callback_function):
 662                        await callback_function(line)
 663                    else:
 664                        callback_function(line)
 665                    emitted_text = True
 666                except StopMonitoringLogs:
 667                    events['stop_exception'].set()
 668                    return
 669                except Exception:
 670                    warn(f"Error in logs callback:\n{traceback.format_exc()}")
 671
 672        await emit_latest_lines()
 673
 674        tasks = (
 675            [check_job_status_task]
 676            + ([check_blocking_on_input_task] if accept_input else [])
 677            + [combine_events_task]
 678        )
 679        try:
 680            _ = asyncio.gather(*tasks, return_exceptions=True)
 681        except asyncio.exceptions.CancelledError:
 682            raise
 683        except Exception:
 684            warn(f"Failed to run async checks:\n{traceback.format_exc()}")
 685
 686        watchfiles = mrsm.attempt_import('watchfiles', lazy=False)
 687        dir_path_to_monitor = (
 688            _logs_path
 689            or (log.file_path.parent if log else None)
 690            or paths.LOGS_RESOURCES_PATH
 691        )
 692        async for changes in watchfiles.awatch(
 693            dir_path_to_monitor,
 694            stop_event=combined_event,
 695        ):
 696            for change in changes:
 697                file_path_str = change[1]
 698                file_path = pathlib.Path(file_path_str)
 699                latest_subfile_path = log.get_latest_subfile_path()
 700                if latest_subfile_path != file_path:
 701                    continue
 702
 703                await emit_latest_lines()
 704
 705        await emit_latest_lines()
 706
 707    def is_blocking_on_stdin(self, debug: bool = False) -> bool:
 708        """
 709        Return whether a job's daemon is blocking on stdin.
 710        """
 711        if self.executor is not None:
 712            return self.executor.get_job_is_blocking_on_stdin(self.name, debug=debug)
 713
 714        return self.is_running() and self.daemon.blocking_stdin_file_path.exists()
 715
 716    def get_prompt_kwargs(self, debug: bool = False) -> Dict[str, Any]:
 717        """
 718        Return the kwargs to the blocking `prompt()`, if available.
 719        """
 720        if self.executor is not None:
 721            return self.executor.get_job_prompt_kwargs(self.name, debug=debug)
 722
 723        if not self.daemon.prompt_kwargs_file_path.exists():
 724            return {}
 725
 726        try:
 727            with open(self.daemon.prompt_kwargs_file_path, 'r', encoding='utf-8') as f:
 728                prompt_kwargs = json.load(f)
 729
 730            return prompt_kwargs
 731        
 732        except Exception:
 733            import traceback
 734            traceback.print_exc()
 735            return {}
 736
 737    def write_stdin(self, data):
 738        """
 739        Write `data` to the `stdin` of the job's daemon.
 740        """
 741        self.daemon.stdin_file.write(data)
 742
 743    @property
 744    def executor(self) -> Union[Executor, None]:
 745        """
 746        If the job is remote, return the connector to the remote API instance; otherwise return `None`.
 747        """
 748        return (
 749            mrsm.get_connector(self.executor_keys)
 750            if self.executor_keys != 'local'
 751            else None
 752        )
 753
 754    @property
 755    def status(self) -> str:
 756        """
 757        Return the running status of the job's daemon.
 758        """
 759        if '_status_hook' in self.__dict__:
 760            return self._status_hook()
 761
 762        if self.executor is not None:
 763            return self.executor.get_job_status(self.name)
 764
 765        return self.daemon.status
 766
 767    @property
 768    def pid(self) -> Union[int, None]:
 769        """
 770        Return the PID of the job's dameon.
 771        """
 772        if self.executor is not None:
 773            return self.executor.get_job_metadata(self.name).get('daemon', {}).get('pid', None)
 774
 775        return self.daemon.pid
 776
 777    @property
 778    def restart(self) -> bool:
 779        """
 780        Return whether to restart a stopped job.
 781        """
 782        if self.executor is not None:
 783            return self.executor.get_job_metadata(self.name).get('restart', False)
 784
 785        return self.daemon.properties.get('restart', False)
 786
 787    @property
 788    def result(self) -> SuccessTuple:
 789        """
 790        Return the `SuccessTuple` when the job has terminated.
 791        """
 792        if self.is_running():
 793            return True, f"{self} is running."
 794
 795        if '_result_hook' in self.__dict__:
 796            return self._result_hook()
 797
 798        if self.executor is not None:
 799            return (
 800                self.executor.get_job_metadata(self.name)
 801                .get('result', (False, "No result available."))
 802            )
 803
 804        _result = self.daemon.properties.get('result', None)
 805        if _result is None:
 806            from meerschaum.utils.daemon.Daemon import _results
 807            return _results.get(self.daemon.daemon_id, (False, "No result available."))
 808
 809        return tuple(_result)
 810
 811    @property
 812    def sysargs(self) -> List[str]:
 813        """
 814        Return the sysargs to use for the Daemon.
 815        """
 816        if self._sysargs:
 817            return self._sysargs
 818
 819        if self.executor is not None:
 820            return self.executor.get_job_metadata(self.name).get('sysargs', [])
 821
 822        target_args = self.daemon.target_args
 823        if target_args is None:
 824            return []
 825        self._sysargs = target_args[0] if len(target_args) > 0 else []
 826        return self._sysargs
 827
 828    def get_daemon_properties(self) -> Dict[str, Any]:
 829        """
 830        Return the `properties` dictionary for the job's daemon.
 831        """
 832        remote_properties = (
 833            {}
 834            if self.executor is None
 835            else self.executor.get_job_properties(self.name)
 836        )
 837        return {
 838            **remote_properties,
 839            **self._properties_patch
 840        }
 841
 842    @property
 843    def daemon(self) -> 'Daemon':
 844        """
 845        Return the daemon which this job manages.
 846        """
 847        from meerschaum.utils.daemon import Daemon
 848        if self._daemon is not None and self.executor is None and self._sysargs:
 849            return self._daemon
 850
 851        self._daemon = Daemon(
 852            target=entry,
 853            target_args=[self._sysargs],
 854            target_kw={},
 855            daemon_id=self.name,
 856            label=shlex.join(self._sysargs),
 857            properties=self.get_daemon_properties(),
 858        )
 859        if '_rotating_log' in self.__dict__:
 860            self._daemon._rotating_log = self._rotating_log
 861
 862        if '_stdin_file' in self.__dict__:
 863            self._daemon._stdin_file = self._stdin_file
 864            self._daemon._blocking_stdin_file_path = self._stdin_file.blocking_file_path
 865
 866        return self._daemon
 867
 868    @property
 869    def began(self) -> Union[datetime, None]:
 870        """
 871        The datetime when the job began running.
 872        """
 873        if self.executor is not None:
 874            began_str = self.executor.get_job_began(self.name)
 875            if began_str is None:
 876                return None
 877            return (
 878                datetime.fromisoformat(began_str)
 879                .astimezone(timezone.utc)
 880                .replace(tzinfo=None)
 881            )
 882
 883        began_str = self.daemon.properties.get('process', {}).get('began', None)
 884        if began_str is None:
 885            return None
 886
 887        return datetime.fromisoformat(began_str)
 888
 889    @property
 890    def ended(self) -> Union[datetime, None]:
 891        """
 892        The datetime when the job stopped running.
 893        """
 894        if self.executor is not None:
 895            ended_str = self.executor.get_job_ended(self.name)
 896            if ended_str is None:
 897                return None
 898            return (
 899                datetime.fromisoformat(ended_str)
 900                .astimezone(timezone.utc)
 901                .replace(tzinfo=None)
 902            )
 903
 904        ended_str = self.daemon.properties.get('process', {}).get('ended', None)
 905        if ended_str is None:
 906            return None
 907
 908        return datetime.fromisoformat(ended_str)
 909
 910    @property
 911    def paused(self) -> Union[datetime, None]:
 912        """
 913        The datetime when the job was suspended while running.
 914        """
 915        if self.executor is not None:
 916            paused_str = self.executor.get_job_paused(self.name)
 917            if paused_str is None:
 918                return None
 919            return (
 920                datetime.fromisoformat(paused_str)
 921                .astimezone(timezone.utc)
 922                .replace(tzinfo=None)
 923            )
 924
 925        paused_str = self.daemon.properties.get('process', {}).get('paused', None)
 926        if paused_str is None:
 927            return None
 928
 929        return datetime.fromisoformat(paused_str)
 930
 931    @property
 932    def stop_time(self) -> Union[datetime, None]:
 933        """
 934        Return the timestamp when the job was manually stopped.
 935        """
 936        if self.executor is not None:
 937            return self.executor.get_job_stop_time(self.name)
 938
 939        if not self.daemon.stop_path.exists():
 940            return None
 941
 942        stop_data = self.daemon._read_stop_file()
 943        if not stop_data:
 944            return None
 945
 946        stop_time_str = stop_data.get('stop_time', None)
 947        if not stop_time_str:
 948            warn(f"Could not read stop time for {self}.")
 949            return None
 950
 951        return datetime.fromisoformat(stop_time_str)
 952
 953    @property
 954    def hidden(self) -> bool:
 955        """
 956        Return whether this job should be hidden (i.e. not displayed).
 957        """
 958        return (
 959            self.name.startswith('_')
 960            or self.name.startswith('.')
 961            or self._is_externally_managed
 962        )
 963
 964    def check_restart(self) -> SuccessTuple:
 965        """
 966        If `restart` is `True` and the daemon is not running,
 967        restart the job.
 968        Do not restart if the job was manually stopped.
 969        """
 970        if self.is_running():
 971            return True, f"{self} is running."
 972
 973        if not self.restart:
 974            return True, f"{self} does not need to be restarted."
 975
 976        if self.stop_time is not None:
 977            return True, f"{self} was manually stopped."
 978
 979        return self.start()
 980
 981    @property
 982    def label(self) -> str:
 983        """
 984        Return the job's Daemon label (joined sysargs).
 985        """
 986        from meerschaum._internal.arguments import compress_pipeline_sysargs
 987        sysargs = compress_pipeline_sysargs(self.sysargs)
 988        return shlex.join(sysargs).replace(' + ', '\n+ ').replace(' : ', '\n: ').lstrip().rstrip()
 989
 990    @property
 991    def _externally_managed_file(self) -> pathlib.Path:
 992        """
 993        Return the path to the externally managed file.
 994        """
 995        return self.daemon.path / '.externally-managed'
 996
 997    def _set_externally_managed(self):
 998        """
 999        Set this job as externally managed.
1000        """
1001        self._externally_managed = True
1002        try:
1003            self._externally_managed_file.parent.mkdir(exist_ok=True, parents=True)
1004            self._externally_managed_file.touch()
1005        except Exception as e:
1006            warn(e)
1007
1008    @property
1009    def _is_externally_managed(self) -> bool:
1010        """
1011        Return whether this job is externally managed.
1012        """
1013        return self.executor_keys in (None, 'local') and (
1014            self._externally_managed or self._externally_managed_file.exists()
1015        )
1016
1017    @property
1018    def env(self) -> Dict[str, str]:
1019        """
1020        Return the environment variables to set for the job's process.
1021        """
1022        if '_env' in self.__dict__:
1023            return self.__dict__['_env']
1024
1025        _env = self.daemon.properties.get('env', {})
1026        default_env = {
1027            'PYTHONUNBUFFERED': '1',
1028            'LINES': str(get_config('jobs', 'terminal', 'lines')),
1029            'COLUMNS': str(get_config('jobs', 'terminal', 'columns')),
1030            STATIC_CONFIG['environment']['noninteractive']: 'true',
1031        }
1032        self._env = {**default_env, **_env}
1033        return self._env
1034
1035    @property
1036    def delete_after_completion(self) -> bool:
1037        """
1038        Return whether this job is configured to delete itself after completion.
1039        """
1040        if '_delete_after_completion' in self.__dict__:
1041            return self.__dict__.get('_delete_after_completion', False)
1042
1043        self._delete_after_completion = self.daemon.properties.get('delete_after_completion', False)
1044        return self._delete_after_completion
1045
1046    def __str__(self) -> str:
1047        sysargs = self.sysargs
1048        sysargs_str = shlex.join(sysargs) if sysargs else ''
1049        job_str = f'Job("{self.name}"'
1050        if sysargs_str:
1051            job_str += f', "{sysargs_str}"'
1052
1053        job_str += ')'
1054        return job_str
1055
1056    def __repr__(self) -> str:
1057        return str(self)
1058
1059    def __hash__(self) -> int:
1060        return hash(self.name)
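`monitor_logs_async()` above stops following the log as soon as any one of several conditions fires: a caller-supplied `stop_event`, the job stopping, a `STOP_TOKEN` line, a `StopMonitoringLogs` raised from the callback, or the status check giving up on a stopped job that never emitted output. `combine_events()` implements this by waiting on every event with `asyncio.wait(..., return_when=asyncio.FIRST_COMPLETED)` and then setting `combined_event`, which is passed to `watchfiles.awatch()` as its `stop_event`. The following is a minimal, standard-library-only sketch of that "first event wins" pattern; `wait_for_first_event` and `demo` are hypothetical names, and unlike the original the sketch returns the key of the event that fired instead of setting a shared event.

```python
import asyncio
from typing import Dict, Optional


async def wait_for_first_event(events: Dict[str, Optional[asyncio.Event]]) -> Optional[str]:
    """
    Block until any of the given events is set and return the key of one that fired.
    `None` entries are skipped, just as an unset user `stop_event` is skipped.
    """
    waiters = {
        asyncio.create_task(event.wait()): key
        for key, event in events.items()
        if event is not None
    }
    if not waiters:
        # Nothing to wait on; mirrors the early return in `combine_events()`.
        return None

    done, pending = await asyncio.wait(list(waiters), return_when=asyncio.FIRST_COMPLETED)

    # Cancel the waiters that are still blocked and let the cancellations settle.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)

    return waiters[next(iter(done))]


async def demo() -> Optional[str]:
    events = {
        'user': None,                  # no caller-supplied stop event
        'stop_token': asyncio.Event(),
        'stopped': asyncio.Event(),
    }
    # Simulate the job exiting shortly after monitoring begins.
    asyncio.get_running_loop().call_later(0.05, events['stopped'].set)
    return await wait_for_first_event(events)


print(asyncio.run(demo()))  # stopped
```

Waiting on the whole set with `FIRST_COMPLETED` keeps every stop condition independent: each watcher only has to set its own event, and none of them needs to know about the others.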

Manage a meerschaum.utils.daemon.Daemon, locally or remotely via the API.
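Part of managing a daemon is deciding whether a stopped job should come back. `check_restart()` (in the listing above) interleaves that decision with daemon I/O, so its order of checks is easy to miss. The sketch below isolates the decision as a pure function so it can be reasoned about without spawning processes. `should_restart` is a hypothetical name; the real method returns a `SuccessTuple` and calls `start()` itself rather than returning a flag.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


def should_restart(
    is_running: bool,
    restart: bool,
    stop_time: Optional[datetime],
) -> Tuple[bool, str]:
    """
    Hypothetical helper: return (start_now, reason) following the same
    order of checks as `Job.check_restart()`.
    """
    if is_running:
        return False, "is running"
    if not restart:
        return False, "does not need to be restarted"
    if stop_time is not None:
        # A recorded stop time means a user stopped the job on purpose: don't revive it.
        return False, "was manually stopped"
    return True, "should be restarted"


# A job with `restart` set and no recorded stop time is started again...
print(should_restart(False, True, None))  # (True, 'should be restarted')
# ...but the same job stopped by hand stays down.
print(should_restart(False, True, datetime(2024, 1, 1, tzinfo=timezone.utc)))  # (False, 'was manually stopped')
```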

Job(name: str, sysargs: Union[List[str], str, NoneType] = None, env: Optional[Dict[str, str]] = None, executor_keys: Optional[str] = None, delete_after_completion: bool = False, refresh_seconds: Union[int, float, NoneType] = None, _properties: Optional[Dict[str, Any]] = None, _rotating_log=None, _stdin_file=None, _status_hook: Optional[Callable[[], str]] = None, _result_hook: Optional[Callable[[], Tuple[bool, str]]] = None, _externally_managed: bool = False)
 75    def __init__(
 76        self,
 77        name: str,
 78        sysargs: Union[List[str], str, None] = None,
 79        env: Optional[Dict[str, str]] = None,
 80        executor_keys: Optional[str] = None,
 81        delete_after_completion: bool = False,
 82        refresh_seconds: Union[int, float, None] = None,
 83        _properties: Optional[Dict[str, Any]] = None,
 84        _rotating_log=None,
 85        _stdin_file=None,
 86        _status_hook: Optional[Callable[[], str]] = None,
 87        _result_hook: Optional[Callable[[], SuccessTuple]] = None,
 88        _externally_managed: bool = False,
 89    ):
 90        """
 91        Create a new job to manage a `meerschaum.utils.daemon.Daemon`.
 92
 93        Parameters
 94        ----------
 95        name: str
 96            The name of the job to be created.
 97            This will also be used as the Daemon ID.
 98
 99        sysargs: Union[List[str], str, None], default None
100            The sysargs of the command to be executed, e.g. 'start api'.
101
102        env: Optional[Dict[str, str]], default None
103            If provided, set these environment variables in the job's process.
104
105        executor_keys: Optional[str], default None
106            If provided, execute the job remotely on an API instance, e.g. 'api:main'.
107
108        delete_after_completion: bool, default False
109            If `True`, delete this job when it has finished executing.
110
111        refresh_seconds: Union[int, float, None], default None
112            The number of seconds to sleep between refreshes.
113            Defaults to the configured value `system.cli.refresh_seconds`.
114
115        _properties: Optional[Dict[str, Any]], default None
116            If provided, use this to patch the daemon's properties.
117        """
118        from meerschaum.utils.daemon import Daemon
119        for char in BANNED_CHARS:
120            if char in name:
121                raise ValueError(f"Invalid name: ({char}) is not allowed.")
122
123        if isinstance(sysargs, str):
124            sysargs = shlex.split(sysargs)
125
126        and_key = STATIC_CONFIG['system']['arguments']['and_key']
127        escaped_and_key = STATIC_CONFIG['system']['arguments']['escaped_and_key']
128        if sysargs:
129            sysargs = [
130                (arg if arg != escaped_and_key else and_key)
131                for arg in sysargs
132            ]
133
134        ### NOTE: 'local' and 'systemd' executors are being coalesced.
135        if executor_keys is None:
136            from meerschaum.jobs import get_executor_keys_from_context
137            executor_keys = get_executor_keys_from_context()
138
139        self.executor_keys = executor_keys
140        self.name = name
141        self.refresh_seconds = (
142            refresh_seconds
143            if refresh_seconds is not None
144            else mrsm.get_config('system', 'cli', 'refresh_seconds')
145        )
146        try:
147            self._daemon = (
148                Daemon(daemon_id=name)
149                if executor_keys == 'local'
150                else None
151            )
152        except Exception:
153            self._daemon = None
154
155        ### Handle any injected dependencies.
156        if _rotating_log is not None:
157            self._rotating_log = _rotating_log
158            if self._daemon is not None:
159                self._daemon._rotating_log = _rotating_log
160
161        if _stdin_file is not None:
162            self._stdin_file = _stdin_file
163            if self._daemon is not None:
164                self._daemon._stdin_file = _stdin_file
165                self._daemon._blocking_stdin_file_path = _stdin_file.blocking_file_path
166
167        if _status_hook is not None:
168            self._status_hook = _status_hook
169
170        if _result_hook is not None:
171            self._result_hook = _result_hook
172
173        self._externally_managed = _externally_managed
174        self._properties_patch = _properties or {}
175        if _externally_managed:
176            self._properties_patch.update({'externally_managed': _externally_managed})
177
178        if env:
179            self._properties_patch.update({'env': env})
180
181        if delete_after_completion:
182            self._properties_patch.update({'delete_after_completion': delete_after_completion})
183
184        daemon_sysargs = (
185            self._daemon.properties.get('target', {}).get('args', [None])[0]
186            if self._daemon is not None
187            else None
188        )
189
190        if daemon_sysargs and sysargs and daemon_sysargs != sysargs:
191            warn("Given sysargs differ from existing sysargs.")
192
193        self._sysargs = [
194            arg
195            for arg in (daemon_sysargs or sysargs or [])
196            if arg not in ('-d', '--daemon')
197        ]
198        for restart_flag in RESTART_FLAGS:
199            if restart_flag in self._sysargs:
200                self._properties_patch.update({'restart': True})
201                break

Create a new job to manage a meerschaum.utils.daemon.Daemon.

Parameters
  • name (str): The name of the job to be created. This will also be used as the Daemon ID.
  • sysargs (Union[List[str], str, None], default None): The sysargs of the command to be executed, e.g. 'start api'.
  • env (Optional[Dict[str, str]], default None): If provided, set these environment variables in the job's process.
  • executor_keys (Optional[str], default None): If provided, execute the job remotely on an API instance, e.g. 'api:main'.
  • delete_after_completion (bool, default False): If True, delete this job when it has finished executing.
  • refresh_seconds (Union[int, float, None], default None): The number of seconds to sleep between refreshes. Defaults to the configured value system.cli.refresh_seconds.
  • _properties (Optional[Dict[str, Any]], default None): If provided, use this to patch the daemon's properties.
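The tail of the constructor normalizes the command before any daemon is built: it prefers sysargs already stored on an existing daemon, drops the `-d`/`--daemon` flags, and marks the job for restart if a restart flag is present. The stdlib-only sketch below models that normalization. The function name `normalize_sysargs` and the flag set `EXAMPLE_RESTART_FLAGS` are hypothetical; the real flags come from meerschaum's `RESTART_FLAGS`, whose contents are not shown in this section.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical stand-in for meerschaum's RESTART_FLAGS (actual contents not shown here).
EXAMPLE_RESTART_FLAGS = ('--loop', '--schedule')


def normalize_sysargs(
    sysargs: Optional[List[str]],
    daemon_sysargs: Optional[List[str]] = None,
    restart_flags: Iterable[str] = EXAMPLE_RESTART_FLAGS,
) -> Tuple[List[str], bool]:
    """Return (cleaned sysargs, whether the job should restart)."""
    # Prefer the sysargs stored on an existing daemon, as the constructor does.
    source = daemon_sysargs or sysargs or []
    # The job handles daemonization itself, so the daemon flags are stripped.
    cleaned = [arg for arg in source if arg not in ('-d', '--daemon')]
    # Any restart flag in the remaining args marks the job for restart.
    restart = any(flag in cleaned for flag in restart_flags)
    return cleaned, restart


args, restart = normalize_sysargs(['sync', 'pipes', '--loop', '-d'])
print(args, restart)  # ['sync', 'pipes', '--loop'] True
```

The sketch only accepts lists; the constructor's documented `str` form of `sysargs` is not modeled here.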
executor_keys
name
refresh_seconds
@staticmethod
def from_pid(pid: int, executor_keys: Optional[str] = None) -> Job:
    @staticmethod
    def from_pid(pid: int, executor_keys: Optional[str] = None) -> Job:
        """
        Build a `Job` from the PID of a running Meerschaum process.

        Parameters
        ----------
        pid: int
            The PID of the process.

        executor_keys: Optional[str], default None
            The executor keys to assign to the job.
        """
        psutil = mrsm.attempt_import('psutil')
        try:
            process = psutil.Process(pid)
        except psutil.NoSuchProcess as e:
            warn(f"Process with PID {pid} does not exist.", stack=False)
            raise e

        command_args = process.cmdline()
        is_daemon = command_args[1] == '-c'

        if is_daemon:
            daemon_id = command_args[-1].split('daemon_id=')[-1].split(')')[0].replace("'", '')
            root_dir = process.environ().get(STATIC_CONFIG['environment']['root'], None)
            if root_dir is None:
                root_dir = paths.ROOT_DIR_PATH
            else:
                root_dir = pathlib.Path(root_dir)
            jobs_dir = root_dir / paths.DAEMON_RESOURCES_PATH.name
            daemon_dir = jobs_dir / daemon_id
            pid_file = daemon_dir / 'process.pid'

            if pid_file.exists():
                with open(pid_file, 'r', encoding='utf-8') as f:
                    daemon_pid = int(f.read())

                if pid != daemon_pid:
                    raise EnvironmentError(f"Differing PIDs: {pid=}, {daemon_pid=}")
            else:
                raise EnvironmentError(f"Is job '{daemon_id}' running?")

            return Job(daemon_id, executor_keys=executor_keys)

        from meerschaum._internal.arguments._parse_arguments import parse_arguments
        from meerschaum.utils.daemon import get_new_daemon_name

        mrsm_ix = 0
        for i, arg in enumerate(command_args):
            if 'mrsm' in arg or 'meerschaum' in arg.lower():
                mrsm_ix = i
                break

        sysargs = command_args[mrsm_ix+1:]
        kwargs = parse_arguments(sysargs)
        name = kwargs.get('name', get_new_daemon_name())
        return Job(name, sysargs, executor_keys=executor_keys)

Build a Job from the PID of a running Meerschaum process.

Parameters
  • pid (int): The PID of the process.
  • executor_keys (Optional[str], default None): The executor keys to assign to the job.
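from_pid() tells two kinds of processes apart by their command line. A daemon launched through `python -c ...` has its daemon ID parsed out of the code string. A plain `mrsm ...` invocation has everything after the first `mrsm`/`meerschaum` token treated as sysargs. The sketch below isolates that string handling so it can be tried without psutil. The helper names are hypothetical, and so are the sample command lines, because the exact `-c` payload meerschaum generates is not shown in this section.

```python
from typing import List


def is_daemon_cmdline(command_args: List[str]) -> bool:
    """A daemon process runs as `python -c <code>`."""
    # Guarded against short command lines; the source indexes [1] directly.
    return len(command_args) > 1 and command_args[1] == '-c'


def daemon_id_from_cmdline(command_args: List[str]) -> str:
    """Same manipulation as from_pid: text after 'daemon_id=', cut at ')', unquoted."""
    return command_args[-1].split('daemon_id=')[-1].split(')')[0].replace("'", '')


def sysargs_from_cmdline(command_args: List[str]) -> List[str]:
    """Return the arguments after the first mrsm/meerschaum token."""
    mrsm_ix = 0
    for i, arg in enumerate(command_args):
        if 'mrsm' in arg or 'meerschaum' in arg.lower():
            mrsm_ix = i
            break
    return command_args[mrsm_ix + 1:]


# Hypothetical command lines for illustration only.
daemon_cmd = ['python', '-c', "from meerschaum.utils.daemon import Daemon; Daemon(daemon_id='my-job').run()"]
cli_cmd = ['/usr/bin/python3', '/home/user/.local/bin/mrsm', 'sync', 'pipes', '--loop']

print(daemon_id_from_cmdline(daemon_cmd))  # my-job
print(sysargs_from_cmdline(cli_cmd))       # ['sync', 'pipes', '--loop']
```

Because the token search is a substring match, any earlier argument containing `mrsm` or `meerschaum` (for example a virtualenv path) would shift the split point; the real method shares that behavior.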
def start(self, debug: bool = False) -> Tuple[bool, str]:
    def start(self, debug: bool = False) -> SuccessTuple:
        """
        Start the job's daemon.
        """
        if self.executor is not None:
            if not self.exists(debug=debug):
                return self.executor.create_job(
                    self.name,
                    self.sysargs,
                    properties=self.daemon.properties,
                    debug=debug,
                )
            return self.executor.start_job(self.name, debug=debug)

        if self.is_running():
            return True, f"{self} is already running."

        success, msg = self.daemon.run(
            keep_daemon_output=(not self.delete_after_completion),
            allow_dirty_run=True,
        )
        if not success:
            return success, msg

        return success, f"Started {self}."

Start the job's daemon.

def stop(self, timeout_seconds: Union[int, float, None] = None, debug: bool = False) -> Tuple[bool, str]:
    def stop(
        self,
        timeout_seconds: Union[int, float, None] = None,
        debug: bool = False,
    ) -> SuccessTuple:
        """
        Stop the job's daemon.
        """
        if self.executor is not None:
            return self.executor.stop_job(self.name, debug=debug)

        if self.daemon.status == 'stopped':
            if not self.restart:
                return True, f"{self} is not running."
            elif self.stop_time is not None:
                return True, f"{self} will not restart until manually started."

        quit_success, quit_msg = self.daemon.quit(timeout=timeout_seconds)
        if quit_success:
            return quit_success, f"Stopped {self}."

        warn(
            f"Failed to gracefully quit {self}.",
            stack=False,
        )
        kill_success, kill_msg = self.daemon.kill(timeout=timeout_seconds)
        if not kill_success:
            return kill_success, kill_msg

        return kill_success, f"Killed {self}."

Stop the job's daemon.
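stop() first attempts a graceful `daemon.quit()` and only falls back to `daemon.kill()` when that fails. The same escalation can be sketched with the standard library's `subprocess` module. This is an analogy under that assumption, not meerschaum's implementation: the Daemon class manages its own signals and PID files, and `stop_process` is a hypothetical helper.

```python
import subprocess
import sys
from typing import Tuple

SuccessTuple = Tuple[bool, str]


def stop_process(proc: subprocess.Popen, timeout_seconds: float = 5.0) -> SuccessTuple:
    """Ask a child process to exit, then force-kill it if it does not."""
    if proc.poll() is not None:
        return True, "Process is not running."

    # Polite attempt first: SIGTERM on POSIX (on Windows, terminate() is already forceful).
    proc.terminate()
    try:
        proc.wait(timeout=timeout_seconds)
        return True, "Stopped process."
    except subprocess.TimeoutExpired:
        pass

    # Fallback, mirroring Job.stop() falling back from quit() to kill().
    proc.kill()
    try:
        proc.wait(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return False, "Failed to kill process."
    return True, "Killed process."


child = subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(60)'])
print(stop_process(child))  # (True, 'Stopped process.')
```

Returning a `(bool, str)` pair on every path, including the no-op case, follows the same SuccessTuple convention the Job methods use.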

def pause(self, timeout_seconds: Union[int, float, None] = None, debug: bool = False) -> Tuple[bool, str]:
    def pause(
        self,
        timeout_seconds: Union[int, float, None] = None,
        debug: bool = False,
    ) -> SuccessTuple:
        """
        Pause the job's daemon.
        """
        if self.executor is not None:
            return self.executor.pause_job(self.name, debug=debug)

        pause_success, pause_msg = self.daemon.pause(timeout=timeout_seconds)
        if not pause_success:
            return pause_success, pause_msg

        return pause_success, f"Paused {self}."

Pause the job's daemon.

def delete(self, debug: bool = False) -> Tuple[bool, str]:
    def delete(self, debug: bool = False) -> SuccessTuple:
        """
        Delete the job and its daemon.
        """
        if self.executor is not None:
            return self.executor.delete_job(self.name, debug=debug)

        if self.is_running():
            stop_success, stop_msg = self.stop()
            if not stop_success:
                return stop_success, stop_msg

        cleanup_success, cleanup_msg = self.daemon.cleanup()
        if not cleanup_success:
            return cleanup_success, cleanup_msg

        _ = self.daemon._properties.pop('result', None)
        return cleanup_success, f"Deleted {self}."

Delete the job and its daemon.

def is_running(self) -> bool:
    def is_running(self) -> bool:
        """
        Determine whether the job's daemon is running.
        """
        return self.status == 'running'

Determine whether the job's daemon is running.

def exists(self, debug: bool = False) -> bool:
    def exists(self, debug: bool = False) -> bool:
        """
        Determine whether the job exists.
        """
        if self.executor is not None:
            return self.executor.get_job_exists(self.name, debug=debug)

        return self.daemon.path.exists()

Determine whether the job exists.

def get_logs(self) -> Optional[str]:
    def get_logs(self) -> Union[str, None]:
        """
        Return the output text of the job's daemon.
        """
        if self.executor is not None:
            return self.executor.get_logs(self.name)

        return self.daemon.log_text

Return the output text of the job's daemon.

def monitor_logs(self, callback_function: Callable[[str], None] = _default_stdout_callback, input_callback_function: Optional[Callable[[], str]] = None, stop_callback_function: Optional[Callable[[Tuple[bool, str]], None]] = None, stop_event: Optional[asyncio.Event] = None, stop_on_exit: bool = False, strip_timestamps: bool = False, accept_input: bool = True, debug: bool = False, _logs_path: Optional[pathlib.Path] = None, _log=None, _stdin_file=None, _wait_if_stopped: bool = True):
    def monitor_logs(
        self,
        callback_function: Callable[[str], None] = _default_stdout_callback,
        input_callback_function: Optional[Callable[[], str]] = None,
        stop_callback_function: Optional[Callable[[SuccessTuple], None]] = None,
        stop_event: Optional[asyncio.Event] = None,
        stop_on_exit: bool = False,
        strip_timestamps: bool = False,
        accept_input: bool = True,
        debug: bool = False,
        _logs_path: Optional[pathlib.Path] = None,
        _log=None,
        _stdin_file=None,
        _wait_if_stopped: bool = True,
    ):
        """
        Monitor the job's log files and execute a callback on new lines.

        Parameters
        ----------
        callback_function: Callable[[str], None], default partial(print, end='')
            The callback to execute as new data comes in.
            Defaults to printing the output directly to `stdout`.

        input_callback_function: Optional[Callable[[], str]], default None
            If provided, execute this callback when the daemon is blocking on stdin.
            Defaults to `sys.stdin.readline()`.

        stop_callback_function: Optional[Callable[[SuccessTuple], None]], default None
            If provided, execute this callback when the daemon stops.
            The job's SuccessTuple will be passed to the callback.

        stop_event: Optional[asyncio.Event], default None
            If provided, stop monitoring when this event is set.
            You may instead raise `meerschaum.jobs.StopMonitoringLogs`
            from within `callback_function` to stop monitoring.

        stop_on_exit: bool, default False
            If `True`, stop monitoring when the job stops.

        strip_timestamps: bool, default False
            If `True`, remove leading timestamps from lines.

        accept_input: bool, default True
            If `True`, accept input when the daemon blocks on stdin.
        """
        if self.executor is not None:
            self.executor.monitor_logs(
                self.name,
                callback_function,
                input_callback_function=input_callback_function,
                stop_callback_function=stop_callback_function,
                stop_on_exit=stop_on_exit,
                accept_input=accept_input,
                strip_timestamps=strip_timestamps,
                debug=debug,
            )
            return

        monitor_logs_coroutine = self.monitor_logs_async(
            callback_function=callback_function,
            input_callback_function=input_callback_function,
            stop_callback_function=stop_callback_function,
            stop_event=stop_event,
            stop_on_exit=stop_on_exit,
            strip_timestamps=strip_timestamps,
            accept_input=accept_input,
            debug=debug,
            _logs_path=_logs_path,
            _log=_log,
            _stdin_file=_stdin_file,
            _wait_if_stopped=_wait_if_stopped,
        )
        return asyncio.run(monitor_logs_coroutine)

Monitor the job's log files and execute a callback on new lines.

Parameters
  • callback_function (Callable[[str], None], default partial(print, end='')): The callback to execute as new data comes in. Defaults to printing the output directly to stdout.
  • input_callback_function (Optional[Callable[[], str]], default None): If provided, execute this callback when the daemon is blocking on stdin. Defaults to sys.stdin.readline().
  • stop_callback_function (Optional[Callable[[SuccessTuple], None]], default None): If provided, execute this callback when the daemon stops. The job's SuccessTuple will be passed to the callback.
  • stop_event (Optional[asyncio.Event], default None): If provided, stop monitoring when this event is set. You may instead raise meerschaum.jobs.StopMonitoringLogs from within callback_function to stop monitoring.
  • stop_on_exit (bool, default False): If True, stop monitoring when the job stops.
  • strip_timestamps (bool, default False): If True, remove leading timestamps from lines.
  • accept_input (bool, default True): If True, accept input when the daemon blocks on stdin.
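Whichever path monitor_logs() takes, the callback contract is the same. The callback may be a plain function or a coroutine function, and raising `StopMonitoringLogs` from it ends monitoring. The minimal sketch below reproduces that dispatch over an in-memory list of lines rather than a watched rotating log, and wraps it with `asyncio.run()` the way the synchronous method does. `StopMonitoring` is a hypothetical stand-in for `meerschaum.jobs.StopMonitoringLogs`, and `emit_lines` is a hypothetical helper.

```python
import asyncio
from typing import Callable, Iterable, List


class StopMonitoring(Exception):
    """Hypothetical stand-in for meerschaum.jobs.StopMonitoringLogs."""


async def emit_lines_async(lines: Iterable[str], callback: Callable) -> int:
    """Pass each line to a sync or async callback; return how many were emitted."""
    emitted = 0
    for line in lines:
        try:
            # Await coroutine callbacks, call plain ones directly.
            if asyncio.iscoroutinefunction(callback):
                await callback(line)
            else:
                callback(line)
            emitted += 1
        except StopMonitoring:
            # The callback asked to stop: end monitoring.
            break
    return emitted


def emit_lines(lines: Iterable[str], callback: Callable) -> int:
    """Synchronous wrapper, like monitor_logs() running its coroutine with asyncio.run()."""
    return asyncio.run(emit_lines_async(lines, callback))


collected: List[str] = []


def collect_until_done(line: str) -> None:
    if line.strip() == 'done':
        raise StopMonitoring
    collected.append(line)


count = emit_lines(['a\n', 'b\n', 'done\n', 'c\n'], collect_until_done)
print(count, collected)  # 2 ['a\n', 'b\n']
```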
async def monitor_logs_async(self, callback_function: Callable[[str], None] = _default_stdout_callback, input_callback_function: Optional[Callable[[], str]] = None, stop_callback_function: Optional[Callable[[Tuple[bool, str]], None]] = None, stop_event: Optional[asyncio.Event] = None, stop_on_exit: bool = False, strip_timestamps: bool = False, accept_input: bool = True, debug: bool = False, _logs_path: Optional[pathlib.Path] = None, _log=None, _stdin_file=None, _wait_if_stopped: bool = True):
    async def monitor_logs_async(
        self,
        callback_function: Callable[[str], None] = _default_stdout_callback,
        input_callback_function: Optional[Callable[[], str]] = None,
        stop_callback_function: Optional[Callable[[SuccessTuple], None]] = None,
        stop_event: Optional[asyncio.Event] = None,
        stop_on_exit: bool = False,
        strip_timestamps: bool = False,
        accept_input: bool = True,
        debug: bool = False,
        _logs_path: Optional[pathlib.Path] = None,
        _log=None,
        _stdin_file=None,
        _wait_if_stopped: bool = True,
    ):
        """
        Monitor the job's log files and await a callback on new lines.

        Parameters
        ----------
        callback_function: Callable[[str], None], default _default_stdout_callback
            The callback to execute as new data comes in.
            Defaults to printing the output directly to `stdout`.

        input_callback_function: Optional[Callable[[], str]], default None
            If provided, execute this callback when the daemon is blocking on stdin.
            Defaults to `sys.stdin.readline()`.

        stop_callback_function: Optional[Callable[[SuccessTuple], None]], default None
            If provided, execute this callback when the daemon stops.
            The job's SuccessTuple will be passed to the callback.

        stop_event: Optional[asyncio.Event], default None
            If provided, stop monitoring when this event is set.
            You may instead raise `meerschaum.jobs.StopMonitoringLogs`
            from within `callback_function` to stop monitoring.

        stop_on_exit: bool, default False
            If `True`, stop monitoring when the job stops.

        strip_timestamps: bool, default False
            If `True`, remove leading timestamps from lines.

        accept_input: bool, default True
            If `True`, accept input when the daemon blocks on stdin.
        """
        from meerschaum.utils.prompt import prompt

        def default_input_callback_function():
            prompt_kwargs = self.get_prompt_kwargs(debug=debug)
            if prompt_kwargs:
                answer = prompt(**prompt_kwargs)
                return answer + '\n'
            return sys.stdin.readline()

        if input_callback_function is None:
            input_callback_function = default_input_callback_function

        if self.executor is not None:
            await self.executor.monitor_logs_async(
                self.name,
                callback_function,
                input_callback_function=input_callback_function,
                stop_callback_function=stop_callback_function,
                stop_on_exit=stop_on_exit,
                strip_timestamps=strip_timestamps,
                accept_input=accept_input,
                debug=debug,
            )
            return

        from meerschaum.utils.formatting._jobs import strip_timestamp_from_line

        events = {
            'user': stop_event,
            'stopped': asyncio.Event(),
            'stop_token': asyncio.Event(),
            'stop_exception': asyncio.Event(),
            'stopped_timeout': asyncio.Event(),
        }
        combined_event = asyncio.Event()
        emitted_text = False
        stdin_file = _stdin_file if _stdin_file is not None else self.daemon.stdin_file

        async def check_job_status():
            if not stop_on_exit:
                return

            nonlocal emitted_text

            sleep_time = 0.1
            while sleep_time < 0.2:
                if self.status == 'stopped':
                    if not emitted_text and _wait_if_stopped:
                        await asyncio.sleep(sleep_time)
                        sleep_time = round(sleep_time * 1.1, 3)
                        continue

                    if stop_callback_function is not None:
                        try:
                            if asyncio.iscoroutinefunction(stop_callback_function):
                                await stop_callback_function(self.result)
                            else:
                                stop_callback_function(self.result)
                        except asyncio.exceptions.CancelledError:
                            break
                        except Exception:
                            warn(traceback.format_exc())

                    if stop_on_exit:
                        events['stopped'].set()

                    break
                await asyncio.sleep(0.1)

            events['stopped_timeout'].set()

        async def check_blocking_on_input():
            while True:
                if not emitted_text or not self.is_blocking_on_stdin():
                    try:
                        await asyncio.sleep(self.refresh_seconds)
                    except asyncio.exceptions.CancelledError:
                        break
                    continue

                if not self.is_running():
                    break

                await emit_latest_lines()

                try:
                    print('', end='', flush=True)
                    if asyncio.iscoroutinefunction(input_callback_function):
                        data = await input_callback_function()
                    else:
                        loop = asyncio.get_running_loop()
                        data = await loop.run_in_executor(None, input_callback_function)
                except KeyboardInterrupt:
                    break
                #  if not data.endswith('\n'):
                    #  data += '\n'

                stdin_file.write(data)
                await asyncio.sleep(self.refresh_seconds)

        async def combine_events():
            event_tasks = [
                asyncio.create_task(event.wait())
                for event in events.values()
                if event is not None
            ]
            if not event_tasks:
                return

            try:
                done, pending = await asyncio.wait(
                    event_tasks,
                    return_when=asyncio.FIRST_COMPLETED,
                )
                for task in pending:
                    task.cancel()
            except asyncio.exceptions.CancelledError:
                pass
            finally:
                combined_event.set()

        check_job_status_task = asyncio.create_task(check_job_status())
        check_blocking_on_input_task = asyncio.create_task(check_blocking_on_input())
        combine_events_task = asyncio.create_task(combine_events())

        log = _log if _log is not None else self.daemon.rotating_log
        lines_to_show = (
            self.daemon.properties.get(
                'logs', {}
            ).get(
                'lines_to_show', get_config('jobs', 'logs', 'lines_to_show')
            )
        )

        async def emit_latest_lines():
            nonlocal emitted_text
            nonlocal stop_event
            lines = log.readlines()
            for line in lines[(-1 * lines_to_show):]:
                if stop_event is not None and stop_event.is_set():
                    return

                line_stripped_extra = strip_timestamp_from_line(line.strip())
                line_stripped = strip_timestamp_from_line(line)

                if line_stripped_extra == STOP_TOKEN:
                    events['stop_token'].set()
                    return

                if line_stripped_extra == CLEAR_TOKEN:
                    clear_screen(debug=debug)
                    continue

                if line_stripped_extra == FLUSH_TOKEN.strip():
                    line_stripped = ''
                    line = ''

                if strip_timestamps:
                    line = line_stripped

                try:
                    if asyncio.iscoroutinefunction(callback_function):
                        await callback_function(line)
                    else:
                        callback_function(line)
                    emitted_text = True
                except StopMonitoringLogs:
                    events['stop_exception'].set()
                    return
                except Exception:
                    warn(f"Error in logs callback:\n{traceback.format_exc()}")

        await emit_latest_lines()

        tasks = (
            [check_job_status_task]
            + ([check_blocking_on_input_task] if accept_input else [])
            + [combine_events_task]
        )
        try:
            _ = asyncio.gather(*tasks, return_exceptions=True)
        except asyncio.exceptions.CancelledError:
            raise
        except Exception:
            warn(f"Failed to run async checks:\n{traceback.format_exc()}")

        watchfiles = mrsm.attempt_import('watchfiles', lazy=False)
        dir_path_to_monitor = (
            _logs_path
            or (log.file_path.parent if log else None)
            or paths.LOGS_RESOURCES_PATH
        )
        async for changes in watchfiles.awatch(
            dir_path_to_monitor,
            stop_event=combined_event,
        ):
            for change in changes:
                file_path_str = change[1]
                file_path = pathlib.Path(file_path_str)
                latest_subfile_path = log.get_latest_subfile_path()
                if latest_subfile_path != file_path:
                    continue

                await emit_latest_lines()

        await emit_latest_lines()

Monitor the job's log files and await a callback on new lines.

Parameters
  • callback_function (Callable[[str], None], default _default_stdout_callback): The callback to execute as new data comes in. Defaults to printing the output directly to stdout.
  • input_callback_function (Optional[Callable[[], str]], default None): If provided, execute this callback when the daemon is blocking on stdin. Defaults to sys.stdin.readline().
  • stop_callback_function (Optional[Callable[[SuccessTuple], None]], default None): If provided, execute this callback when the daemon stops. The job's SuccessTuple will be passed to the callback.
  • stop_event (Optional[asyncio.Event], default None): If provided, stop monitoring when this event is set. You may instead raise meerschaum.jobs.StopMonitoringLogs from within callback_function to stop monitoring.
  • stop_on_exit (bool, default False): If True, stop monitoring when the job stops.
  • strip_timestamps (bool, default False): If True, remove leading timestamps from lines.
  • accept_input (bool, default True): If True, accept input when the daemon blocks on stdin.
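monitor_logs_async() stops watching files when any of several conditions fires: a user-supplied stop_event, the job stopping, a STOP_TOKEN line, a StopMonitoringLogs exception, or the status-check timeout. combine_events() funnels these into one combined_event, which is handed to `watchfiles.awatch(stop_event=...)`. The sketch below isolates that fan-in pattern with plain asyncio. The event names mirror the source; `wait_for_any` and `demo` are hypothetical helpers for illustration.

```python
import asyncio
from typing import Dict, Optional


async def wait_for_any(events: Dict[str, Optional[asyncio.Event]], combined: asyncio.Event) -> None:
    """Set `combined` as soon as any non-None event in `events` is set."""
    tasks = [
        asyncio.create_task(event.wait())
        for event in events.values()
        if event is not None
    ]
    if not tasks:
        return
    try:
        _, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        # The other waiters are no longer needed.
        for task in pending:
            task.cancel()
    finally:
        combined.set()


async def demo() -> bool:
    events = {
        'user': None,                  # no user-supplied stop_event
        'stopped': asyncio.Event(),
        'stop_token': asyncio.Event(),
    }
    combined = asyncio.Event()
    waiter = asyncio.create_task(wait_for_any(events, combined))
    await asyncio.sleep(0)             # let the waiter start
    events['stop_token'].set()         # e.g. a STOP_TOKEN line was read
    await asyncio.wait_for(combined.wait(), timeout=1)
    await waiter
    return combined.is_set()


print(asyncio.run(demo()))  # True
```

Setting the combined event in `finally` means a cancelled waiter still releases the file watcher, which is the same choice the source makes.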
def is_blocking_on_stdin(self, debug: bool = False) -> bool:
    def is_blocking_on_stdin(self, debug: bool = False) -> bool:
        """
        Return whether a job's daemon is blocking on stdin.
        """
        if self.executor is not None:
            return self.executor.get_job_is_blocking_on_stdin(self.name, debug=debug)

        return self.is_running() and self.daemon.blocking_stdin_file_path.exists()

Return whether a job's daemon is blocking on stdin.

def get_prompt_kwargs(self, debug: bool = False) -> Dict[str, Any]:
    def get_prompt_kwargs(self, debug: bool = False) -> Dict[str, Any]:
        """
        Return the kwargs to the blocking `prompt()`, if available.
        """
        if self.executor is not None:
            return self.executor.get_job_prompt_kwargs(self.name, debug=debug)

        if not self.daemon.prompt_kwargs_file_path.exists():
            return {}

        try:
            with open(self.daemon.prompt_kwargs_file_path, 'r', encoding='utf-8') as f:
                prompt_kwargs = json.load(f)

            return prompt_kwargs

        except Exception:
            import traceback
            traceback.print_exc()
            return {}

Return the kwargs to the blocking prompt(), if available.
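get_prompt_kwargs() reads a JSON file and falls back to `{}` when the file is missing or unreadable. monitor_logs_async() then uses the result in its default input callback: it calls `prompt(**kwargs)` and appends a newline when kwargs exist, otherwise it reads a raw line from stdin. The stdlib sketch below models both steps. The file name and the `question` key are hypothetical examples, and `ask`/`readline` stand in for `prompt()` and `sys.stdin.readline()`.

```python
import json
import pathlib
import tempfile
from typing import Any, Callable, Dict


def read_prompt_kwargs(path: pathlib.Path) -> Dict[str, Any]:
    """Load prompt kwargs from a JSON file, or return {} if unavailable."""
    if not path.exists():
        return {}
    try:
        with open(path, 'r', encoding='utf-8') as f:
            return json.load(f)
    except Exception:
        # The source prints the traceback here; the sketch just falls back.
        return {}


def default_input(
    prompt_kwargs: Dict[str, Any],
    ask: Callable[..., str],
    readline: Callable[[], str],
) -> str:
    """Use stored prompt kwargs when present, else read a raw line."""
    if prompt_kwargs:
        return ask(**prompt_kwargs) + '\n'
    return readline()


with tempfile.TemporaryDirectory() as tmp:
    kwargs_path = pathlib.Path(tmp) / 'prompt_kwargs.json'  # hypothetical filename
    print(read_prompt_kwargs(kwargs_path))                  # {}
    kwargs_path.write_text(json.dumps({'question': 'Continue?'}), encoding='utf-8')
    print(read_prompt_kwargs(kwargs_path))                  # {'question': 'Continue?'}
```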

def write_stdin(self, data):
    def write_stdin(self, data):
        """
        Write to a job's daemon's `stdin`.
        """
        self.daemon.stdin_file.write(data)

Write to a job's daemon's stdin.

executor: Optional[meerschaum.jobs.Executor]
    @property
    def executor(self) -> Union[Executor, None]:
        """
        If the job is remote, return the connector to the remote API instance.
        """
        return (
            mrsm.get_connector(self.executor_keys)
            if self.executor_keys != 'local'
            else None
        )

If the job is remote, return the connector to the remote API instance.

status: str
    @property
    def status(self) -> str:
        """
        Return the running status of the job's daemon.
        """
        if '_status_hook' in self.__dict__:
            return self._status_hook()

        if self.executor is not None:
            return self.executor.get_job_status(self.name)

        return self.daemon.status

Return the running status of the job's daemon.

pid: Optional[int]
    @property
    def pid(self) -> Union[int, None]:
        """
        Return the PID of the job's daemon.
        """
        if self.executor is not None:
            return self.executor.get_job_metadata(self.name).get('daemon', {}).get('pid', None)

        return self.daemon.pid

Return the PID of the job's daemon.

restart: bool
    @property
    def restart(self) -> bool:
        """
        Return whether to restart a stopped job.
        """
        if self.executor is not None:
            return self.executor.get_job_metadata(self.name).get('restart', False)

        return self.daemon.properties.get('restart', False)

Return whether to restart a stopped job.

result: Tuple[bool, str]
    @property
    def result(self) -> SuccessTuple:
        """
        Return the `SuccessTuple` when the job has terminated.
        """
        if self.is_running():
            return True, f"{self} is running."

        if '_result_hook' in self.__dict__:
            return self._result_hook()

        if self.executor is not None:
            return (
                self.executor.get_job_metadata(self.name)
                .get('result', (False, "No result available."))
            )

        _result = self.daemon.properties.get('result', None)
        if _result is None:
            from meerschaum.utils.daemon.Daemon import _results
            return _results.get(self.daemon.daemon_id, (False, "No result available."))

        return tuple(_result)

Return the SuccessTuple when the job has terminated.
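For a local job, result resolves in a fixed order: a running job reports success immediately, then an injected result hook wins, then the result stored in the daemon's properties, then an in-memory fallback keyed by daemon ID. The stored value is turned back into a tuple, presumably because it round-trips through JSON as a list. The sketch below models that order with plain dicts and omits the remote-executor branch. `resolve_result` and its parameters are hypothetical.

```python
from typing import Any, Callable, Dict, Optional, Tuple

SuccessTuple = Tuple[bool, str]


def resolve_result(
    is_running: bool,
    properties: Dict[str, Any],
    result_hook: Optional[Callable[[], SuccessTuple]] = None,
    fallback: Optional[Dict[str, SuccessTuple]] = None,
    daemon_id: str = '',
) -> SuccessTuple:
    """Mirror the lookup order of Job.result for a local job (sketch)."""
    if is_running:
        return True, "Job is running."
    if result_hook is not None:
        return result_hook()
    stored = properties.get('result', None)
    if stored is None:
        # Fall back to results kept in memory, keyed by daemon ID.
        return (fallback or {}).get(daemon_id, (False, "No result available."))
    # A stored JSON list such as [true, "msg"] becomes a tuple again.
    return tuple(stored)


print(resolve_result(False, {'result': [True, 'Success']}))  # (True, 'Success')
```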

sysargs: List[str]
    @property
    def sysargs(self) -> List[str]:
        """
        Return the sysargs to use for the Daemon.
        """
        if self._sysargs:
            return self._sysargs

        if self.executor is not None:
            return self.executor.get_job_metadata(self.name).get('sysargs', [])

        target_args = self.daemon.target_args
        if target_args is None:
            return []
        self._sysargs = target_args[0] if len(target_args) > 0 else []
        return self._sysargs

Return the sysargs to use for the Daemon.

def get_daemon_properties(self) -> Dict[str, Any]:
    def get_daemon_properties(self) -> Dict[str, Any]:
        """
        Return the `properties` dictionary for the job's daemon.
        """
        remote_properties = (
            {}
            if self.executor is None
            else self.executor.get_job_properties(self.name)
        )
        return {
            **remote_properties,
            **self._properties_patch
        }

Return the properties dictionary for the job's daemon.
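get_daemon_properties() merges with dict unpacking, so the locally supplied patch wins on any conflicting top-level key. The merge is shallow: a nested dict in the patch replaces the remote one wholesale rather than being merged into it. The property values below are hypothetical examples.

```python
from typing import Any, Dict

remote_properties: Dict[str, Any] = {'restart': False, 'env': {'A': '1'}}
properties_patch: Dict[str, Any] = {'restart': True}

# Later unpacking wins, so the patch overrides 'restart' and keeps 'env'.
merged = {**remote_properties, **properties_patch}
print(merged)  # {'restart': True, 'env': {'A': '1'}}

# Shallow merge: the nested 'env' dict is replaced, not combined.
shallow = {**{'env': {'A': '1'}}, **{'env': {'B': '2'}}}
print(shallow)  # {'env': {'B': '2'}}
```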

daemon: Daemon
    @property
    def daemon(self) -> 'Daemon':
        """
        Return the daemon which this job manages.
        """
        from meerschaum.utils.daemon import Daemon
        if self._daemon is not None and self.executor is None and self._sysargs:
            return self._daemon

        self._daemon = Daemon(
            target=entry,
            target_args=[self._sysargs],
            target_kw={},
            daemon_id=self.name,
            label=shlex.join(self._sysargs),
            properties=self.get_daemon_properties(),
        )
        if '_rotating_log' in self.__dict__:
            self._daemon._rotating_log = self._rotating_log

        if '_stdin_file' in self.__dict__:
            self._daemon._stdin_file = self._stdin_file
            self._daemon._blocking_stdin_file_path = self._stdin_file.blocking_file_path

        return self._daemon

Return the daemon which this job manages.

began: Optional[datetime.datetime]
    @property
    def began(self) -> Union[datetime, None]:
        """
        The datetime when the job began running.
        """
        if self.executor is not None:
            began_str = self.executor.get_job_began(self.name)
            if began_str is None:
                return None
            return (
                datetime.fromisoformat(began_str)
                .astimezone(timezone.utc)
                .replace(tzinfo=None)
            )

        began_str = self.daemon.properties.get('process', {}).get('began', None)
        if began_str is None:
            return None

        return datetime.fromisoformat(began_str)

The datetime when the job began running.
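For remote jobs, began (like ended and paused) converts the executor's ISO 8601 string into a naive datetime expressed in UTC: parse, shift to UTC, then drop the tzinfo. The local branch returns `datetime.fromisoformat(began_str)` unchanged, so its tzinfo depends on how the string was stored. A standalone sketch of the remote conversion, with a hypothetical helper name:

```python
from datetime import datetime, timezone
from typing import Optional


def parse_remote_timestamp(ts: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 string into a naive datetime expressed in UTC."""
    if ts is None:
        return None
    # astimezone() shifts an offset-aware value to UTC; replace() then drops tzinfo.
    return datetime.fromisoformat(ts).astimezone(timezone.utc).replace(tzinfo=None)


print(parse_remote_timestamp('2024-01-01T12:00:00+02:00'))  # 2024-01-01 10:00:00
```

One caveat of this pattern: if the string carries no UTC offset, `astimezone()` interprets it as the local system time before converting.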

ended: Optional[datetime.datetime]
    @property
    def ended(self) -> Union[datetime, None]:
        """
        The datetime when the job stopped running.
        """
        if self.executor is not None:
            ended_str = self.executor.get_job_ended(self.name)
            if ended_str is None:
                return None
            return (
                datetime.fromisoformat(ended_str)
                .astimezone(timezone.utc)
                .replace(tzinfo=None)
            )

        ended_str = self.daemon.properties.get('process', {}).get('ended', None)
        if ended_str is None:
            return None

        return datetime.fromisoformat(ended_str)

The datetime when the job stopped running.

paused: Optional[datetime.datetime]
    @property
    def paused(self) -> Union[datetime, None]:
        """
        The datetime when the job was suspended while running.
        """
        if self.executor is not None:
            paused_str = self.executor.get_job_paused(self.name)
            if paused_str is None:
                return None
            return (
                datetime.fromisoformat(paused_str)
                .astimezone(timezone.utc)
                .replace(tzinfo=None)
            )

        paused_str = self.daemon.properties.get('process', {}).get('paused', None)
        if paused_str is None:
            return None

        return datetime.fromisoformat(paused_str)

The datetime when the job was suspended while running.

stop_time: Optional[datetime.datetime]
    @property
    def stop_time(self) -> Union[datetime, None]:
        """
        Return the timestamp when the job was manually stopped.
        """
        if self.executor is not None:
            return self.executor.get_job_stop_time(self.name)

        if not self.daemon.stop_path.exists():
            return None

        stop_data = self.daemon._read_stop_file()
        if not stop_data:
            return None

        stop_time_str = stop_data.get('stop_time', None)
        if not stop_time_str:
            warn(f"Could not read stop time for {self}.")
            return None

        return datetime.fromisoformat(stop_time_str)

Return the timestamp when the job was manually stopped.
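When no executor is involved, stop_time checks for the daemon's stop file, reads it, and parses its 'stop_time' entry. The sketch below reproduces that sequence using only the standard library. The on-disk format (a JSON object with a 'stop_time' key) and the name read_stop_time are assumptions made for illustration. The source above only shows that Daemon._read_stop_file() returns a dictionary.

```python
import json
import tempfile
import warnings
from datetime import datetime
from pathlib import Path
from typing import Optional


def read_stop_time(stop_path: Path) -> Optional[datetime]:
    """
    Return the manual stop time recorded in `stop_path`, or None.
    Assumes (hypothetically) a JSON file like {"stop_time": "<ISO timestamp>"}.
    """
    # No stop file means the job was never stopped manually.
    if not stop_path.exists():
        return None

    try:
        stop_data = json.loads(stop_path.read_text())
    except (OSError, ValueError):
        stop_data = None
    if not stop_data:
        return None

    stop_time_str = stop_data.get('stop_time', None)
    if not stop_time_str:
        warnings.warn(f"Could not read stop time from '{stop_path}'.")
        return None

    return datetime.fromisoformat(stop_time_str)


with tempfile.TemporaryDirectory() as tmp_dir:
    stop_path = Path(tmp_dir) / 'example-job.stop'
    before_stop = read_stop_time(stop_path)
    stop_path.write_text(json.dumps({'stop_time': '2024-01-01T10:00:00'}))
    after_stop = read_stop_time(stop_path)

print(before_stop)  # None
print(after_stop)   # 2024-01-01 10:00:00
```

check_restart() treats any non-None stop time as a manual stop. As a result, a missing or unreadable stop file leaves the job eligible for a restart.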

hidden: bool
    @property
    def hidden(self) -> bool:
        """
        Return a bool indicating whether this job should be displayed.
        """
        return (
            self.name.startswith('_')
            or self.name.startswith('.')
            or self._is_externally_managed
        )

Return True if this job should be hidden from display, i.e. if its name begins with '_' or '.' or the job is externally managed.
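The visibility rule reduces to a prefix check plus the externally-managed flag. The following is a minimal sketch. It assumes a plain job name string and uses a boolean in place of the private _is_externally_managed attribute. is_hidden is a hypothetical helper.

```python
def is_hidden(name: str, is_externally_managed: bool = False) -> bool:
    """Return True if a job with this name should be hidden from job listings."""
    # str.startswith accepts a tuple of prefixes.
    return name.startswith(('_', '.')) or is_externally_managed


for job_name in ('sync-weather', '_internal', '.scratch'):
    print(job_name, is_hidden(job_name))
# sync-weather False
# _internal True
# .scratch True
```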

def check_restart(self) -> Tuple[bool, str]:
    def check_restart(self) -> SuccessTuple:
        """
        If `restart` is `True` and the daemon is not running,
        restart the job.
        Do not restart if the job was manually stopped.
        """
        if self.is_running():
            return True, f"{self} is running."

        if not self.restart:
            return True, f"{self} does not need to be restarted."

        if self.stop_time is not None:
            return True, f"{self} was manually stopped."

        return self.start()

If restart is True and the daemon is not running, restart the job. Do not restart if the job was manually stopped.
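check_restart() applies its guards in a fixed order. A job is started only if it is not running, has restart enabled, and was never manually stopped. The sketch below restates that decision table as a standalone function. Its boolean arguments replace the job's is_running(), restart, and stop_time checks. fake_start is a hypothetical stand-in for Job.start().

```python
from typing import Callable, Tuple

SuccessTuple = Tuple[bool, str]


def check_restart(
    is_running: bool,
    restart: bool,
    was_manually_stopped: bool,
    start: Callable[[], SuccessTuple],
) -> SuccessTuple:
    """Hypothetical standalone version of the check_restart() decision logic."""
    if is_running:
        return True, "Job is running."
    if not restart:
        return True, "Job does not need to be restarted."
    # A manual stop always wins over the restart flag.
    if was_manually_stopped:
        return True, "Job was manually stopped."
    return start()


def fake_start() -> SuccessTuple:
    return True, "Started job."


print(check_restart(False, True, False, fake_start))  # (True, 'Started job.')
print(check_restart(False, True, True, fake_start))   # (True, 'Job was manually stopped.')
```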

label: str
    @property
    def label(self) -> str:
        """
        Return the job's Daemon label (joined sysargs).
        """
        from meerschaum._internal.arguments import compress_pipeline_sysargs
        sysargs = compress_pipeline_sysargs(self.sysargs)
        return shlex.join(sysargs).replace(' + ', '\n+ ').replace(' : ', '\n: ').lstrip().rstrip()

Return the job's Daemon label (joined sysargs).
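The label is built in two steps. First, shlex.join() shell-quotes the sysargs. Then a line break is inserted before each ' + ' and ' : ' separator, so chained commands render one per line. The sketch below reproduces only that formatting step. It assumes the sysargs have already been through compress_pipeline_sysargs(), which is omitted here. make_label is a hypothetical name.

```python
import shlex
from typing import List


def make_label(sysargs: List[str]) -> str:
    """Join sysargs into a shell-quoted string, breaking before '+' and ':' separators."""
    return (
        shlex.join(sysargs)
        .replace(' + ', '\n+ ')
        .replace(' : ', '\n: ')
        .strip()
    )


print(make_label(['sync', 'pipes', '-i', 'sql:local', '+', 'show', 'pipes']))
# sync pipes -i sql:local
# + show pipes
```

Because shlex.join() quotes arguments containing spaces, the label can be pasted back into a shell unchanged.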

env: Dict[str, str]
    @property
    def env(self) -> Dict[str, str]:
        """
        Return the environment variables to set for the job's process.
        """
        if '_env' in self.__dict__:
            return self.__dict__['_env']

        _env = self.daemon.properties.get('env', {})
        default_env = {
            'PYTHONUNBUFFERED': '1',
            'LINES': str(get_config('jobs', 'terminal', 'lines')),
            'COLUMNS': str(get_config('jobs', 'terminal', 'columns')),
            STATIC_CONFIG['environment']['noninteractive']: 'true',
        }
        self._env = {**default_env, **_env}
        return self._env

Return the environment variables to set for the job's process.
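The environment is a two-layer merge. Built-in defaults come first, followed by any env mapping stored in the daemon's properties, so job-specific values win on conflicts. The following is a minimal sketch. It uses placeholder values for the terminal size, which the real property reads from get_config('jobs', 'terminal', ...). It also omits the noninteractive flag taken from STATIC_CONFIG. build_job_env is hypothetical.

```python
from typing import Dict


def build_job_env(
    job_env: Dict[str, str],
    lines: int = 40,
    columns: int = 100,
) -> Dict[str, str]:
    """
    Merge a job's own environment variables over a set of defaults.
    The lines/columns defaults are placeholders, not meerschaum's config values.
    """
    default_env = {
        'PYTHONUNBUFFERED': '1',  # Disable Python's stdout/stderr buffering.
        'LINES': str(lines),
        'COLUMNS': str(columns),
    }
    # Later mappings win, so job-specific values override the defaults.
    return {**default_env, **job_env}


env = build_job_env({'COLUMNS': '200', 'MY_VAR': 'foo'})
print(env['COLUMNS'])           # 200
print(env['PYTHONUNBUFFERED'])  # 1
```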

delete_after_completion: bool
    @property
    def delete_after_completion(self) -> bool:
        """
        Return whether this job is configured to delete itself after completion.
        """
        if '_delete_after_completion' in self.__dict__:
            return self.__dict__.get('_delete_after_completion', False)

        self._delete_after_completion = self.daemon.properties.get('delete_after_completion', False)
        return self._delete_after_completion

Return whether this job is configured to delete itself after completion.

def pprint( *args, detect_password: bool = True, nopretty: bool = False, **kw) -> None:
def pprint(
    *args,
    detect_password: bool = True,
    nopretty: bool = False,
    **kw
) -> None:
    """Pretty print an object according to the configured ANSI and UNICODE settings.
    If detect_password is True (default), search and replace passwords with '*' characters.
    Does not mutate objects.
    """
    import copy
    import json
    from meerschaum.utils.packages import attempt_import, import_rich
    from meerschaum.utils.formatting import ANSI, get_console, print_tuple
    from meerschaum.utils.warnings import error
    from meerschaum.utils.misc import replace_password, dict_from_od, filter_keywords
    from collections import OrderedDict

    if (
        len(args) == 1
        and
        isinstance(args[0], tuple)
        and
        len(args[0]) == 2
        and
        isinstance(args[0][0], bool)
        and
        isinstance(args[0][1], str)
    ):
        return print_tuple(args[0], **filter_keywords(print_tuple, **kw))

    modify = True
    rich_pprint = None
    if ANSI and not nopretty:
        rich = import_rich()
        if rich is not None:
            rich_pretty = attempt_import('rich.pretty')
        if rich_pretty is not None:
            def _rich_pprint(*args, **kw):
                _console = get_console()
                _kw = filter_keywords(_console.print, **kw)
                _console.print(*args, **_kw)
            rich_pprint = _rich_pprint
    elif not nopretty:
        pprintpp = attempt_import('pprintpp', warn=False)
        try:
            _pprint = pprintpp.pprint
        except Exception :
            import pprint as _pprint_module
            _pprint = _pprint_module.pprint

    func = (
        _pprint if rich_pprint is None else rich_pprint
    ) if not nopretty else print

    try:
        args_copy = copy.deepcopy(args)
    except Exception:
        args_copy = args
        modify = False

    _args = []
    for a in args:
        c = a
        ### convert OrderedDict into dict
        if isinstance(a, OrderedDict) or issubclass(type(a), OrderedDict):
            c = dict_from_od(copy.deepcopy(c))
        _args.append(c)
    args = _args

    _args = list(args)
    if detect_password and modify:
        _args = []
        for a in args:
            c = a
            if isinstance(c, dict):
                c = replace_password(copy.deepcopy(c))
            if nopretty:
                try:
                    c = json.dumps(c)
                    is_json = True
                except Exception:
                    is_json = False
                if not is_json:
                    try:
                        c = str(c)
                    except Exception:
                        pass
            _args.append(c)

    ### filter out unsupported keywords
    func_kw = filter_keywords(func, **kw) if not nopretty else {}
    error_msg = None
    try:
        func(*_args, **func_kw)
    except Exception as e:
        error_msg = e
    if error_msg is not None:
        error(error_msg)

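Pretty-print objects according to the configured ANSI and UNICODE settings. A single (bool, str) success tuple is printed with print_tuple instead. If detect_password is True (default), passwords in dictionary arguments are replaced with '*' characters before printing, and the original objects are not mutated. If nopretty is True, the arguments are printed with the built-in print().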
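pprint() masks passwords before printing. It deep-copies each dictionary argument and passes the copy through meerschaum.utils.misc.replace_password, so the caller's object is left untouched. The recursive function below is a hypothetical stand-in for that step. The exact rules replace_password uses (which keys it treats as passwords and how it masks them) are not shown here and may differ.

```python
import copy
from typing import Any


def mask_passwords(obj: Any) -> Any:
    """
    Return a copy of `obj` where every value stored under a 'password' key
    (at any nesting depth) is replaced by '*' characters of the same length.
    """
    if isinstance(obj, dict):
        return {
            key: (
                '*' * len(str(value))
                if key == 'password'
                else mask_passwords(value)
            )
            for key, value in obj.items()
        }
    if isinstance(obj, list):
        return [mask_passwords(item) for item in obj]
    # Copy leaves so the result never shares mutable state with the input.
    return copy.deepcopy(obj)


config = {'sql': {'main': {'username': 'mrsm', 'password': 'secret'}}}
masked = mask_passwords(config)
print(masked)  # {'sql': {'main': {'username': 'mrsm', 'password': '******'}}}
print(config['sql']['main']['password'])  # secret
```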

def attempt_import( *names: str, lazy: bool = True, warn: bool = True, install: bool = True, venv: Optional[str] = 'mrsm', precheck: bool = True, split: bool = True, check_update: bool = False, check_pypi: bool = False, check_is_installed: bool = True, allow_outside_venv: bool = True, color: bool = True, debug: bool = False) -> Any:
def attempt_import(
    *names: str,
    lazy: bool = True,
    warn: bool = True,
    install: bool = True,
    venv: Optional[str] = 'mrsm',
    precheck: bool = True,
    split: bool = True,
    check_update: bool = False,
    check_pypi: bool = False,
    check_is_installed: bool = True,
    allow_outside_venv: bool = True,
    color: bool = True,
    debug: bool = False
) -> Any:
    """
    Raise a warning if packages are not installed; otherwise import and return modules.
    If `lazy` is `True`, return lazy-imported modules.
    
    Returns tuple of modules if multiple names are provided, else returns one module.
    
    Parameters
    ----------
    names: List[str]
        The packages to be imported.

    lazy: bool, default True
        If `True`, lazily load packages.

    warn: bool, default True
        If `True`, raise a warning if a package cannot be imported.

    install: bool, default True
        If `True`, attempt to install a missing package into the designated virtual environment.
        If `check_update` is True, install updates if available.

    venv: Optional[str], default 'mrsm'
        The virtual environment in which to search for packages and to install packages into.

    precheck: bool, default True
        If `True`, attempt to find module before importing (necessary for checking if modules exist
        and retaining lazy imports), otherwise assume lazy is `False`.

    split: bool, default True
        If `True`, split packages' names on `'.'`.

    check_update: bool, default False
        If `True` and `install` is `True`, install updates if the required minimum version
        does not match.

    check_pypi: bool, default False
        If `True` and `check_update` is `True`, check PyPI when determining whether
        an update is required.

    check_is_installed: bool, default True
        If `True`, check if the package is contained in the virtual environment.

    allow_outside_venv: bool, default True
        If `True`, search outside of the specified virtual environment
        if the package cannot be found.
        Setting to `False` will reinstall the package into a virtual environment, even if it
        is installed outside.

    color: bool, default True
        If `False`, do not print ANSI colors.

    Returns
    -------
    The specified modules. If they're not available and `install` is `True`, it will first
    download them into a virtual environment and return the modules.

    Examples
    --------
    >>> pandas, sqlalchemy = attempt_import('pandas', 'sqlalchemy')
    >>> pandas = attempt_import('pandas')

    """

    import importlib.util

    ### to prevent recursion, check if parent Meerschaum package is being imported
    if names == ('meerschaum',):
        return _import_module('meerschaum')

    if venv == 'mrsm' and _import_hook_venv is not None:
        if debug:
            print(f"Import hook for virtual environment '{_import_hook_venv}' is active.")
        venv = _import_hook_venv

    _warnings = _import_module('meerschaum.utils.warnings')
    warn_function = _warnings.warn

    def do_import(_name: str, **kw) -> Union['ModuleType', None]:
        with Venv(venv=venv, debug=debug):
            ### determine the import method (lazy vs normal)
            from meerschaum.utils.misc import filter_keywords
            import_method = (
                _import_module if not lazy
                else lazy_import
            )
            try:
                mod = import_method(_name, **(filter_keywords(import_method, **kw)))
            except Exception as e:
                if warn:
                    import traceback
                    traceback.print_exception(type(e), e, e.__traceback__)
                    warn_function(
                        f"Failed to import module '{_name}'.\nException:\n{e}",
                        ImportWarning,
                        stacklevel = (5 if lazy else 4),
                        color = False,
                    )
                mod = None
        return mod

    modules = []
    for name in names:
        ### Check if package is a declared dependency.
        root_name = name.split('.')[0] if split else name
        install_name = _import_to_install_name(root_name)

        if install_name is None:
            install_name = root_name
            if warn and root_name != 'plugins':
                warn_function(
                    f"Package '{root_name}' is not declared in meerschaum.utils.packages.",
                    ImportWarning,
                    stacklevel = 3,
                    color = False
                )

        ### Determine if the package exists.
        if precheck is False:
            found_module = (
                do_import(
                    name, debug=debug, warn=False, venv=venv, color=color,
                    check_update=False, check_pypi=False, split=split,
                ) is not None
            )
        else:
            if check_is_installed:
                with _locks['_is_installed_first_check']:
                    if not _is_installed_first_check.get(name, False):
                        package_is_installed = is_installed(
                            name,
                            venv = venv,
                            split = split,
                            allow_outside_venv = allow_outside_venv,
                            debug = debug,
                        )
                        _is_installed_first_check[name] = package_is_installed
                    else:
                        package_is_installed = _is_installed_first_check[name]
            else:
                package_is_installed = _is_installed_first_check.get(
                    name,
                    venv_contains_package(name, venv=venv, split=split, debug=debug)
                )
            found_module = package_is_installed

        if not found_module:
            if install:
                if not pip_install(
                    install_name,
                    venv = venv,
                    split = False,
                    check_update = check_update,
                    color = color,
                    debug = debug
                ) and warn:
                    warn_function(
                        f"Failed to install '{install_name}'.",
                        ImportWarning,
                        stacklevel = 3,
                        color = False,
                    )
            elif warn:
                ### Raise a warning if we can't find the package and install = False.
                warn_function(
                    (f"\n\nMissing package '{name}' from virtual environment '{venv}'; "
                     + "some features will not work correctly."
                     + "\n\nSet install=True when calling attempt_import.\n"),
                    ImportWarning,
                    stacklevel = 3,
                    color = False,
                )

        ### Do the import. Will be lazy if lazy=True.
        m = do_import(
            name, debug=debug, warn=warn, venv=venv, color=color,
            check_update=check_update, check_pypi=check_pypi, install=install, split=split,
        )
        modules.append(m)

    modules = tuple(modules)
    if len(modules) == 1:
        return modules[0]
    return modules

Raise a warning if packages are not installed; otherwise import and return modules. If lazy is True, return lazy-imported modules.

Returns tuple of modules if multiple names are provided, else returns one module.

Parameters
  • names (List[str]): The packages to be imported.
  • lazy (bool, default True): If True, lazily load packages.
  • warn (bool, default True): If True, raise a warning if a package cannot be imported.
  • install (bool, default True): If True, attempt to install a missing package into the designated virtual environment. If check_update is True, install updates if available.
  • venv (Optional[str], default 'mrsm'): The virtual environment in which to search for packages and to install packages into.
  • precheck (bool, default True): If True, attempt to find module before importing (necessary for checking if modules exist and retaining lazy imports), otherwise assume lazy is False.
  • split (bool, default True): If True, split packages' names on '.'.
  • check_update (bool, default False): If True and install is True, install updates if the required minimum version does not match.
  • check_pypi (bool, default False): If True and check_update is True, check PyPI when determining whether an update is required.
  • check_is_installed (bool, default True): If True, check if the package is contained in the virtual environment.
  • allow_outside_venv (bool, default True): If True, search outside of the specified virtual environment if the package cannot be found. Setting to False will reinstall the package into a virtual environment, even if it is installed outside.
  • color (bool, default True): If False, do not print ANSI colors.
  • debug (bool, default False): Verbosity toggle.
Returns
  • The specified modules. If they are not available and install is True, they are first downloaded into a virtual environment and then returned. A module that fails to import may be returned as None.
Examples
>>> pandas, sqlalchemy = attempt_import('pandas', 'sqlalchemy')
>>> pandas = attempt_import('pandas')
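attempt_import() relies on meerschaum's own lazy_import and Venv machinery, which is not shown here. The sketch below is a rough standard-library analogue of its precheck and lazy behavior. It uses importlib.util.find_spec() to check whether a module exists. It then uses importlib.util.LazyLoader (following the recipe in the importlib documentation) to defer executing the module until its first attribute access. It does not install anything or manage virtual environments. stdlib_lazy_import and attempt_import_sketch are illustrative names, not meerschaum functions.

```python
import importlib.util
import sys
import warnings
from types import ModuleType
from typing import Optional, Tuple, Union


def stdlib_lazy_import(name: str) -> Optional[ModuleType]:
    """
    Return `name` as a lazily executed module, or None if it cannot be found.
    Follows the LazyLoader recipe from the importlib documentation.
    """
    if name in sys.modules:
        return sys.modules[name]

    # "Precheck": find_spec() returns None when the module is not installed.
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        warnings.warn(f"Missing package '{name}'.")
        return None

    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # Execution is deferred until the first attribute access.
    loader.exec_module(module)
    return module


def attempt_import_sketch(
    *names: str,
) -> Union[Optional[ModuleType], Tuple[Optional[ModuleType], ...]]:
    """Mirror attempt_import's return shape: one module, or a tuple of modules."""
    modules = tuple(stdlib_lazy_import(name) for name in names)
    return modules[0] if len(modules) == 1 else modules


json_module, csv_module = attempt_import_sketch('json', 'csv')
print(json_module.dumps({'a': 1}))  # {"a": 1}
```

The importlib documentation notes that LazyLoader is not suited to built-in or extension modules, so this approach only pays off for pure-Python packages.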
class Connector:
class Connector(metaclass=abc.ABCMeta):
    """
    The base connector class to hold connection attributes.
    """

    IS_INSTANCE: bool = False

    def __init__(
        self,
        type: Optional[str] = None,
        label: Optional[str] = None,
        **kw: Any
    ):
        """
        Set the given keyword arguments as attributes.

        Parameters
        ----------
        type: str
            The `type` of the connector (e.g. `sql`, `api`, `plugin`).

        label: str
            The `label` for the connector.


        Examples
        --------
        Run `mrsm edit config` and to edit connectors in the YAML file:

        ```yaml
        meerschaum:
            connections:
                {type}:
                    {label}:
                        ### attributes go here
        ```

        """
        self._original_dict = copy.deepcopy(self.__dict__)
        self._set_attributes(type=type, label=label, **kw)

        ### NOTE: Override `REQUIRED_ATTRIBUTES` if `uri` is set.
        self.verify_attributes(
            ['uri']
            if 'uri' in self.__dict__
            else getattr(self, 'REQUIRED_ATTRIBUTES', None)
        )

    def _reset_attributes(self):
        self.__dict__ = self._original_dict

    def _set_attributes(
        self,
        *args,
        inherit_default: bool = True,
        **kw: Any
    ):
        from meerschaum._internal.static import STATIC_CONFIG
        from meerschaum.utils.warnings import error

        self._attributes = {}

        default_label = STATIC_CONFIG['connectors']['default_label']

        ### NOTE: Support the legacy method of explicitly passing the type.
        label = kw.get('label', None)
        if label is None:
            if len(args) == 2:
                label = args[1]
            elif len(args) == 0:
                label = None
            else:
                label = args[0]

        if label == 'default':
            error(
                f"Label cannot be 'default'. Did you mean '{default_label}'?",
                InvalidAttributesError,
            )
        self.__dict__['label'] = label

        from meerschaum.config import get_config
        conn_configs = copy.deepcopy(get_config('meerschaum', 'connectors'))
        connector_config = copy.deepcopy(get_config('system', 'connectors'))

        ### inherit attributes from 'default' if exists
        if inherit_default:
            inherit_from = 'default'
            if self.type in conn_configs and inherit_from in conn_configs[self.type]:
                _inherit_dict = copy.deepcopy(conn_configs[self.type][inherit_from])
                self._attributes.update(_inherit_dict)

        ### load user config into self._attributes
        if self.type in conn_configs and self.label in conn_configs[self.type]:
            self._attributes.update(conn_configs[self.type][self.label] or {})

        ### load system config into self._sys_config
        ### (deep copy so future Connectors don't inherit changes)
        if self.type in connector_config:
            self._sys_config = copy.deepcopy(connector_config[self.type])

        ### add additional arguments or override configuration
        self._attributes.update(kw)

        ### finally, update __dict__ with _attributes.
        self.__dict__.update(self._attributes)

    def verify_attributes(
        self,
        required_attributes: Optional[List[str]] = None,
        debug: bool = False,
    ) -> None:
        """
        Ensure that the required attributes have been met.
        
        The Connector base class checks the minimum requirements.
        Child classes may enforce additional requirements.

        Parameters
        ----------
        required_attributes: Optional[List[str]], default None
            Attributes to be verified. If `None`, default to `['label']`.

        debug: bool, default False
            Verbosity toggle.

        Returns
        -------
        Don't return anything.

        Raises
        ------
        An error if any of the required attributes are missing.
        """
        from meerschaum.utils.warnings import error
        from meerschaum.utils.misc import items_str
        if required_attributes is None:
            required_attributes = ['type', 'label']

        missing_attributes = set()
        for a in required_attributes:
            if a not in self.__dict__:
                missing_attributes.add(a)
        if len(missing_attributes) > 0:
            error(
                (
                    f"Missing {items_str(list(missing_attributes))} "
                    + f"for connector '{self.type}:{self.label}'."
                ),
                InvalidAttributesError,
                silent=True,
                stack=False
            )


    def __str__(self):
        """
        When cast to a string, return type:label.
        """
        return f"{self.type}:{self.label}"

    def __repr__(self):
        """
        Represent the connector as type:label.
        """
        return str(self)

    @property
    def meta(self) -> Dict[str, Any]:
        """
        Return the keys needed to reconstruct this Connector.
        """
        _meta = {
            key: value
            for key, value in self.__dict__.items()
            if not str(key).startswith('_')
        }
        _meta.update({
            'type': self.type,
            'label': self.label,
        })
        return _meta


    @property
    def type(self) -> str:
        """
        Return the type for this connector.
        """
        _type = self.__dict__.get('type', None)
        if _type is None:
            import re
            is_executor = self.__class__.__name__.lower().endswith('executor')
            suffix_regex = (
                r'connector$'
                if not is_executor
                else r'executor$'
            )
            _type = re.sub(suffix_regex, '', self.__class__.__name__.lower())
            if not _type or _type.lower() == 'instance':
                raise ValueError("No type could be determined for this connector.")
            self.__dict__['type'] = _type
        return _type


    @property
    def label(self) -> str:
        """
        Return the label for this connector.
        """
        _label = self.__dict__.get('label', None)
        if _label is None:
            from meerschaum._internal.static import STATIC_CONFIG
            _label = STATIC_CONFIG['connectors']['default_label']
            self.__dict__['label'] = _label
        return _label

The base connector class to hold connection attributes.

Connector(type: Optional[str] = None, label: Optional[str] = None, **kw: Any)
    def __init__(
        self,
        type: Optional[str] = None,
        label: Optional[str] = None,
        **kw: Any
    ):
        """
        Set the given keyword arguments as attributes.

        Parameters
        ----------
        type: str
            The `type` of the connector (e.g. `sql`, `api`, `plugin`).

        label: str
            The `label` for the connector.


        Examples
        --------
        Run `mrsm edit config` and to edit connectors in the YAML file:

        ```yaml
        meerschaum:
            connections:
                {type}:
                    {label}:
                        ### attributes go here
        ```

        """
        self._original_dict = copy.deepcopy(self.__dict__)
        self._set_attributes(type=type, label=label, **kw)

        ### NOTE: Override `REQUIRED_ATTRIBUTES` if `uri` is set.
        self.verify_attributes(
            ['uri']
            if 'uri' in self.__dict__
            else getattr(self, 'REQUIRED_ATTRIBUTES', None)
        )

Set the given keyword arguments as attributes.

Parameters
  • type (str): The type of the connector (e.g. sql, api, plugin).
  • label (str): The label for the connector.
Examples

Run mrsm edit config to edit connectors in the YAML file:

meerschaum:
    connectors:
        {type}:
            {label}:
                ### attributes go here
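Connector attributes are resolved in layers, with later layers overriding earlier ones:

1. The type's 'default' section of the meerschaum connectors config.
2. The section for the connector's own label.
3. Any keyword arguments passed to the constructor.

The sketch below reproduces that precedence with plain dictionaries. resolve_attributes and the sample configuration values are hypothetical.

```python
import copy
from typing import Any, Dict, Optional


def resolve_attributes(
    conn_configs: Dict[str, Dict[str, Optional[Dict[str, Any]]]],
    conn_type: str,
    label: str,
    **kw: Any,
) -> Dict[str, Any]:
    """
    Hypothetical sketch of the precedence used by Connector._set_attributes():
    'default' section < label section < keyword arguments.
    """
    attributes: Dict[str, Any] = {}
    type_configs = conn_configs.get(conn_type, {})
    if 'default' in type_configs:
        attributes.update(copy.deepcopy(type_configs['default']))
    if label in type_configs:
        # An empty YAML section loads as None, hence the `or {}`.
        attributes.update(type_configs[label] or {})
    attributes.update(kw)
    return attributes


conn_configs = {
    'sql': {
        'default': {'flavor': 'postgresql', 'port': 5432},
        'temp': {'flavor': 'sqlite', 'database': '/tmp/tmp.db'},
        'empty': None,
    },
}
print(resolve_attributes(conn_configs, 'sql', 'temp', database='/tmp/other.db'))
# {'flavor': 'sqlite', 'port': 5432, 'database': '/tmp/other.db'}
```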
IS_INSTANCE: bool = False
def verify_attributes( self, required_attributes: Optional[List[str]] = None, debug: bool = False) -> None:
    def verify_attributes(
        self,
        required_attributes: Optional[List[str]] = None,
        debug: bool = False,
    ) -> None:
        """
        Ensure that the required attributes have been met.
        
        The Connector base class checks the minimum requirements.
        Child classes may enforce additional requirements.

        Parameters
        ----------
        required_attributes: Optional[List[str]], default None
            Attributes to be verified. If `None`, default to `['label']`.

        debug: bool, default False
            Verbosity toggle.

        Returns
        -------
        Don't return anything.

        Raises
        ------
        An error if any of the required attributes are missing.
        """
        from meerschaum.utils.warnings import error
        from meerschaum.utils.misc import items_str
        if required_attributes is None:
            required_attributes = ['type', 'label']

        missing_attributes = set()
        for a in required_attributes:
            if a not in self.__dict__:
                missing_attributes.add(a)
        if len(missing_attributes) > 0:
            error(
                (
                    f"Missing {items_str(list(missing_attributes))} "
                    + f"for connector '{self.type}:{self.label}'."
                ),
                InvalidAttributesError,
                silent=True,
                stack=False
            )

Ensure that the required attributes have been met.

The Connector base class checks the minimum requirements. Child classes may enforce additional requirements.

Parameters
  • required_attributes (Optional[List[str]], default None): Attributes to be verified. If None, default to ['type', 'label'].
  • debug (bool, default False): Verbosity toggle.
Returns
  • None.
Raises
  • InvalidAttributesError if any of the required attributes are missing.
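verify_attributes() collects every missing name before reporting, so one error lists all the missing attributes at once. The following is a minimal standalone sketch. It uses a plain dict in place of the connector's __dict__ and a local InvalidAttributesError class in place of the library's exception. The real method formats names with items_str and reports through meerschaum.utils.warnings.error(), so its message wording may differ. Names are sorted here only to make the output deterministic.

```python
from typing import Any, Dict, List, Optional


class InvalidAttributesError(Exception):
    """Local stand-in for the exception used in the source above."""


def verify_attributes(
    attributes: Dict[str, Any],
    required_attributes: Optional[List[str]] = None,
) -> None:
    """Raise InvalidAttributesError if any required attribute is missing."""
    if required_attributes is None:
        required_attributes = ['type', 'label']
    missing = sorted(set(required_attributes) - set(attributes))
    if missing:
        raise InvalidAttributesError(
            f"Missing {', '.join(missing)} for connector "
            f"'{attributes.get('type')}:{attributes.get('label')}'."
        )


# Passes: every required attribute is present.
verify_attributes({'type': 'foo', 'label': 'bar', 'username': 'foo'}, ['username'])

try:
    verify_attributes({'type': 'foo', 'label': 'bar'}, ['username', 'password'])
except InvalidAttributesError as e:
    message = str(e)
print(message)  # Missing password, username for connector 'foo:bar'.
```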
meta: Dict[str, Any]
    @property
    def meta(self) -> Dict[str, Any]:
        """
        Return the keys needed to reconstruct this Connector.
        """
        _meta = {
            key: value
            for key, value in self.__dict__.items()
            if not str(key).startswith('_')
        }
        _meta.update({
            'type': self.type,
            'label': self.label,
        })
        return _meta

Return the keys needed to reconstruct this Connector.
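meta returns every public attribute (any key not starting with '_') plus type and label. That set is enough to pass back into a constructor. The sketch below uses a hypothetical SketchConnector class to show the round trip. Only underscore-prefixed keys are filtered out, so the result can include secrets such as passwords and should not be logged as-is.

```python
from typing import Any, Dict


class SketchConnector:
    """Hypothetical minimal class illustrating the meta property."""

    def __init__(self, type: str, label: str, **kw: Any):
        self.type = type
        self.label = label
        self._cache: Dict[str, Any] = {}  # Private: excluded from meta.
        self.__dict__.update(kw)

    @property
    def meta(self) -> Dict[str, Any]:
        # Keep every attribute whose name does not start with an underscore.
        _meta = {
            key: value
            for key, value in self.__dict__.items()
            if not str(key).startswith('_')
        }
        _meta.update({'type': self.type, 'label': self.label})
        return _meta


conn = SketchConnector('foo', 'bar', username='foo', password='bar')
clone = SketchConnector(**conn.meta)
print(conn.meta)  # {'type': 'foo', 'label': 'bar', 'username': 'foo', 'password': 'bar'}
```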

type: str
    @property
    def type(self) -> str:
        """
        Return the type for this connector.
        """
        _type = self.__dict__.get('type', None)
        if _type is None:
            import re
            is_executor = self.__class__.__name__.lower().endswith('executor')
            suffix_regex = (
                r'connector$'
                if not is_executor
                else r'executor$'
            )
            _type = re.sub(suffix_regex, '', self.__class__.__name__.lower())
            if not _type or _type.lower() == 'instance':
                raise ValueError("No type could be determined for this connector.")
            self.__dict__['type'] = _type
        return _type

Return the type for this connector.
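When no type was set explicitly, the type is derived from the lowercased class name by stripping a trailing 'connector' (or 'executor' for executor classes). This is how a class named FooConnector gets the type 'foo'. The standalone function below mirrors that derivation. infer_connector_type is a hypothetical name. The real property also caches the result in self.__dict__['type'].

```python
import re


def infer_connector_type(class_name: str) -> str:
    """Derive a connector type from a class name, mirroring Connector.type."""
    lowered = class_name.lower()
    # Executors drop an 'executor' suffix; everything else drops 'connector'.
    suffix_regex = r'executor$' if lowered.endswith('executor') else r'connector$'
    _type = re.sub(suffix_regex, '', lowered)
    if not _type or _type == 'instance':
        raise ValueError("No type could be determined for this connector.")
    return _type


print(infer_connector_type('FooConnector'))  # foo
print(infer_connector_type('SQLConnector'))  # sql
print(infer_connector_type('MyExecutor'))    # my
```

The 'instance' guard means the base InstanceConnector class itself never yields a usable type.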

label: str
    @property
    def label(self) -> str:
        """
        Return the label for this connector.
        """
        _label = self.__dict__.get('label', None)
        if _label is None:
            from meerschaum._internal.static import STATIC_CONFIG
            _label = STATIC_CONFIG['connectors']['default_label']
            self.__dict__['label'] = _label
        return _label

Return the label for this connector.

class InstanceConnector(meerschaum.Connector):
class InstanceConnector(Connector):
    """
    Instance connectors define the interface for managing pipes and provide methods
    for management of users, plugins, tokens, and other metadata built atop pipes.
    """

    IS_INSTANCE: bool = True
    IS_THREAD_SAFE: bool = False

    from ._users import (
        get_users_pipe,
        register_user,
        get_user_id,
        get_username,
        get_users,
        edit_user,
        delete_user,
        get_user_password_hash,
        get_user_type,
        get_user_attributes,
    )

    from ._plugins import (
        get_plugins_pipe,
        register_plugin,
        get_plugin_user_id,
        delete_plugin,
        get_plugin_id,
        get_plugin_version,
        get_plugins,
        get_plugin_username,
        get_plugin_attributes,
    )

    from ._tokens import (
        get_tokens_pipe,
        register_token,
        edit_token,
        invalidate_token,
        delete_token,
        get_token,
        get_tokens,
        get_token_model,
        get_token_secret_hash,
        token_exists,
        get_token_scopes,
    )

    from ._pipes import (
        register_pipe,
        get_pipe_attributes,
        get_pipe_id,
        edit_pipe,
        delete_pipe,
        fetch_pipes_keys,
        pipe_exists,
        drop_pipe,
        drop_pipe_indices,
        sync_pipe,
        create_pipe_indices,
        clear_pipe,
        get_pipe_data,
        get_pipe_docs,
        get_sync_time,
        get_pipe_columns_types,
        get_pipe_columns_indices,
        get_pipe_size,
        compress_pipe,
        decompress_pipe,
        vacuum_pipe,
        analyze_pipe,
        partition_pipe,
    )

Instance connectors define the interface for managing pipes and provide methods for management of users, plugins, tokens, and other metadata built atop pipes.
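`InstanceConnector` builds its interface by importing plain module-level functions directly inside the class body. Any function bound to a name in a class namespace behaves as a method, so `self` is passed as usual. The single-file sketch below reproduces the pattern, using a `types.SimpleNamespace` as a hypothetical stand-in for a sibling module such as `._users`.

```python
from types import SimpleNamespace
from typing import List, Set


def get_users(self, debug: bool = False) -> List[str]:
    # A module-level function whose first parameter is `self`.
    return sorted(self.users)


# Hypothetical stand-in for a sibling module such as `._users`.
_users = SimpleNamespace(get_users=get_users)


class DemoInstance:
    IS_INSTANCE: bool = True

    # Equivalent in effect to `from ._users import get_users` inside the class body:
    # the name is bound in the class namespace and becomes a method.
    get_users = _users.get_users

    def __init__(self, users: Set[str]) -> None:
        self.users = users


inst = DemoInstance({'bob', 'alice'})
print(inst.get_users())  # ['alice', 'bob']
```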

IS_INSTANCE: bool = True
IS_THREAD_SAFE: bool = False
def get_users_pipe(self) -> Pipe:
def get_users_pipe(self) -> 'mrsm.Pipe':
    """
    Return the pipe used for users registration.
    """
    if '_users_pipe' in self.__dict__:
        return self._users_pipe

    cache_connector = self.__dict__.get('_cache_connector', None)
    self._users_pipe = mrsm.Pipe(
        'mrsm', 'users',
        instance=self,
        target='mrsm_users',
        temporary=True,
        cache=True,
        cache_connector_keys=cache_connector,
        static=True,
        null_indices=False,
        columns={
            'primary': 'user_id',
        },
        dtypes={
            'user_id': 'uuid',
            'username': 'string',
            'password_hash': 'string',
            'email': 'string',
            'user_type': 'string',
            'attributes': 'json',
        },
        indices={
            'unique': 'username',
        },
    )
    return self._users_pipe

Return the pipe used for users registration.
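`get_users_pipe()` builds its `Pipe` once and stores it on the instance under a private attribute. It checks `self.__dict__`, so later calls reuse the same object; `get_plugins_pipe()` and `get_tokens_pipe()` follow the same pattern. A stdlib sketch of that per-instance memoization follows. The `build_count` counter and the dict returned in place of a `Pipe` are illustrative only.

```python
from typing import Any, Dict


class DemoInstance:
    def __init__(self) -> None:
        self.build_count = 0

    def get_users_pipe(self) -> Dict[str, Any]:
        # Return the cached object if it was already built for this instance.
        if '_users_pipe' in self.__dict__:
            return self._users_pipe

        self.build_count += 1
        # Stand-in for the mrsm.Pipe(...) constructor call.
        self._users_pipe = {
            'keys': ('mrsm', 'users'),
            'target': 'mrsm_users',
            'columns': {'primary': 'user_id'},
            'indices': {'unique': 'username'},
        }
        return self._users_pipe


inst = DemoInstance()
first = inst.get_users_pipe()
second = inst.get_users_pipe()
print(first is second, inst.build_count)  # True 1
```

Checking `self.__dict__` instead of using `hasattr` means only a value stored on this particular instance counts. A class attribute with the same name would not be mistaken for a cached pipe.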

def register_user( self, user: meerschaum.core.User._User.User, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
def register_user(
    self,
    user: User,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Register a new user to the users pipe.
    """
    users_pipe = self.get_users_pipe()
    user.user_id = uuid.uuid4()
    sync_success, sync_msg = users_pipe.sync(
        [{
            'user_id': user.user_id,
            'username': user.username,
            'email': user.email,
            'password_hash': user.password_hash,
            'user_type': user.type,
            'attributes': user.attributes,
        }],
        check_existing=False,
        debug=debug,
    )
    if not sync_success:
        return False, f"Failed to register user '{user.username}':\n{sync_msg}"

    return True, "Success"

Register a new user to the users pipe.

def get_user_id( self, user: meerschaum.core.User._User.User, debug: bool = False) -> Optional[uuid.UUID]:
def get_user_id(self, user: User, debug: bool = False) -> Union[uuid.UUID, None]:
    """
    Return a user's ID from the username.
    """
    users_pipe = self.get_users_pipe()
    result_df = users_pipe.get_data(['user_id'], params={'username': user.username}, limit=1)
    if result_df is None or len(result_df) == 0:
        return None
    return result_df['user_id'][0]

Return a user's ID from the username.

def get_username(self, user_id: Any, debug: bool = False) -> Any:
def get_username(self, user_id: Any, debug: bool = False) -> Any:
    """
    Return the username from the given ID.
    """
    users_pipe = self.get_users_pipe()
    return users_pipe.get_value('username', {'user_id': user_id}, debug=debug)

Return the username from the given ID.
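Registration assigns a fresh UUID4 as the primary key. `get_user_id` and `get_username` then act as the two directions of the username/ID lookup. The sketch below models that round trip with a hypothetical in-memory `InMemoryUsers` class in place of the users pipe. It is an illustration only and does not reproduce how a real instance stores rows.

```python
import uuid
from typing import Any, Dict, List, Optional, Tuple

SuccessTuple = Tuple[bool, str]


class InMemoryUsers:
    """Hypothetical in-memory stand-in for the users pipe (not a meerschaum class)."""

    def __init__(self) -> None:
        self.rows: List[Dict[str, Any]] = []

    def register_user(self, username: str, password_hash: str) -> SuccessTuple:
        # Like `register_user`, assign a fresh UUID4 as the primary key.
        self.rows.append({
            'user_id': uuid.uuid4(),
            'username': username,
            'password_hash': password_hash,
        })
        return True, "Success"

    def get_user_id(self, username: str) -> Optional[uuid.UUID]:
        # Like `get_user_id`: filter on username, take the first match or None.
        matches = [row['user_id'] for row in self.rows if row['username'] == username]
        return matches[0] if matches else None

    def get_username(self, user_id: uuid.UUID) -> Optional[str]:
        # Like `get_username`: the reverse lookup by primary key.
        for row in self.rows:
            if row['user_id'] == user_id:
                return row['username']
        return None


users = InMemoryUsers()
success, msg = users.register_user('alice', '<hash>')
user_id = users.get_user_id('alice')
print(success, msg, users.get_username(user_id))  # True Success alice
```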

def get_users(self, debug: bool = False, **kw: Any) -> List[str]:
def get_users(
    self,
    debug: bool = False,
    **kw: Any
) -> List[str]:
    """
    Get the registered usernames.
    """
    users_pipe = self.get_users_pipe()
    df = users_pipe.get_data()
    if df is None:
        return []

    return list(df['username'])

Get the registered usernames.

def edit_user( self, user: meerschaum.core.User._User.User, debug: bool = False) -> Tuple[bool, str]:
def edit_user(self, user: User, debug: bool = False) -> mrsm.SuccessTuple:
    """
    Edit the attributes for an existing user.
    """
    users_pipe = self.get_users_pipe()
    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)

    doc = {'user_id': user_id}
    if user.email != '':
        doc['email'] = user.email
    if user.password_hash != '':
        doc['password_hash'] = user.password_hash
    if user.type != '':
        doc['user_type'] = user.type
    if user.attributes:
        doc['attributes'] = user.attributes

    sync_success, sync_msg = users_pipe.sync([doc], debug=debug)
    if not sync_success:
        return False, f"Failed to edit user '{user.username}':\n{sync_msg}"

    return True, "Success"

Edit the attributes for an existing user.
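`edit_user` syncs a partial document: the primary key plus only those fields that are non-empty on the `User` object. Presumably this is so that fields left unset are not overwritten when the sync updates the existing row. Below, the document-building rule is isolated into a hypothetical helper, `build_edit_doc`. `DemoUser` is a hypothetical dataclass standing in for `meerschaum.core.User`.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class DemoUser:
    """Hypothetical stand-in for `meerschaum.core.User`; empty values mean "unchanged"."""
    username: str
    user_id: Optional[uuid.UUID] = None
    email: str = ''
    password_hash: str = ''
    type: str = ''
    attributes: Dict[str, Any] = field(default_factory=dict)


def build_edit_doc(user: DemoUser, user_id: uuid.UUID) -> Dict[str, Any]:
    # Same rules as `edit_user`: always send the primary key,
    # and only send fields that were actually set.
    doc: Dict[str, Any] = {'user_id': user_id}
    if user.email != '':
        doc['email'] = user.email
    if user.password_hash != '':
        doc['password_hash'] = user.password_hash
    if user.type != '':
        doc['user_type'] = user.type
    if user.attributes:
        doc['attributes'] = user.attributes
    return doc


uid = uuid.uuid4()
doc = build_edit_doc(DemoUser('alice', email='alice@example.com'), uid)
print(sorted(doc))  # ['email', 'user_id']
```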

def delete_user( self, user: meerschaum.core.User._User.User, debug: bool = False) -> Tuple[bool, str]:
def delete_user(self, user: User, debug: bool = False) -> mrsm.SuccessTuple:
    """
    Delete a user from the users table.
    """
    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)
    users_pipe = self.get_users_pipe()
    clear_success, clear_msg = users_pipe.clear(params={'user_id': user_id}, debug=debug)
    if not clear_success:
        return False, f"Failed to delete user '{user.username}':\n{clear_msg}"
    return True, "Success"

Delete a user from the users table.

def get_user_password_hash( self, user: meerschaum.core.User._User.User, debug: bool = False) -> Optional[str]:
def get_user_password_hash(self, user: User, debug: bool = False) -> Union[str, None]:
    """
    Get a user's password hash from the users table.
    """
    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)
    users_pipe = self.get_users_pipe()
    result_df = users_pipe.get_data(['password_hash'], params={'user_id': user_id}, debug=debug)
    if result_df is None or len(result_df) == 0:
        return None

    return result_df['password_hash'][0]

Get a user's password hash from the users table.

def get_user_type( self, user: meerschaum.core.User._User.User, debug: bool = False) -> Optional[str]:
def get_user_type(self, user: User, debug: bool = False) -> Union[str, None]:
    """
    Get a user's type from the users table.
    """
    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)
    users_pipe = self.get_users_pipe()
    result_df = users_pipe.get_data(['user_type'], params={'user_id': user_id}, debug=debug)
    if result_df is None or len(result_df) == 0:
        return None

    return result_df['user_type'][0]

Get a user's type from the users table.

def get_user_attributes( self, user: meerschaum.core.User._User.User, debug: bool = False) -> Optional[Dict[str, Any]]:
def get_user_attributes(self, user: User, debug: bool = False) -> Union[Dict[str, Any], None]:
    """
    Get a user's attributes from the users table.
    """
    user_id = user.user_id if user.user_id is not None else self.get_user_id(user, debug=debug)
    users_pipe = self.get_users_pipe()
    result_df = users_pipe.get_data(['attributes'], params={'user_id': user_id}, debug=debug)
    if result_df is None or len(result_df) == 0:
        return None

    return result_df['attributes'][0]

Get a user's attributes from the users table.

def get_plugins_pipe(self) -> Pipe:
def get_plugins_pipe(self) -> 'mrsm.Pipe':
    """
    Return the internal pipe for syncing plugins metadata.
    """
    if '_plugins_pipe' in self.__dict__:
        return self._plugins_pipe

    cache_connector = self.__dict__.get('_cache_connector', None)
    users_pipe = self.get_users_pipe()
    user_id_dtype = users_pipe.dtypes.get('user_id', 'uuid')

    self._plugins_pipe = mrsm.Pipe(
        'mrsm', 'plugins',
        instance=self,
        target='mrsm_plugins',
        temporary=True,
        cache=True,
        cache_connector_keys=cache_connector,
        static=True,
        null_indices=False,
        columns={
            'primary': 'plugin_name',
            'user_id': 'user_id',
        },
        dtypes={
            'plugin_name': 'string',
            'user_id': user_id_dtype,
            'attributes': 'json',
            'version': 'string',
        },
    )
    return self._plugins_pipe

Return the internal pipe for syncing plugins metadata.

def register_plugin( self, plugin: Plugin, debug: bool = False) -> Tuple[bool, str]:
def register_plugin(self, plugin: Plugin, debug: bool = False) -> mrsm.SuccessTuple:
    """
    Register a new plugin to the plugins table.
    """
    plugins_pipe = self.get_plugins_pipe()
    user_id = self.get_plugin_user_id(plugin, debug=debug)
    if user_id is not None:
        username = self.get_username(user_id, debug=debug)
        return False, f"{plugin} is already registered to '{username}'."

    doc = {
        'plugin_name': plugin.name,
        'version': plugin.version,
        'attributes': plugin.attributes,
        'user_id': plugin.user_id,
    }

    sync_success, sync_msg = plugins_pipe.sync(
        [doc],
        check_existing=False,
        debug=debug,
    )
    if not sync_success:
        return False, f"Failed to register {plugin}:\n{sync_msg}"

    return True, "Success"

Register a new plugin to the plugins table.

def get_plugin_user_id( self, plugin: Plugin, debug: bool = False) -> Optional[uuid.UUID]:
def get_plugin_user_id(self, plugin: Plugin, debug: bool = False) -> Union[uuid.UUID, None]:
    """
    Return the user ID for the plugin's owner.
    """
    plugins_pipe = self.get_plugins_pipe()
    return plugins_pipe.get_value('user_id', {'plugin_name': plugin.name}, debug=debug)

Return the user ID for the plugin's owner.

def delete_plugin( self, plugin: Plugin, debug: bool = False) -> Tuple[bool, str]:
def delete_plugin(self, plugin: Plugin, debug: bool = False) -> mrsm.SuccessTuple:
    """
    Delete a plugin's registration.
    """
    plugin_id = self.get_plugin_id(plugin, debug=debug)
    if plugin_id is None:
        return False, f"{plugin} is not registered."

    plugins_pipe = self.get_plugins_pipe()
    clear_success, clear_msg = plugins_pipe.clear(params={'plugin_name': plugin.name}, debug=debug)
    if not clear_success:
        return False, f"Failed to delete {plugin}:\n{clear_msg}"
    return True, "Success"

Delete a plugin's registration.

def get_plugin_id( self, plugin: Plugin, debug: bool = False) -> Optional[str]:
def get_plugin_id(self, plugin: Plugin, debug: bool = False) -> Union[str, None]:
    """
    Return a plugin's ID.
    """
    user_id = self.get_plugin_user_id(plugin, debug=debug)
    return plugin.name if user_id is not None else None

Return a plugin's ID.

def get_plugin_version( self, plugin: Plugin, debug: bool = False) -> Optional[str]:
def get_plugin_version(self, plugin: Plugin, debug: bool = False) -> Union[str, None]:
    """
    Return the version for a plugin.
    """
    plugins_pipe = self.get_plugins_pipe()
    return plugins_pipe.get_value('version', {'plugin_name': plugin.name}, debug=debug)

Return the version for a plugin.

def get_plugins( self, user_id: Optional[int] = None, search_term: Optional[str] = None, debug: bool = False, **kw: Any) -> List[str]:
def get_plugins(
    self,
    user_id: Optional[int] = None,
    search_term: Optional[str] = None,
    debug: bool = False,
    **kw: Any
) -> List[str]:
    """
    Return a list of plugin names.
    """
    plugins_pipe = self.get_plugins_pipe()
    params = {}
    if user_id:
        params['user_id'] = user_id

    df = plugins_pipe.get_data(['plugin_name'], params=params, debug=debug)
    if df is None:
        return []

    docs = df.to_dict(orient='records')
    return [
        plugin_name
        for doc in docs
        if (plugin_name := doc['plugin_name']).startswith(search_term or '')
    ]

Return a list of plugin names.
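`search_term` acts as a case-sensitive prefix filter, because it is applied with `str.startswith`, not as a substring search. A `None` term becomes the empty prefix, which matches every name. The sketch below isolates that filtering step as a hypothetical `filter_plugin_names` helper operating on the records produced by `df.to_dict(orient='records')`. The plugin names are made up.

```python
from typing import Any, Dict, List, Optional


def filter_plugin_names(
    docs: List[Dict[str, Any]],
    search_term: Optional[str] = None,
) -> List[str]:
    # `search_term or ''` turns None into the empty prefix, which matches everything.
    return [
        plugin_name
        for doc in docs
        if (plugin_name := doc['plugin_name']).startswith(search_term or '')
    ]


docs = [{'plugin_name': 'alpha'}, {'plugin_name': 'apex'}, {'plugin_name': 'beta'}]
print(filter_plugin_names(docs, 'ap'))  # ['apex']
print(filter_plugin_names(docs))        # ['alpha', 'apex', 'beta']
```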

def get_plugin_username( self, plugin: Plugin, debug: bool = False) -> Optional[str]:
def get_plugin_username(self, plugin: Plugin, debug: bool = False) -> Union[str, None]:
    """
    Return the username for the plugin's owner.
    """
    user_id = self.get_plugin_user_id(plugin, debug=debug)
    if user_id is None:
        return None
    return self.get_username(user_id, debug=debug)

Return the username for the plugin's owner.

def get_plugin_attributes( self, plugin: Plugin, debug: bool = False) -> Dict[str, Any]:
def get_plugin_attributes(self, plugin: Plugin, debug: bool = False) -> Dict[str, Any]:
    """
    Return the attributes for a plugin.
    """
    plugins_pipe = self.get_plugins_pipe()
    return plugins_pipe.get_value('attributes', {'plugin_name': plugin.name}, debug=debug) or {}

Return the attributes for a plugin.

def get_tokens_pipe(self) -> Pipe:
def get_tokens_pipe(self) -> mrsm.Pipe:
    """
    Return the internal pipe for tokens management.
    """
    if '_tokens_pipe' in self.__dict__:
        return self._tokens_pipe

    users_pipe = self.get_users_pipe()
    user_id_dtype = (
        users_pipe._attributes.get('parameters', {}).get('dtypes', {}).get('user_id', 'uuid')
    )

    cache_connector = self.__dict__.get('_cache_connector', None)

    self._tokens_pipe = mrsm.Pipe(
        'mrsm', 'tokens',
        instance=self,
        target='mrsm_tokens',
        temporary=True,
        cache=True,
        cache_connector_keys=cache_connector,
        static=True,
        autotime=True,
        null_indices=False,
        columns={
            'datetime': 'creation',
            'primary': 'id',
        },
        indices={
            'unique': 'label',
            'user_id': 'user_id',
        },
        dtypes={
            'id': 'uuid',
            'creation': 'datetime',
            'expiration': 'datetime',
            'is_valid': 'bool',
            'label': 'string',
            'user_id': user_id_dtype,
            'scopes': 'json',
            'secret_hash': 'string',
        },
    )
    return self._tokens_pipe

Return the internal pipe for tokens management.

def register_token( self, token: meerschaum.core.Token._Token.Token, debug: bool = False) -> Tuple[bool, str]:
def register_token(
    self,
    token: Token,
    debug: bool = False,
) -> mrsm.SuccessTuple:
    """
    Register the new token to the tokens table.
    """
    token_id, token_secret = token.generate_credentials()
    tokens_pipe = self.get_tokens_pipe()
    user_id = self.get_user_id(token.user) if token.user is not None else None
    if user_id is None:
        return False, "Cannot register a token without a user."

    doc = {
        'id': token_id,
        'user_id': user_id,
        'creation': datetime.now(timezone.utc),
        'expiration': token.expiration,
        'label': token.label,
        'is_valid': token.is_valid,
        'scopes': list(token.scopes) if token.scopes else [],
        'secret_hash': hash_password(
            str(token_secret),
            rounds=STATIC_CONFIG['tokens']['hash_rounds']
        ),
    }
    sync_success, sync_msg = tokens_pipe.sync([doc], check_existing=False, debug=debug)
    if not sync_success:
        return False, f"Failed to register token:\n{sync_msg}"
    return True, "Success"

Register the new token to the tokens table.
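`register_token` persists only a salted hash of the generated secret in `secret_hash`, never the secret itself. meerschaum's `hash_password` helper and its `STATIC_CONFIG['tokens']['hash_rounds']` setting are not reproduced here. Instead, the sketch swaps in the standard library's PBKDF2-HMAC-SHA256 with an arbitrary round count to show the store-hash, verify-later flow. `hash_secret` and `verify_secret` are hypothetical names, and the `rounds$salt$digest` storage format is an assumption of this sketch.

```python
import hashlib
import hmac
import os
import secrets

ROUNDS = 100_000  # arbitrary for this sketch; meerschaum reads its own value from STATIC_CONFIG


def hash_secret(secret: str, rounds: int = ROUNDS) -> str:
    # Store the round count, salt, and digest together so verification is self-describing.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac('sha256', secret.encode('utf-8'), salt, rounds)
    return f"{rounds}${salt.hex()}${digest.hex()}"


def verify_secret(secret: str, stored: str) -> bool:
    rounds_str, salt_hex, digest_hex = stored.split('$')
    digest = hashlib.pbkdf2_hmac(
        'sha256', secret.encode('utf-8'), bytes.fromhex(salt_hex), int(rounds_str)
    )
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(digest.hex(), digest_hex)


token_secret = secrets.token_urlsafe(32)
secret_hash = hash_secret(token_secret)
print(verify_secret(token_secret, secret_hash))  # True
```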

def edit_token( self, token: meerschaum.core.Token._Token.Token, debug: bool = False) -> Tuple[bool, str]:
def edit_token(self, token: Token, debug: bool = False) -> mrsm.SuccessTuple:
    """
    Persist the token's in-memory state to the tokens pipe.
    """
    if not token.id:
        return False, "Token ID is not set."

    if not token.exists(debug=debug):
        return False, f"Token {token.id} does not exist."

    if not token.creation:
        token_model = self.get_token_model(token.id)
        token.creation = token_model.creation

    tokens_pipe = self.get_tokens_pipe()
    doc = {
        'id': token.id,
        'creation': token.creation,
        'expiration': token.expiration,
        'label': token.label,
        'is_valid': token.is_valid,
        'scopes': list(token.scopes) if token.scopes else [],
    }
    sync_success, sync_msg = tokens_pipe.sync([doc], debug=debug)
    if not sync_success:
        return False, f"Failed to edit token '{token.id}':\n{sync_msg}"

    return True, "Success"

Persist the token's in-memory state to the tokens pipe.

def invalidate_token( self, token: meerschaum.core.Token._Token.Token, debug: bool = False) -> Tuple[bool, str]:
def invalidate_token(self, token: Token, debug: bool = False) -> mrsm.SuccessTuple:
    """
    Set `is_valid` to `False` for the given token.
    """
    if not token.id:
        return False, "Token ID is not set."

    if not token.exists(debug=debug):
        return False, f"Token {token.id} does not exist."

    if not token.creation:
        token_model = self.get_token_model(token.id)
        token.creation = token_model.creation

    token.is_valid = False
    tokens_pipe = self.get_tokens_pipe()
    doc = {
        'id': token.id,
        'creation': token.creation,
        'is_valid': False,
    }
    sync_success, sync_msg = tokens_pipe.sync([doc], debug=debug)
    if not sync_success:
        return False, f"Failed to invalidate token '{token.id}':\n{sync_msg}"

    return True, "Success"

Set is_valid to False for the given token.

def delete_token( self, token: meerschaum.core.Token._Token.Token, debug: bool = False) -> Tuple[bool, str]:
def delete_token(self, token: Token, debug: bool = False) -> mrsm.SuccessTuple:
    """
    Delete the given token from the tokens table.
    """
    if not token.id:
        return False, "Token ID is not set."

    if not token.exists(debug=debug):
        return False, f"Token {token.id} does not exist."

    if not token.creation:
        token_model = self.get_token_model(token.id)
        token.creation = token_model.creation

    token.is_valid = False
    tokens_pipe = self.get_tokens_pipe()
    clear_success, clear_msg = tokens_pipe.clear(params={'id': token.id}, debug=debug)
    if not clear_success:
        return False, f"Failed to delete token '{token.id}':\n{clear_msg}"

    return True, "Success"

Delete the given token from the tokens table.

def get_token( self, token_id: Union[uuid.UUID, str], debug: bool = False) -> Optional[meerschaum.core.Token._Token.Token]:
def get_token(self, token_id: Union[uuid.UUID, str], debug: bool = False) -> Union[Token, None]:
    """
    Return the `Token` from its ID.
    """
    from meerschaum.utils.misc import is_uuid
    if isinstance(token_id, str):
        if is_uuid(token_id):
            token_id = uuid.UUID(token_id)
        else:
            raise ValueError("Invalid token ID.")
    token_model = self.get_token_model(token_id)
    if token_model is None:
        return None
    return Token(**dict(token_model))

Return the Token from its ID.
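`get_token` accepts either a `uuid.UUID` or its string form. Strings are parsed, and a malformed string raises `ValueError` before any lookup happens. The sketch below implements the same check with a `try`/`except` around `uuid.UUID` in place of meerschaum's `is_uuid` helper. `coerce_token_id` is a hypothetical name.

```python
import uuid
from typing import Union


def coerce_token_id(token_id: Union[uuid.UUID, str]) -> uuid.UUID:
    # Strings must parse as UUIDs; anything else is passed through unchanged,
    # matching the original, which only validates `str` inputs.
    if isinstance(token_id, str):
        try:
            return uuid.UUID(token_id)
        except ValueError:
            raise ValueError("Invalid token ID.") from None
    return token_id


tid = uuid.uuid4()
print(coerce_token_id(str(tid)) == tid)  # True
```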

def get_tokens( self, user: Optional[meerschaum.core.User._User.User] = None, labels: Optional[List[str]] = None, ids: Optional[List[uuid.UUID]] = None, debug: bool = False) -> List[meerschaum.core.Token._Token.Token]:
def get_tokens(
    self,
    user: Optional[User] = None,
    labels: Optional[List[str]] = None,
    ids: Optional[List[uuid.UUID]] = None,
    debug: bool = False,
) -> List[Token]:
    """
    Return a list of `Token` objects.
    """
    tokens_pipe = self.get_tokens_pipe()
    user_id = (
        self.get_user_id(user, debug=debug)
        if user is not None
        else None
    )
    user_type = self.get_user_type(user, debug=debug) if user is not None else None
    params = (
        {
            'user_id': (
                user_id
                if user_type != 'admin'
                else [user_id, None]
            )
        }
        if user_id is not None
        else {}
    )
    if labels:
        params['label'] = labels
    if ids:
        params['id'] = ids

    if debug:
        dprint(f"Getting tokens with {user_id=}, {params=}")

    tokens_df = tokens_pipe.get_data(params=params, debug=debug)
    if tokens_df is None:
        return []

    if debug:
        dprint(f"Retrieved tokens dataframe:\n{tokens_df}")

    tokens_docs = tokens_df.to_dict(orient='records')
    return [
        Token(
            instance=self,
            **token_doc
        )
        for token_doc in reversed(tokens_docs)
    ]

Return a list of Token objects.
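`get_tokens` translates its arguments into a `params` filter for `Pipe.get_data()`. A regular user is matched on their own `user_id`. An admin's filter is the list `[user_id, None]`, presumably so that tokens with no owner are also returned. Without a user, no `user_id` filter is applied at all. The sketch below isolates that construction as a hypothetical `build_token_params` helper.

```python
import uuid
from typing import Any, Dict, List, Optional


def build_token_params(
    user_id: Optional[uuid.UUID],
    user_type: Optional[str],
    labels: Optional[List[str]] = None,
    ids: Optional[List[uuid.UUID]] = None,
) -> Dict[str, Any]:
    # Admins also match tokens whose user_id is null; others match only their own.
    params: Dict[str, Any] = (
        {'user_id': user_id if user_type != 'admin' else [user_id, None]}
        if user_id is not None
        else {}
    )
    if labels:
        params['label'] = labels
    if ids:
        params['id'] = ids
    return params


uid = uuid.uuid4()
print(build_token_params(None, None, labels=['ci']))  # {'label': ['ci']}
```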

def get_token_model( self, token_id: Union[uuid.UUID, meerschaum.core.Token._Token.Token], debug: bool = False) -> 'Union[TokenModel, None]':
def get_token_model(self, token_id: Union[uuid.UUID, Token], debug: bool = False) -> 'Union[TokenModel, None]':
    """
    Return a token's model from the instance.
    """
    from meerschaum.models import TokenModel
    if isinstance(token_id, Token):
        token_id = token_id.id
    if not token_id:
        raise ValueError("Invalid token ID.")
    tokens_pipe = self.get_tokens_pipe()
    doc = tokens_pipe.get_doc(
        params={'id': token_id},
        debug=debug,
    )
    if doc is None:
        return None
    return TokenModel(**doc)

Return a token's model from the instance.

def get_token_secret_hash( self, token_id: Union[uuid.UUID, meerschaum.core.Token._Token.Token], debug: bool = False) -> Optional[str]:
def get_token_secret_hash(self, token_id: Union[uuid.UUID, Token], debug: bool = False) -> Union[str, None]:
    """
    Return the secret hash for a given token.
    """
    if isinstance(token_id, Token):
        token_id = token_id.id
    if not token_id:
        raise ValueError("Invalid token ID.")
    tokens_pipe = self.get_tokens_pipe()
    return tokens_pipe.get_value('secret_hash', params={'id': token_id}, debug=debug)

Return the secret hash for a given token.

def token_exists( self, token_id: Union[uuid.UUID, meerschaum.core.Token._Token.Token], debug: bool = False) -> bool:
def token_exists(self, token_id: Union[uuid.UUID, Token], debug: bool = False) -> bool:
    """
    Return `True` if a token exists in the tokens pipe.
    """
    if isinstance(token_id, Token):
        token_id = token_id.id
    if not token_id:
        raise ValueError("Invalid token ID.")

    tokens_pipe = self.get_tokens_pipe()
    return tokens_pipe.get_value('creation', params={'id': token_id}, debug=debug) is not None

Return True if a token exists in the tokens pipe.

def get_token_scopes( self, token_id: Union[uuid.UUID, meerschaum.core.Token._Token.Token], debug: bool = False) -> List[str]:
def get_token_scopes(self, token_id: Union[uuid.UUID, Token], debug: bool = False) -> List[str]:
    """
    Return the scopes for a token.
    """
    if isinstance(token_id, Token):
        token_id = token_id.id
    if not token_id:
        raise ValueError("Invalid token ID.")

    tokens_pipe = self.get_tokens_pipe()
    return tokens_pipe.get_value('scopes', params={'id': token_id}, debug=debug) or []

Return the scopes for a token.

@abc.abstractmethod
def register_pipe( self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
@abc.abstractmethod
def register_pipe(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Insert the pipe's attributes into the internal `pipes` table.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe to be registered.

    Returns
    -------
    A `SuccessTuple` of the result.
    """

Insert the pipe's attributes into the internal pipes table.

Parameters
  • pipe (mrsm.Pipe): The pipe to be registered.
Returns
  • A SuccessTuple of the result.
@abc.abstractmethod
def get_pipe_attributes( self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> Dict[str, Any]:
@abc.abstractmethod
def get_pipe_attributes(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> Dict[str, Any]:
    """
    Return the pipe's document from the internal `pipes` table.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose attributes should be retrieved.

    Returns
    -------
    The document that matches the keys of the pipe.
    """

Return the pipe's document from the internal pipes table.

Parameters
  • pipe (mrsm.Pipe): The pipe whose attributes should be retrieved.
Returns
  • The document that matches the keys of the pipe.
@abc.abstractmethod
def get_pipe_id( self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> Union[str, int, NoneType]:
@abc.abstractmethod
def get_pipe_id(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> Union[str, int, None]:
    """
    Return the `id` for the pipe if it exists.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose `id` to fetch.

    Returns
    -------
    The `id` for the pipe's document or `None`.
    """

Return the id for the pipe if it exists.

Parameters
  • pipe (mrsm.Pipe): The pipe whose id to fetch.
Returns
  • The id for the pipe's document or None.
def edit_pipe( self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
def edit_pipe(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Edit the attributes of the pipe.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose in-memory parameters must be persisted.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    raise NotImplementedError

Edit the attributes of the pipe.

Parameters
  • pipe (mrsm.Pipe): The pipe whose in-memory parameters must be persisted.
Returns
  • A SuccessTuple indicating success.
def delete_pipe( self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
def delete_pipe(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Delete a pipe's registration from the `pipes` collection.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe to be deleted.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    raise NotImplementedError

Delete a pipe's registration from the pipes collection.

Parameters
  • pipe (mrsm.Pipe): The pipe to be deleted.
Returns
  • A SuccessTuple indicating success.
@abc.abstractmethod
def fetch_pipes_keys( self, connector_keys: Optional[List[str]] = None, metric_keys: Optional[List[str]] = None, location_keys: Optional[List[str]] = None, tags: Optional[List[str]] = None, debug: bool = False, **kwargs: Any) -> Union[List[Tuple[str, str, str]], List[Tuple[str, str, str, Union[Dict[str, Any], List[str]]]], Dict[Union[int, str], Tuple[str, str, str]], Dict[Union[int, str], Tuple[str, str, str, Union[Dict[str, Any], List[str]]]]]:
@abc.abstractmethod
def fetch_pipes_keys(
    self,
    connector_keys: Optional[List[str]] = None,
    metric_keys: Optional[List[str]] = None,
    location_keys: Optional[List[str]] = None,
    tags: Optional[List[str]] = None,
    debug: bool = False,
    **kwargs: Any
) -> Union[
    List[Tuple[str, str, str]],
    List[Tuple[str, str, str, Union[Dict[str, Any], List[str]]]],
    Dict[Union[int, str], Tuple[str, str, str]],
    Dict[Union[int, str], Tuple[str, str, str, Union[Dict[str, Any], List[str]]]],
]:
    """
    Return registered pipes' keys according to the provided filters.

    May return either a list of key tuples or a dictionary mapping pipe IDs to key tuples.
    When returning a dictionary, the key is the pipe's unique ID (int or str).
    Tuples may be length 3 `(connector_keys, metric_key, location_key)` or length 4
    with parameters or tags appended as the fourth element.

    Parameters
    ----------
    connector_keys: list[str] | None, default None
        The keys passed via `-c`.

    metric_keys: list[str] | None, default None
        The keys passed via `-m`.

    location_keys: list[str] | None, default None
        The keys passed via `-l`.

    tags: List[str] | None, default None
        Tags passed via `--tags` which are stored under `parameters:tags`.

    Returns
    -------
    A list of tuples or a dictionary mapping pipe IDs to tuples.
    You may return the string `"None"` for location keys in place of nulls.

    Examples
    --------
    >>> import meerschaum as mrsm
    >>> conn = mrsm.get_connector('example:demo')
    >>>
    >>> pipe_a = mrsm.Pipe('a', 'demo', tags=['foo'], instance=conn)
    >>> pipe_b = mrsm.Pipe('b', 'demo', tags=['bar'], instance=conn)
    >>> pipe_a.register()
    >>> pipe_b.register()
    >>>
    >>> conn.fetch_pipes_keys(['a', 'b'])
    [('a', 'demo', 'None'), ('b', 'demo', 'None')]
    >>> conn.fetch_pipes_keys(metric_keys=['demo'])
    [('a', 'demo', 'None'), ('b', 'demo', 'None')]
    >>> conn.fetch_pipes_keys(tags=['foo'])
    [('a', 'demo', 'None')]
    >>> conn.fetch_pipes_keys(location_keys=[None])
    [('a', 'demo', 'None'), ('b', 'demo', 'None')]
    """

Return registered pipes' keys according to the provided filters.

May return either a list of key tuples or a dictionary mapping pipe IDs to key tuples. When returning a dictionary, the key is the pipe's unique ID (int or str). Tuples may be length 3 (connector_keys, metric_key, location_key) or length 4 with parameters or tags appended as the fourth element.

Parameters
  • connector_keys (list[str] | None, default None): The keys passed via -c.
  • metric_keys (list[str] | None, default None): The keys passed via -m.
  • location_keys (list[str] | None, default None): The keys passed via -l.
  • tags (list[str] | None, default None): Tags passed via --tags which are stored under parameters:tags.
Returns
  • A list of tuples or a dictionary mapping pipe IDs to tuples.
  • You may return the string "None" for location keys in place of nulls.
Examples
>>> import meerschaum as mrsm
>>> conn = mrsm.get_connector('example:demo')
>>>
>>> pipe_a = mrsm.Pipe('a', 'demo', tags=['foo'], instance=conn)
>>> pipe_b = mrsm.Pipe('b', 'demo', tags=['bar'], instance=conn)
>>> pipe_a.register()
>>> pipe_b.register()
>>>
>>> conn.fetch_pipes_keys(['a', 'b'])
[('a', 'demo', 'None'), ('b', 'demo', 'None')]
>>> conn.fetch_pipes_keys(metric_keys=['demo'])
[('a', 'demo', 'None'), ('b', 'demo', 'None')]
>>> conn.fetch_pipes_keys(tags=['foo'])
[('a', 'demo', 'None')]
>>> conn.fetch_pipes_keys(location_keys=[None])
[('a', 'demo', 'None'), ('b', 'demo', 'None')]
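To show how a custom instance connector might produce the tuples in the examples above, here is a minimal sketch over a hypothetical in-memory registry. The `REGISTRY` layout and the any-tag matching rule are assumptions, not Meerschaum's actual storage or tag semantics.

```python
from typing import Any, Optional

# Hypothetical in-memory registry standing in for an instance's pipes table.
REGISTRY: list[dict[str, Any]] = [
    {'connector_keys': 'a', 'metric_key': 'demo', 'location_key': None, 'tags': ['foo']},
    {'connector_keys': 'b', 'metric_key': 'demo', 'location_key': None, 'tags': ['bar']},
]


def _matches(value: Any, allowed: Optional[list[Any]]) -> bool:
    """Return True if value is in allowed, treating None and "None" alike."""
    if not allowed:
        # A missing or empty filter matches every pipe.
        return True
    normalized = {'None' if item is None else item for item in allowed}
    return ('None' if value is None else value) in normalized


def fetch_pipes_keys(
    connector_keys: Optional[list[str]] = None,
    metric_keys: Optional[list[str]] = None,
    location_keys: Optional[list[Optional[str]]] = None,
    tags: Optional[list[str]] = None,
) -> list[tuple[str, str, str]]:
    """Return (connector_keys, metric_key, location_key) tuples for matching pipes."""
    keys = []
    for doc in REGISTRY:
        if not _matches(doc['connector_keys'], connector_keys):
            continue
        if not _matches(doc['metric_key'], metric_keys):
            continue
        if not _matches(doc['location_key'], location_keys):
            continue
        # Assumption: a pipe passes the tags filter if it has any requested tag.
        if tags and not set(tags) & set(doc.get('tags', [])):
            continue
        # Report null location keys as the string "None", as the docstring allows.
        location = 'None' if doc['location_key'] is None else doc['location_key']
        keys.append((doc['connector_keys'], doc['metric_key'], location))
    return keys
```

This sketch returns the list-of-tuples form. A connector could instead return a dictionary keyed by pipe ID, as the docstring permits.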
@abc.abstractmethod
def pipe_exists(self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> bool:
@abc.abstractmethod
def pipe_exists(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> bool:
    """
    Check whether a pipe's target table exists.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose target table to check.

    Returns
    -------
    A `bool` indicating whether the table exists.
    """

Check whether a pipe's target table exists.

Parameters
  • pipe (mrsm.Pipe): The pipe whose target table to check.
Returns
  • A bool indicating whether the table exists.
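An existence check like `pipe_exists()` can pair with a drop that succeeds whether or not the table is present, matching the "drop ... if it exists" wording of `drop_pipe()`. The following is a minimal sketch. It assumes a hypothetical `MemoryInstance` that keys tables by a target-name string instead of taking a real `mrsm.Pipe`.

```python
from typing import Any

# A SuccessTuple is a (success, message) pair.
SuccessTuple = tuple[bool, str]


class MemoryInstance:
    """Hypothetical backend that keeps each pipe's rows in a dict."""

    def __init__(self) -> None:
        # Maps a target table name to its list of row documents.
        self.tables: dict[str, list[dict[str, Any]]] = {}

    def pipe_exists(self, target: str) -> bool:
        # A table exists once created, even if it holds no rows yet.
        return target in self.tables

    def drop_pipe(self, target: str) -> SuccessTuple:
        # "Drop if it exists": a missing table is not treated as an error.
        if target not in self.tables:
            return True, f"Table '{target}' does not exist."
        del self.tables[target]
        return True, f"Dropped table '{target}'."
```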
@abc.abstractmethod
def drop_pipe(self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
@abc.abstractmethod
def drop_pipe(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Drop a pipe's collection if it exists.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe to be dropped.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    raise NotImplementedError

Drop a pipe's collection if it exists.

Parameters
  • pipe (mrsm.Pipe): The pipe to be dropped.
Returns
  • A SuccessTuple indicating success.
def drop_pipe_indices(self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
def drop_pipe_indices(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Drop a pipe's indices.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose indices need to be dropped.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    return False, f"Cannot drop indices for instance connectors of type '{self.type}'."

Drop a pipe's indices.

Parameters
  • pipe (mrsm.Pipe): The pipe whose indices need to be dropped.
Returns
  • A SuccessTuple indicating success.
@abc.abstractmethod
def sync_pipe(self, pipe: Pipe, df: 'pd.DataFrame' = None, begin: Union[datetime.datetime, int, NoneType] = None, end: Union[datetime.datetime, int, NoneType] = None, chunksize: Optional[int] = -1, check_existing: bool = True, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
@abc.abstractmethod
def sync_pipe(
    self,
    pipe: mrsm.Pipe,
    df: 'pd.DataFrame' = None,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    chunksize: Optional[int] = -1,
    check_existing: bool = True,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Sync a pipe using a database connection.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The Meerschaum Pipe instance into which to sync the data.

    df: Optional[pd.DataFrame]
        An optional DataFrame or equivalent to sync into the pipe.
        Defaults to `None`.

    begin: Union[datetime, int, None], default None
        Optionally specify the earliest datetime to search for data.
        Defaults to `None`.

    end: Union[datetime, int, None], default None
        Optionally specify the latest datetime to search for data.
        Defaults to `None`.

    chunksize: Optional[int], default -1
        Specify the number of rows to sync per chunk.
        If `-1`, resort to system configuration (default is `900`).
        A `chunksize` of `None` will sync all rows in one transaction.
        Defaults to `-1`.

    check_existing: bool, default True
        If `True`, pull and diff with existing data from the pipe. Defaults to `True`.

    debug: bool, default False
        Verbosity toggle. Defaults to False.

    Returns
    -------
    A `SuccessTuple` of success (`bool`) and message (`str`).
    """

Sync a pipe using a database connection.

Parameters
  • pipe (mrsm.Pipe): The Meerschaum Pipe instance into which to sync the data.
  • df (Optional[pd.DataFrame]): An optional DataFrame or equivalent to sync into the pipe. Defaults to None.
  • begin (Union[datetime, int, None], default None): Optionally specify the earliest datetime to search for data. Defaults to None.
  • end (Union[datetime, int, None], default None): Optionally specify the latest datetime to search for data. Defaults to None.
  • chunksize (Optional[int], default -1): Specify the number of rows to sync per chunk. If -1, resort to system configuration (default is 900). A chunksize of None will sync all rows in one transaction. Defaults to -1.
  • check_existing (bool, default True): If True, pull and diff with existing data from the pipe. Defaults to True.
  • debug (bool, default False): Verbosity toggle. Defaults to False.
Returns
  • A SuccessTuple of success (bool) and message (str).
def create_pipe_indices(self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
def create_pipe_indices(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Create a pipe's indices.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose indices need to be created.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    return False, f"Cannot create indices for instance connectors of type '{self.type}'."

Create a pipe's indices.

Parameters
  • pipe (mrsm.Pipe): The pipe whose indices need to be created.
Returns
  • A SuccessTuple indicating success.
def clear_pipe(self, pipe: Pipe, begin: Union[datetime.datetime, int, NoneType] = None, end: Union[datetime.datetime, int, NoneType] = None, params: Optional[Dict[str, Any]] = None, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
def clear_pipe(
    self,
    pipe: mrsm.Pipe,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Delete rows within `begin`, `end`, and `params`.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose rows to clear.

    begin: datetime | int | None, default None
        If provided, remove rows >= `begin`.

    end: datetime | int | None, default None
        If provided, remove rows < `end`.

    params: dict[str, Any] | None, default None
        If provided, only remove rows which match the `params` filter.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    raise NotImplementedError
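The `begin`/`end`/`params` semantics of `clear_pipe()` can be illustrated on in-memory rows: `begin` is inclusive, `end` is exclusive, and rows must match every filter. The following is a minimal sketch. `clear_rows` is hypothetical, and it supports only simple equality filters in `params`, not any richer filter syntax a real connector may accept.

```python
from datetime import datetime
from typing import Any, Optional, Union


def clear_rows(
    rows: list[dict[str, Any]],
    dt_col: str,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[dict[str, Any]] = None,
) -> tuple[list[dict[str, Any]], int]:
    """Return (kept_rows, num_deleted) after deleting matching rows in [begin, end)."""
    def should_delete(row: dict[str, Any]) -> bool:
        value = row.get(dt_col)
        # begin is inclusive (>=), end is exclusive (<).
        if begin is not None and (value is None or value < begin):
            return False
        if end is not None and (value is None or value >= end):
            return False
        # Every params key must equal the row's value for the row to be removed.
        return all(row.get(key) == val for key, val in (params or {}).items())

    kept = [row for row in rows if not should_delete(row)]
    return kept, len(rows) - len(kept)
```

With no bounds and no `params`, every row is deleted, which is what clearing an entire pipe amounts to.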

Delete rows within begin, end, and params.

Parameters
  • pipe (mrsm.Pipe): The pipe whose rows to clear.
  • begin (datetime | int | None, default None): If provided, remove rows >= begin.
  • end (datetime | int | None, default None): If provided, remove rows < end.
  • params (dict[str, Any] | None, default None): If provided, only remove rows which match the params filter.
Returns
  • A SuccessTuple indicating success.
def get_pipe_data(self, pipe: Pipe, select_columns: Optional[List[str]] = None, omit_columns: Optional[List[str]] = None, begin: Union[datetime.datetime, int, NoneType] = None, end: Union[datetime.datetime, int, NoneType] = None, params: Optional[Dict[str, Any]] = None, debug: bool = False, **kwargs: Any) -> Union['pd.DataFrame', None]:
def get_pipe_data(
    self,
    pipe: mrsm.Pipe,
    select_columns: Optional[List[str]] = None,
    omit_columns: Optional[List[str]] = None,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    debug: bool = False,
    **kwargs: Any
) -> Union['pd.DataFrame', None]:
    """
    Query a pipe's target table and return the DataFrame.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe with the target table from which to read.

    select_columns: list[str] | None, default None
        If provided, only select these given columns.
        Otherwise select all available columns (i.e. `SELECT *`).

    omit_columns: list[str] | None, default None
        If provided, remove these columns from the selection.

    begin: datetime | int | None, default None
        The earliest `datetime` value to search from (inclusive).

    end: datetime | int | None, default None
        The latest `datetime` value to search up to (exclusive).

    params: dict[str, Any] | None, default None
        Additional filters to apply to the query.

    Returns
    -------
    The target table's data as a DataFrame, or `None` if there is no data.
    """
    if type(self).get_pipe_docs is get_pipe_docs:
        raise NotImplementedError(
            f"Missing `get_pipe_data()` or `get_pipe_docs()` for {type(self)}."
        )

    docs = self.get_pipe_docs(
        pipe=pipe,
        select_columns=select_columns,
        omit_columns=omit_columns,
        begin=begin,
        end=end,
        params=params,
        debug=debug,
        **kwargs
    )
    if not docs:
        return None

    pd = mrsm.attempt_import('pandas')
    try:
        return pd.DataFrame(docs)
    except Exception as e:
        from meerschaum.utils.warnings import warn
        warn(f"Cannot build DataFrame from pipe docs:\n{e}")

    return None

Query a pipe's target table and return the DataFrame.

Parameters
  • pipe (mrsm.Pipe): The pipe with the target table from which to read.
  • select_columns (list[str] | None, default None): If provided, only select these given columns. Otherwise select all available columns (i.e. SELECT *).
  • omit_columns (list[str] | None, default None): If provided, remove these columns from the selection.
  • begin (datetime | int | None, default None): The earliest datetime value to search from (inclusive).
  • end (datetime | int | None, default None): The latest datetime value to search up to (exclusive).
  • params (dict[str, Any] | None, default None): Additional filters to apply to the query.
Returns
  • The target table's data as a DataFrame, or None if there is no data.
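`get_pipe_data()` and `get_pipe_docs()` each default to the other, so a connector only needs to implement one. The guard at the top of `get_pipe_data()` compares `type(self).get_pipe_docs` with the base function. This detects when neither method was overridden and raises instead of recursing forever. The same pattern can be sketched without pandas. `BaseInstance`, `get_frame()`, and the dict-of-lists `Frame` (standing in for a DataFrame) are all hypothetical.

```python
from typing import Any, Optional

# Hypothetical column-oriented "frame" standing in for a pandas DataFrame.
Frame = dict[str, list[Any]]


class BaseInstance:
    """Hypothetical base class whose two read methods default to each other."""

    def get_docs(self) -> list[dict[str, Any]]:
        # Default: build row documents from get_frame().
        frame = self.get_frame()
        if not frame:
            return []
        columns = list(frame)
        num_rows = len(frame[columns[0]])
        return [{col: frame[col][i] for col in columns} for i in range(num_rows)]

    def get_frame(self) -> Optional[Frame]:
        # If get_docs() is still the base version, neither method was
        # overridden and calling it would recurse forever: fail loudly instead.
        if type(self).get_docs is BaseInstance.get_docs:
            raise NotImplementedError(
                f"Missing get_frame() or get_docs() for {type(self)}."
            )
        docs = self.get_docs()
        if not docs:
            return None
        columns = list(docs[0])
        return {col: [doc.get(col) for doc in docs] for col in columns}


class DocsOnly(BaseInstance):
    """Implements only get_docs(); get_frame() is derived from it."""

    def get_docs(self) -> list[dict[str, Any]]:
        return [{'id': 1, 'vl': 10}, {'id': 2, 'vl': 20}]


class FrameOnly(BaseInstance):
    """Implements only get_frame(); get_docs() is derived from it."""

    def get_frame(self) -> Optional[Frame]:
        return {'id': [1, 2], 'vl': [10, 20]}
```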
def get_pipe_docs(self, pipe: Pipe, select_columns: Optional[List[str]] = None, omit_columns: Optional[List[str]] = None, begin: Union[datetime.datetime, int, NoneType] = None, end: Union[datetime.datetime, int, NoneType] = None, params: Optional[Dict[str, Any]] = None, debug: bool = False, **kwargs: Any) -> list[dict[str, typing.Any]]:
def get_pipe_docs(
    self,
    pipe: mrsm.Pipe,
    select_columns: Optional[List[str]] = None,
    omit_columns: Optional[List[str]] = None,
    begin: Union[datetime, int, None] = None,
    end: Union[datetime, int, None] = None,
    params: Optional[Dict[str, Any]] = None,
    debug: bool = False,
    **kwargs: Any
) -> list[dict[str, Any]]:
    """
    Return a pipe's data as a list of documents.
    Defaults to `get_pipe_data().to_dict(orient='records')`.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe with the target table from which to read.

    select_columns: list[str] | None, default None
        If provided, only select these given columns.
        Otherwise select all available columns (i.e. `SELECT *`).

    omit_columns: list[str] | None, default None
        If provided, remove these columns from the selection.

    begin: datetime | int | None, default None
        The earliest `datetime` value to search from (inclusive).

    end: datetime | int | None, default None
        The latest `datetime` value to search up to (exclusive).

    params: dict[str, Any] | None, default None
        Additional filters to apply to the query.

    Returns
    -------
    The target table's data as a list of dictionaries.
    """
    df = self.get_pipe_data(
        pipe=pipe,
        select_columns=select_columns,
        omit_columns=omit_columns,
        begin=begin,
        end=end,
        params=params,
        debug=debug,
        **kwargs
    )
    if df is None or df.empty:
        return []
    return df.to_dict(orient='records')

Return a pipe's data as a list of documents. Defaults to get_pipe_data().to_dict(orient='records').

Parameters
  • pipe (mrsm.Pipe): The pipe with the target table from which to read.
  • select_columns (list[str] | None, default None): If provided, only select these given columns. Otherwise select all available columns (i.e. SELECT *).
  • omit_columns (list[str] | None, default None): If provided, remove these columns from the selection.
  • begin (datetime | int | None, default None): The earliest datetime value to search from (inclusive).
  • end (datetime | int | None, default None): The latest datetime value to search up to (exclusive).
  • params (dict[str, Any] | None, default None): Additional filters to apply to the query.
Returns
  • The target table's data as a list of dictionaries.
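For connectors that implement `get_pipe_docs()` directly, `select_columns` and `omit_columns` can be applied as "select, then omit". That ordering is an assumption drawn from the parameter descriptions above. Here is a minimal sketch with a hypothetical `project_columns` helper:

```python
from typing import Any, Optional


def project_columns(
    docs: list[dict[str, Any]],
    select_columns: Optional[list[str]] = None,
    omit_columns: Optional[list[str]] = None,
) -> list[dict[str, Any]]:
    """Keep select_columns (or every column), then drop any omit_columns."""
    omit = set(omit_columns or [])
    projected = []
    for doc in docs:
        # No selection behaves like SELECT *: keep every key in the document.
        columns = select_columns or list(doc)
        projected.append({col: doc.get(col) for col in columns if col not in omit})
    return projected
```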
@abc.abstractmethod
def get_sync_time(self, pipe: Pipe, params: Optional[Dict[str, Any]] = None, newest: bool = True, debug: bool = False, **kwargs: Any) -> datetime.datetime | int | None:
@abc.abstractmethod
def get_sync_time(
    self,
    pipe: mrsm.Pipe,
    params: Optional[Dict[str, Any]] = None,
    newest: bool = True,
    debug: bool = False,
    **kwargs: Any
) -> datetime | int | None:
    """
    Return the most recent value for the `datetime` axis, or the oldest value if `newest` is `False`.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose collection contains documents.

    params: dict[str, Any] | None, default None
        Filter certain parameters when determining the sync time.

    newest: bool, default True
        If `True`, return the maximum value for the column.
        Otherwise, return the minimum value.

    Returns
    -------
    The largest (or smallest, if `newest` is `False`) `datetime` or `int` value
    of the `datetime` axis, or `None` if no value is found.
    """

Return the most recent value for the datetime axis, or the oldest value if newest is False.

Parameters
  • pipe (mrsm.Pipe): The pipe whose collection contains documents.
  • params (dict[str, Any] | None, default None): Filter certain parameters when determining the sync time.
  • newest (bool, default True): If True, return the maximum value for the column. Otherwise, return the minimum value.
Returns
  • The largest (or smallest, if newest is False) datetime or int value of the datetime axis, or None if no value is found.
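Over in-memory rows, the `newest` and `params` behavior of `get_sync_time()` reduces to a filtered maximum or minimum. Below is a hypothetical sketch. This `get_sync_time` takes rows and a column name rather than a pipe, and supports only equality filters. It returns `None` when nothing matches.

```python
from datetime import datetime
from typing import Any, Optional, Union


def get_sync_time(
    rows: list[dict[str, Any]],
    dt_col: str,
    params: Optional[dict[str, Any]] = None,
    newest: bool = True,
) -> Union[datetime, int, None]:
    """Return the max (or min) dt_col value among rows matching params."""
    values = [
        row[dt_col]
        for row in rows
        # Skip rows with no datetime value or that fail an equality filter.
        if row.get(dt_col) is not None
        and all(row.get(key) == val for key, val in (params or {}).items())
    ]
    if not values:
        # An empty (or fully filtered) table has no sync time.
        return None
    return max(values) if newest else min(values)
```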
@abc.abstractmethod
def get_pipe_columns_types(self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> Dict[str, str]:
@abc.abstractmethod
def get_pipe_columns_types(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> Dict[str, str]:
    """
    Return the data types for the columns in the target table for data type enforcement.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose target table contains columns and data types.

    Returns
    -------
    A dictionary mapping columns to data types.
    """

Return the data types for the columns in the target table for data type enforcement.

Parameters
  • pipe (mrsm.Pipe): The pipe whose target table contains columns and data types.
Returns
  • A dictionary mapping columns to data types.
def get_pipe_columns_indices(self, debug: bool = False) -> Dict[str, List[Dict[str, str]]]:
def get_pipe_columns_indices(
    self,
    debug: bool = False,
) -> Dict[str, List[Dict[str, str]]]:
    """
    Return a dictionary mapping columns to metadata about related indices.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose target table has related indices.

    Returns
    -------
    A dictionary mapping each column to a list of dictionaries with the keys "type" and "name".

    Examples
    --------
    >>> pipe = mrsm.Pipe('demo', 'shirts', columns={'primary': 'id'}, indices={'size_color': ['color', 'size']})
    >>> pipe.sync([{'color': 'red', 'size': 'M'}])
    >>> pipe.get_columns_indices()
    {'id': [{'name': 'demo_shirts_pkey', 'type': 'PRIMARY KEY'}], 'color': [{'name': 'IX_demo_shirts_color_size', 'type': 'INDEX'}], 'size': [{'name': 'IX_demo_shirts_color_size', 'type': 'INDEX'}]}
    """
    return {}

Return a dictionary mapping columns to metadata about related indices.

Parameters
  • pipe (mrsm.Pipe): The pipe whose target table has related indices.
Returns
  • A dictionary mapping each column to a list of dictionaries with the keys "type" and "name".
Examples
>>> pipe = mrsm.Pipe('demo', 'shirts', columns={'primary': 'id'}, indices={'size_color': ['color', 'size']})
>>> pipe.sync([{'color': 'red', 'size': 'M'}])
>>> pipe.get_columns_indices()
{'id': [{'name': 'demo_shirts_pkey', 'type': 'PRIMARY KEY'}], 'color': [{'name': 'IX_demo_shirts_color_size', 'type': 'INDEX'}], 'size': [{'name': 'IX_demo_shirts_color_size', 'type': 'INDEX'}]}
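In the example output above, each column lists every index that covers it, so one composite index appears under both `color` and `size`. A minimal sketch can invert a primary key plus an `indices` dictionary into that shape. It assumes the index names follow the `{target}_pkey` and `IX_{target}_{columns}` pattern seen in the example; the real names are chosen by the connector and may differ. `columns_indices` is a hypothetical helper.

```python
from typing import Optional


def columns_indices(
    target: str,
    primary: Optional[str],
    indices: dict[str, list[str]],
) -> dict[str, list[dict[str, str]]]:
    """Map each column to the indices that include it."""
    result: dict[str, list[dict[str, str]]] = {}
    if primary:
        # The primary key column gets its own constraint entry.
        result.setdefault(primary, []).append(
            {'name': f'{target}_pkey', 'type': 'PRIMARY KEY'}
        )
    for cols in indices.values():
        # One composite index is shared by every column it covers.
        name = f"IX_{target}_{'_'.join(cols)}"
        for col in cols:
            result.setdefault(col, []).append({'name': name, 'type': 'INDEX'})
    return result
```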
def get_pipe_size(self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> Optional[int]:
def get_pipe_size(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> Union[int, None]:
    """
    Return the on-disk size of a pipe's target table in bytes.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose target table size to measure.

    Returns
    -------
    An `int` of the number of bytes occupied by the target table,
    or `None` if the size cannot be determined.
    """
    raise NotImplementedError(
        f"`get_pipe_size()` is not implemented for instance connectors of type '{self.type}'."
    )

Return the on-disk size of a pipe's target table in bytes.

Parameters
  • pipe (mrsm.Pipe): The pipe whose target table size to measure.
Returns
  • An int of the number of bytes occupied by the target table, or None if the size cannot be determined.
def compress_pipe(self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
def compress_pipe(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Compress a pipe's target table to reduce disk usage.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose target table to compress.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    return False, (
        f"Compression is not supported for instance connectors of type '{self.type}'."
    )

Compress a pipe's target table to reduce disk usage.

Parameters
  • pipe (mrsm.Pipe): The pipe whose target table to compress.
Returns
  • A SuccessTuple indicating success.
def decompress_pipe(self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
def decompress_pipe(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Decompress a pipe's target table, the inverse of `compress_pipe()`.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose target table to decompress.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    return False, (
        f"Decompression is not supported for instance connectors of type '{self.type}'."
    )

Decompress a pipe's target table, the inverse of compress_pipe().

Parameters
  • pipe (mrsm.Pipe): The pipe whose target table to decompress.
Returns
  • A SuccessTuple indicating success.
def vacuum_pipe(self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
def vacuum_pipe(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Reclaim disk space from a pipe's target table.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose target table to vacuum.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    return False, (
        f"Vacuuming is not supported for instance connectors of type '{self.type}'."
    )

Reclaim disk space from a pipe's target table.

Parameters
  • pipe (mrsm.Pipe): The pipe whose target table to vacuum.
Returns
  • A SuccessTuple indicating success.
def analyze_pipe(self, pipe: Pipe, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
def analyze_pipe(
    self,
    pipe: mrsm.Pipe,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Refresh the query planner statistics for a pipe's target table.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The pipe whose target table to analyze.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    return False, (
        f"Analyzing is not supported for instance connectors of type '{self.type}'."
    )

Refresh the query planner statistics for a pipe's target table.

Parameters
  • pipe (mrsm.Pipe): The pipe whose target table to analyze.
Returns
  • A SuccessTuple indicating success.
def partition_pipe(self, pipe: Pipe, chunk_minutes: Optional[int] = None, debug: bool = False, **kwargs: Any) -> Tuple[bool, str]:
def partition_pipe(
    self,
    pipe: mrsm.Pipe,
    chunk_minutes: Optional[int] = None,
    debug: bool = False,
    **kwargs: Any
) -> mrsm.SuccessTuple:
    """
    Rebuild a pipe's target table to a new partition (chunk) width.

    Parameters
    ----------
    pipe: mrsm.Pipe
        The partitioned pipe whose target table to repartition.

    chunk_minutes: Optional[int], default None
        The new partition width in minutes. Defaults to the pipe's `verify.chunk_minutes`.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    return False, (
        f"Repartitioning is not supported for instance connectors of type '{self.type}'."
    )
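Repartitioning changes which time bucket each row falls into. To make `chunk_minutes` concrete, the following sketch computes the `[start, end)` partition a timestamp would land in. It assumes epoch-aligned UTC buckets, which is an assumption: each backend picks its own alignment. `partition_bounds` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: partitions are aligned to the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def partition_bounds(dt: datetime, chunk_minutes: int) -> tuple[datetime, datetime]:
    """Return the [start, end) partition of width chunk_minutes containing dt."""
    if chunk_minutes < 1:
        raise ValueError("chunk_minutes must be a positive integer")
    width = timedelta(minutes=chunk_minutes)
    # timedelta // timedelta yields the whole number of partitions since the epoch.
    start = EPOCH + ((dt - EPOCH) // width) * width
    return start, start + width
```

The timestamp passed in must be timezone-aware, since subtracting a naive datetime from `EPOCH` raises a `TypeError`.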

Rebuild a pipe's target table to a new partition (chunk) width.

Parameters
  • pipe (mrsm.Pipe): The partitioned pipe whose target table to repartition.
  • chunk_minutes (Optional[int], default None): The new partition width in minutes. Defaults to the pipe's verify.chunk_minutes.
Returns
  • A SuccessTuple indicating success.
def make_connector(cls, _is_executor: bool = False):
def make_connector(cls, _is_executor: bool = False):
    """
    Register a class as a `Connector`.
    The connector `type` is the lowercased class name with the trailing
    `connector` suffix removed (e.g. `FooConnector` is registered as `foo`).

    Set the class attribute `IS_INSTANCE = True` to register the type as an
    instance connector. This requires implementing the various pipe methods
    and thorough testing.

    Parameters
    ----------
    cls: type
        The connector class to register.

    Examples
    --------
    >>> import meerschaum as mrsm
    >>> from meerschaum.connectors import make_connector, Connector
    >>>
    >>> @make_connector
    ... class FooConnector(Connector):
    ...     REQUIRED_ATTRIBUTES: list[str] = ['username', 'password']
    ...
    >>> conn = mrsm.get_connector('foo:bar', username='dog', password='cat')
    >>> print(conn.username, conn.password)
    dog cat
    """
    import re
    from meerschaum.plugins import _get_parent_plugin
    suffix_regex = (
        r'connector$'
        if not _is_executor
        else r'executor$'
    )
    plugin_name = _get_parent_plugin(2)
    typ = re.sub(suffix_regex, '', cls.__name__.lower())
    with _locks['types']:
        types[typ] = cls
    with _locks['custom_types']:
        custom_types.add(typ)
    if plugin_name:
        with _locks['plugins_types']:
            if plugin_name not in plugins_types:
                plugins_types[plugin_name] = []
            plugins_types[plugin_name].append(typ)
    with _locks['connectors']:
        if typ not in connectors:
            connectors[typ] = {}
    if getattr(cls, 'IS_INSTANCE', False):
        with _locks['instance_types']:
            if typ not in instance_types:
                instance_types.append(typ)

    return cls

Register a class as a Connector. The connector type is the lowercased class name with the trailing connector suffix removed (e.g. FooConnector is registered as foo).

Set the class attribute IS_INSTANCE = True to register the type as an instance connector. This requires implementing the various pipe methods and thorough testing.

Parameters
  • cls (type): The connector class to register.
Examples
>>> import meerschaum as mrsm
>>> from meerschaum.connectors import make_connector, Connector
>>>
>>> @make_connector
... class FooConnector(Connector):
...     REQUIRED_ATTRIBUTES: list[str] = ['username', 'password']
...
>>> conn = mrsm.get_connector('foo:bar', username='dog', password='cat')
>>> print(conn.username, conn.password)
dog cat
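The core of the registration logic in `make_connector()` can be modeled in isolation. The type name is derived with a suffix regex, and a class opts into instance status through `IS_INSTANCE`. This stripped-down sketch omits the locks, plugin tracking, and `connectors` cache shown in the source. The standalone `registered_types` and `instance_types` registries are hypothetical stand-ins.

```python
import re

# Hypothetical stand-ins for the module-level registries in the source above.
registered_types: dict[str, type] = {}
instance_types: list[str] = []


def make_connector(cls: type, _is_executor: bool = False) -> type:
    """Register cls under its lowercased name minus the connector/executor suffix."""
    suffix_regex = r'executor$' if _is_executor else r'connector$'
    typ = re.sub(suffix_regex, '', cls.__name__.lower())
    registered_types[typ] = cls
    # Classes opt in as instance connectors through a class attribute.
    if getattr(cls, 'IS_INSTANCE', False) and typ not in instance_types:
        instance_types.append(typ)
    return cls


@make_connector
class FooConnector:
    pass


@make_connector
class BarConnector:
    IS_INSTANCE = True
```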
def entry(sysargs: Union[List[str], str, NoneType] = None, _patch_args: Optional[Dict[str, Any]] = None, _use_cli_daemon: bool = True, _session_id: Optional[str] = None) -> Tuple[bool, str]:
def entry(
    sysargs: Union[List[str], str, None] = None,
    _patch_args: Optional[Dict[str, Any]] = None,
    _use_cli_daemon: bool = True,
    _session_id: Optional[str] = None,
) -> SuccessTuple:
    """
    Parse arguments and launch a Meerschaum action.

    Returns
    -------
    A `SuccessTuple` indicating success.
    """
    start = time.perf_counter()
    from meerschaum.config.environment import get_daemon_env_vars
    # Normalize to a list so the `in` checks below never run against `None`.
    sysargs_list = shlex.split(sysargs) if isinstance(sysargs, str) else (sysargs or [])
    if (
        not _use_cli_daemon
        or (not sysargs_list or (sysargs_list[0] and sysargs_list[0].startswith('-')))
        or '--no-daemon' in sysargs_list
        or '--daemon' in sysargs_list
        or '-d' in sysargs_list
        or get_daemon_env_vars()
        or not mrsm.get_config('system', 'experimental', 'cli_daemon')
    ):
        success, msg = entry_without_daemon(sysargs, _patch_args=_patch_args)
        end = time.perf_counter()
        if '--debug' in sysargs_list:
            print(f"Duration without daemon: {round(end - start, 3)}")
        return success, msg

    from meerschaum._internal.cli.entry import entry_with_daemon
    success, msg = entry_with_daemon(sysargs, _patch_args=_patch_args)
    end = time.perf_counter()
    if '--debug' in sysargs_list:
        print(f"Duration with daemon: {round(end - start, 3)}")
    return success, msg
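The branching in `entry()` decides whether an action runs in-process or through the experimental CLI daemon. That decision can be written as a pure function. `should_use_daemon` is hypothetical. Its `in_daemon_env` flag stands in for `get_daemon_env_vars()`, and `cli_daemon_enabled` stands in for the `('system', 'experimental', 'cli_daemon')` config value.

```python
import shlex
from typing import Union


def should_use_daemon(
    sysargs: Union[list[str], str, None],
    use_cli_daemon: bool = True,
    in_daemon_env: bool = False,
    cli_daemon_enabled: bool = True,
) -> bool:
    """Mirror the branching in entry(): True means route through the CLI daemon."""
    args = shlex.split(sysargs) if isinstance(sysargs, str) else list(sysargs or [])
    if not use_cli_daemon or in_daemon_env or not cli_daemon_enabled:
        return False
    # No action, or a leading flag (e.g. `--version`), runs in-process.
    if not args or args[0].startswith('-'):
        return False
    # Explicit daemon-related flags also bypass the CLI daemon.
    return not any(flag in args for flag in ('--no-daemon', '--daemon', '-d'))
```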

Parse arguments and launch a Meerschaum action.

Returns