pyglimer.database#

pyglimer.database.stations#

Database management and overview for the PyGLImER database.

copyright:

The PyGLImER development team (makus@gfz-potsdam.de).

license:

GNU Lesser General Public License, Version 3 (https://www.gnu.org/copyleft/lesser.html)

author:

Peter Makus (makus@gfz-potsdam.de)

Created: Friday, 12th February 2020 03:24:30 pm Last Modified: Tuesday, 25th October 2022 03:50:51 pm

pyglimer.database.stations.redownload_missing_statxmls(clients, phase, statloc, rawdir, verbose=True)[source]#

Fairly primitive function that walks through the existing raw waveforms in rawdir, looks for missing station xmls in statloc, and afterwards attempts a redownload.

Parameters:
  • clients (list) – List of clients (see the obspy documentation for Client).

  • phase (str) – Either “P” or “S”, defines in which folder to look for mseeds.

  • statloc (str) – Folder in which the station xmls are saved

  • rawdir (str) – Uppermost directory in which the raw mseeds are saved (i.e., the directory above the phase division).

  • verbose (bool, optional) – Show some extra information, by default True
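Under the hypothetical assumption that raw files are stored as rawdir/phase/<event>/NET.STA.*.mseed and station xmls as statloc/NET.STA.xml, the lookup this function performs can be sketched with the standard library (find_missing_statxml and the file layout are illustrative, not PyGLImER's actual API):

```python
import pathlib

def find_missing_statxml(rawdir: str, phase: str, statloc: str) -> set:
    """Return NET.STA codes that have raw waveforms but no station xml.

    Hypothetical sketch: assumes rawdir/phase/<event>/NET.STA.*.mseed
    raw files and statloc/NET.STA.xml station xmls.
    """
    # Station codes for which a station xml already exists
    have = {p.stem for p in pathlib.Path(statloc).glob('*.xml')}
    # Station codes that appear in the raw waveform tree
    want = {'.'.join(p.name.split('.')[:2])
            for p in (pathlib.Path(rawdir) / phase).glob('*/*')}
    return want - have
```

The set difference is exactly what a redownload would then iterate over.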

pyglimer.database.raw#

HDF5 based data format to save raw waveforms and station XMLs. Works similar to the data format saving receiver functions.

copyright:

The PyGLImER development team (makus@gfz-potsdam.de).

license:

GNU Lesser General Public License, Version 3 (https://www.gnu.org/copyleft/lesser.html)

author:

Peter Makus (makus@gfz-potsdam.de)

Created: Tuesday, 6th September 2022 10:37:12 am Last Modified: Thursday, 27th October 2022 03:07:39 pm

class pyglimer.database.raw.DBHandler(path, mode, compression)[source]#

Bases: File

The actual file handler of the hdf5 receiver function files.

Warning

Should not be accessed directly. Access RawDatabase instead.

Child object of h5py.File that inherits all its attributes and functions, in addition to functions that are particularly useful for handling the raw waveform data.

Create a new file object.

See the h5py user guide for a detailed explanation of the options.

name

Name of the file on disk, or file-like object. Note: for files created with the ‘core’ driver, HDF5 still requires this be non-empty.

mode

r: Readonly, file must exist (default)
r+: Read/write, file must exist
w: Create file, truncate if exists
w- or x: Create file, fail if exists
a: Read/write if exists, create otherwise

driver

Name of the driver to use. Legal values are None (default, recommended), ‘core’, ‘sec2’, ‘direct’, ‘stdio’, ‘mpio’, ‘ros3’.

libver

Library version bounds. Supported values: ‘earliest’, ‘v108’, ‘v110’, ‘v112’ and ‘latest’. The ‘v108’, ‘v110’ and ‘v112’ options can only be specified with the HDF5 1.10.2 library or later.

userblock_size

Desired size of user block. Only allowed when creating a new file (mode w, w- or x).

swmr

Open the file in SWMR read mode. Only used when mode = ‘r’.

rdcc_nbytes

Total size of the raw data chunk cache in bytes. The default size is 1024**2 (1 MB) per dataset.

rdcc_w0

The chunk preemption policy for all datasets. This must be between 0 and 1 inclusive and indicates the weighting according to which chunks which have been fully read or written are penalized when determining which chunks to flush from cache. A value of 0 means fully read or written chunks are treated no differently than other chunks (the preemption is strictly LRU) while a value of 1 means fully read or written chunks are always preempted before other chunks. If your application only reads or writes data once, this can be safely set to 1. Otherwise, this should be set lower depending on how often you re-read or re-write the same data. The default value is 0.75.

rdcc_nslots

The number of chunk slots in the raw data chunk cache for this file. Increasing this value reduces the number of cache collisions, but slightly increases the memory used. Due to the hashing strategy, this value should ideally be a prime number. As a rule of thumb, this value should be at least 10 times the number of chunks that can fit in rdcc_nbytes bytes. For maximum performance, this value should be set approximately 100 times that number of chunks. The default value is 521.

track_order

Track dataset/group/attribute creation order under root group if True. If None use global default h5.get_config().track_order.

fs_strategy

The file space handling strategy to be used. Only allowed when creating a new file (mode w, w- or x). Defined as:
“fsm”: FSM, Aggregators, VFD
“page”: Paged FSM, VFD
“aggregate”: Aggregators, VFD
“none”: VFD
If None, use HDF5 defaults.

fs_page_size

File space page size in bytes. Only used when fs_strategy=”page”. If None use the HDF5 default (4096 bytes).

fs_persist

A boolean value to indicate whether free space should be persistent or not. Only allowed when creating a new file. The default value is False.

fs_threshold

The smallest free-space section size that the free space manager will track. Only allowed when creating a new file. The default value is 1.

page_buf_size

Page buffer size in bytes. Only allowed for HDF5 files created with fs_strategy=”page”. Must be a power of two value and greater or equal than the file space page size when creating the file. It is not used by default.

min_meta_keep

Minimum percentage of metadata to keep in the page buffer before allowing pages containing metadata to be evicted. Applicable only if page_buf_size is set. Default value is zero.

min_raw_keep

Minimum percentage of raw data to keep in the page buffer before allowing pages containing raw data to be evicted. Applicable only if page_buf_size is set. Default value is zero.

locking

The file locking behavior. Defined as:
False (or “false”): Disable file locking
True (or “true”): Enable file locking
“best-effort”: Enable file locking but ignore some errors
None: Use HDF5 defaults
Warning: The HDF5_USE_FILE_LOCKING environment variable can override this parameter. Only available with HDF5 >= 1.12.1 or 1.10.x >= 1.10.7.

alignment_threshold

Together with alignment_interval, this property ensures that any file object greater than or equal in size to the alignment threshold (in bytes) will be aligned on an address which is a multiple of the alignment interval.

alignment_interval

This property should be used in conjunction with alignment_threshold. See the description above. For more details, see https://portal.hdfgroup.org/display/HDF5/H5P_SET_ALIGNMENT

Additional keywords

Passed on to the selected file driver.

add_response(inv: Inventory, tag: str = 'response')[source]#

Add the response information (i.e., stationxml).

Parameters:
  • inv (Inventory) – inventory containing information of only one station

  • tag (str, optional) – tag to save under, defaults to ‘response’

Note

The Inventory inv should only contain information of one station.

add_waveform(data: Union[Trace, Stream], evt_id: Union[UTCDateTime, str], tag: str = 'raw')[source]#

Add raw waveform data to the hdf5 file. The data can later be accessed using the get_data() method.

Parameters:
  • data (Trace or Stream) – Data to save.

  • evt_id (UTCDateTime or str) – Event identifier. Defined by the origin time of the event rounded using utc_save_str()

  • tag (str, optional) – The tag that the data should be saved under. Defaults to ‘raw’

Raises:

TypeError – for wrong data type.
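The event identifiers used as keys follow a year + julian-day + time pattern (e.g. 2009261T120020.8 in the tree below). A stdlib sketch of how such a key could be derived from an origin time rounded to 0.1 s (event_key is a hypothetical stand-in for utc_save_str(), whose exact implementation is not shown here):

```python
from datetime import datetime, timedelta

def event_key(origin: datetime) -> str:
    # Round to the nearest 0.1 s by adding half a tenth and truncating,
    # then format as year + julian day + 'T' + HHMMSS plus the tenths,
    # matching keys such as '2009261T120020.8'.
    rounded = origin + timedelta(microseconds=50000)
    tenths = rounded.microsecond // 100000
    return rounded.strftime('%Y%jT%H%M%S') + f'.{tenths}'
```

Adding 50 ms before truncating makes the carry into the next second (or day) fall out of strftime for free.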

content_dict()[source]#

Returns a dictionary with the following structure. Not actively used.

dict
├── raw
│   └── YP
│       └── NE11
│           ├── 2009261T120020.8
│           │   ├── BHE (1500)
│           │   ├── BHN (1500)
│           │   └── BHZ (1500)
:           :
│           └── 2009261T184125.8
│               ├── BHE (1500)
│               ├── BHN (1500)
│               └── BHZ (1500)
└── response
    └── YP
        └── NE11 (27156)
get_data(network: str, station: str, evt_id: UTCDateTime, tag: str = 'raw') Stream[source]#

Returns an obspy Stream holding the requested data and all existing channels.

Note

Wildcards are allowed for all parameters.

Parameters:
  • network (str) – network code, e.g., IU

  • station (str) – station code, e.g., HRV

  • evt_id (UTCDateTime or str) – Origin Time of the corresponding event

  • tag (str, optional) – Data tag (e.g., ‘raw’). Defaults to raw.

Returns:

a Stream holding the requested data.

Return type:

Stream

get_response(network: str, station: str, tag: str = 'response') Inventory[source]#

Get the Response information for the queried station.

Parameters:
  • network (str) – Network Code

  • station (str) – Station Code

  • tag (str, optional) – Tag under which the response is saved, defaults to ‘response’

Returns:

Obspy Inventory

Return type:

Inventory

print_tree()[source]#

print a tree with the following structure

dict
├── raw
│   └── YP
│       └── NE11
│           ├── 2009261T120020.8
│           │   ├── BHE (1500)
│           │   ├── BHN (1500)
│           │   └── BHZ (1500)
:           :
│           └── 2009261T184125.8
│               ├── BHE (1500)
│               ├── BHN (1500)
│               └── BHZ (1500)
└── response
    └── YP
        └── NE11 (27156)
walk(tag: str, network: str, station: str) Iterable[Stream][source]#

Iterate over all Streams with the given properties (i.e., all events).

Parameters:
  • tag (str) – data tag

  • network (str) – Network code

  • station (str) – Station code

Returns:

Iterator

Return type:

Iterable[Stream]

Yield:

one Stream per event.

Note

Does not accept wildcards.
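Conceptually, walk() iterates the tag/network/station branch of the hierarchy shown by print_tree() and yields one item per event. A stdlib sketch over a plain nested mapping (the real method operates on h5py groups and reconstructs obspy Stream objects; this toy version is only meant to show the iteration order):

```python
def walk_sketch(db: dict, tag: str, network: str, station: str):
    # db[tag][network][station] maps event ids to per-channel arrays,
    # mirroring the print_tree() layout. Keys must be exact (no
    # wildcards), matching the note above. One item per event.
    for evt_id, channels in db[tag][network][station].items():
        yield evt_id, channels
```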

class pyglimer.database.raw.RawDatabase(path: str, mode: str = 'a', compression: str = 'gzip3')[source]#

Bases: object

Base class to handle the hdf5 files that contain raw data for receiver function computation.

Access an hdf5 file holding raw waveforms. The resulting file can be accessed using all functionalities of h5py (for example as a dict).

Parameters:
  • path (str) – Full path to the file

  • mode (str, optional) – Mode to access the file. Options are: ‘a’ for all, ‘w’ for write, ‘r+’ for writing in an already existing file, or ‘r’ for read-only, defaults to ‘a’.

  • compression (str, optional) – The compression algorithm and compression level that the arrays should be saved with. ‘gzip3’ tends to perform well; else you could choose ‘gzipx’, where x is a digit between 1 and 9 (i.e., 9 is the highest compression), or None for fastest performance, defaults to ‘gzip3’.
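HDF5's gzip filter uses the same DEFLATE algorithm as Python's zlib, so the size/speed tradeoff behind ‘gzip1’…‘gzip9’ can be illustrated with the standard library (sizes are payload-dependent; this is an analogy, not the HDF5 code path):

```python
import zlib

# A repetitive payload standing in for waveform-like data.
payload = bytes(range(256)) * 2000

fast = zlib.compress(payload, 1)   # like 'gzip1': fastest, larger output
small = zlib.compress(payload, 9)  # like 'gzip9': slowest, smallest output
```

A mid-level setting such as ‘gzip3’ tends to perform well, which is why it is the default here.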

Warning

Access only through a context manager (see below):

>>> with RawDatabase('myfile.h5') as rdb:
>>>     type(rdb)  # This is a DBHandler
<class 'pyglimer.database.raw.DBHandler'>

Example:

>>> with RawDatabase(
            '/path/to/db/XN.NEP06.h5') as rfdb:
>>>     # find the available tags for the existing db
>>>     print(list(rfdb.keys()))
['raw', 'weird']
>>>     # Get data from all times
>>>     st = rfdb.get_data(
>>>         'XN', 'NEP06', '*')
>>> print(st.count())
250
pyglimer.database.raw.all_traces_recursive(group: Group, stream: Stream, pattern: str) Stream[source]#

Recursively appends all traces in an h5py group to the input stream. In addition, this will check whether the data matches a certain pattern.

Parameters:
  • group (h5py.Group) – group to search through

  • stream (Stream) – Stream to append the traces to

  • pattern (str) – pattern for the path in the hdf5 file, see fnmatch for details.

Returns:

Stream with appended traces

Return type:

Stream
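The pattern argument is matched against the slash-separated hdf5 paths with fnmatch semantics, so the tag, network, station, and event segments can be wildcarded. A small illustration (the example paths follow the tag/network/station/event layout shown by print_tree(); fnmatch is the stdlib module the docstring refers to):

```python
from fnmatch import fnmatch

# Paths follow the tag/network/station/event layout from print_tree().
paths = [
    'raw/YP/NE11/2009261T120020.8',
    'raw/YP/NE11/2009261T184125.8',
    'response/YP/NE11',
]

# All raw events for one station:
hits = [p for p in paths if fnmatch(p, 'raw/YP/NE11/*')]
```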

pyglimer.database.raw.convert_header_to_hdf5(dataset: Dataset, header: Stats)[source]#

Converts a Stats object and adds it to the provided hdf5 dataset.

Parameters:
  • dataset (h5py.Dataset) – the dataset that the header should be added to

  • header (Stats) – The trace’s header

pyglimer.database.raw.h5_dict(val)[source]#

Recursive function to read the keys of a dataset.

pyglimer.database.raw.h5_tree(val, pre='')[source]#

Recursive function to print the structure of an ASDF dataset.
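The recursion can be sketched over a plain nested dict (h5_tree_sketch is illustrative; the real function walks h5py groups and also prints dataset lengths):

```python
def h5_tree_sketch(val, pre=''):
    # Recurse over a nested mapping, producing the branch characters
    # seen in the print_tree() output above.
    lines = []
    items = list(val.items())
    for i, (key, child) in enumerate(items):
        last = i == len(items) - 1
        lines.append(pre + ('└── ' if last else '├── ') + str(key))
        if isinstance(child, dict):
            lines.extend(
                h5_tree_sketch(child, pre + ('    ' if last else '│   ')))
    return lines
```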

pyglimer.database.raw.mseed_to_hdf5(rawfolder: str, save_statxml: bool, statloc: Optional[str] = None)[source]#

Convert a given mseed database in rawfolder to an h5 based database.

Parameters:
  • rawfolder (os.PathLike) – directory in which the folders for each event are stored

  • save_statxml (bool) – should stationxmls be read as well? If True, statloc has to be provided as well.

  • statloc (str, optional) – location of the station xmls. Only needed if save_statxml is set to True, defaults to None

pyglimer.database.raw.read_hdf5_header(dataset: Dataset) Stats[source]#

Takes an hdf5 dataset as input and returns the header of the Trace.

Parameters:

dataset (h5py.Dataset) – The dataset to be read from

Returns:

The trace’s header

Return type:

Stats

pyglimer.database.raw.save_raw_DB_single_station(network: str, station: str, saved: dict, st: Stream, rawloc: str, inv: Inventory)[source]#

A variation of the above function that opens the file only once, writes all traces, and then closes it afterwards. Saves the raw waveform data in the desired format. The point of this function is mainly that the waveforms will be saved with the correct associations and at the correct locations.

We specifically write event-by-event streams, so that we don’t lose parts that are already downloaded!

Parameters:
  • saved (dict) – Dictionary holding information about the original streams to identify them afterwards.

  • st (Stream) – obspy stream holding all data (from various stations)

  • rawloc (str) – Parental directory (with phase) to save the files in.

  • inv (Inventory) – The inventory holding all the station information

pyglimer.database.raw.statxml_to_hdf5(rawfolder: str, statloc: str)[source]#

Write a statxml database to h5 files

Parameters:
  • rawfolder (str or os.PathLike) – location that contains the h5 files

  • statloc (str or os.PathLike) – location where the stationxmls are stored

pyglimer.database.raw.write_st(st: Stream, event: Event, outfolder: str, statxml: Inventory, resample: bool = True)[source]#

Write raw waveform data to an asdf file. This includes the corresponding (teleseismic) event and the station inventory (i.e., response information).

Parameters:
  • st (Stream) – The stream holding the raw waveform data.

  • event (Event) – The seismic event associated to the recorded data.

  • outfolder (str) – Output folder to write the asdf file to.

  • statxml (Inventory) – The station inventory

  • resample (bool, optional) – Resample the data to 10Hz sampling rate? Defaults to True.

pyglimer.database.raw.write_st_to_ds(ds: DBHandler, st: Stream, evt_id: str, resample: bool = True)[source]#

Write raw waveform data to an already opened hdf5 file handler.

Parameters:
  • ds (DBHandler) – the open database handler to write to.

  • st (Stream) – The stream holding the raw waveform data.

  • evt_id (str) – Event identifier, derived from the event origin time.

  • resample (bool, optional) – Resample the data to 10Hz sampling rate? Defaults to True.

pyglimer.database.rfh5#

copyright:

The PyGLImER development team (makus@gfz-potsdam.de).

license:

GNU Lesser General Public License, Version 3 (https://www.gnu.org/copyleft/lesser.html)

author:

Peter Makus (makus@gfz-potsdam.de)

Created: Wednesday, 11th August 2021 03:20:09 pm Last Modified: Friday, 16th September 2022 02:47:40 pm

class pyglimer.database.rfh5.DBHandler(path, mode, compression)[source]#

Bases: File

The actual file handler of the hdf5 receiver function files.

Warning

Should not be accessed directly. Access RFDataBase instead.

Child object of h5py.File that inherits all its attributes and functions, in addition to functions that are particularly useful for receiver functions.

Create a new file object.

See the h5py user guide for a detailed explanation of the options.

name

Name of the file on disk, or file-like object. Note: for files created with the ‘core’ driver, HDF5 still requires this be non-empty.

mode

r: Readonly, file must exist (default)
r+: Read/write, file must exist
w: Create file, truncate if exists
w- or x: Create file, fail if exists
a: Read/write if exists, create otherwise

driver

Name of the driver to use. Legal values are None (default, recommended), ‘core’, ‘sec2’, ‘direct’, ‘stdio’, ‘mpio’, ‘ros3’.

libver

Library version bounds. Supported values: ‘earliest’, ‘v108’, ‘v110’, ‘v112’ and ‘latest’. The ‘v108’, ‘v110’ and ‘v112’ options can only be specified with the HDF5 1.10.2 library or later.

userblock_size

Desired size of user block. Only allowed when creating a new file (mode w, w- or x).

swmr

Open the file in SWMR read mode. Only used when mode = ‘r’.

rdcc_nbytes

Total size of the raw data chunk cache in bytes. The default size is 1024**2 (1 MB) per dataset.

rdcc_w0

The chunk preemption policy for all datasets. This must be between 0 and 1 inclusive and indicates the weighting according to which chunks which have been fully read or written are penalized when determining which chunks to flush from cache. A value of 0 means fully read or written chunks are treated no differently than other chunks (the preemption is strictly LRU) while a value of 1 means fully read or written chunks are always preempted before other chunks. If your application only reads or writes data once, this can be safely set to 1. Otherwise, this should be set lower depending on how often you re-read or re-write the same data. The default value is 0.75.

rdcc_nslots

The number of chunk slots in the raw data chunk cache for this file. Increasing this value reduces the number of cache collisions, but slightly increases the memory used. Due to the hashing strategy, this value should ideally be a prime number. As a rule of thumb, this value should be at least 10 times the number of chunks that can fit in rdcc_nbytes bytes. For maximum performance, this value should be set approximately 100 times that number of chunks. The default value is 521.

track_order

Track dataset/group/attribute creation order under root group if True. If None use global default h5.get_config().track_order.

fs_strategy

The file space handling strategy to be used. Only allowed when creating a new file (mode w, w- or x). Defined as:
“fsm”: FSM, Aggregators, VFD
“page”: Paged FSM, VFD
“aggregate”: Aggregators, VFD
“none”: VFD
If None, use HDF5 defaults.

fs_page_size

File space page size in bytes. Only used when fs_strategy=”page”. If None use the HDF5 default (4096 bytes).

fs_persist

A boolean value to indicate whether free space should be persistent or not. Only allowed when creating a new file. The default value is False.

fs_threshold

The smallest free-space section size that the free space manager will track. Only allowed when creating a new file. The default value is 1.

page_buf_size

Page buffer size in bytes. Only allowed for HDF5 files created with fs_strategy=”page”. Must be a power of two value and greater or equal than the file space page size when creating the file. It is not used by default.

min_meta_keep

Minimum percentage of metadata to keep in the page buffer before allowing pages containing metadata to be evicted. Applicable only if page_buf_size is set. Default value is zero.

min_raw_keep

Minimum percentage of raw data to keep in the page buffer before allowing pages containing raw data to be evicted. Applicable only if page_buf_size is set. Default value is zero.

locking

The file locking behavior. Defined as:
False (or “false”): Disable file locking
True (or “true”): Enable file locking
“best-effort”: Enable file locking but ignore some errors
None: Use HDF5 defaults
Warning: The HDF5_USE_FILE_LOCKING environment variable can override this parameter. Only available with HDF5 >= 1.12.1 or 1.10.x >= 1.10.7.

alignment_threshold

Together with alignment_interval, this property ensures that any file object greater than or equal in size to the alignment threshold (in bytes) will be aligned on an address which is a multiple of the alignment interval.

alignment_interval

This property should be used in conjunction with alignment_threshold. See the description above. For more details, see https://portal.hdfgroup.org/display/HDF5/H5P_SET_ALIGNMENT

Additional keywords

Passed on to the selected file driver.

add_rf(data: RFTrace, tag: str = 'rf')[source]#

Add receiver function to the hdf5 file. The data can later be accessed using the get_data() method.

Parameters:
  • data (RFTrace or RFStream) – Data to save. Either a RFTrace object or a RFStream holding one or several traces.

  • tag (str, optional) – The tag that the data should be saved under. By convention, receiver functions are saved with the tag ‘rf’.

Raises:

TypeError – for wrong data type.

get_coords(network: str, station: str, phase: Optional[str] = None, tag: str = 'rf') Tuple[float, float, float][source]#

Return the coordinates of the station.

Parameters:
  • network (str) – Network Code.

  • station (str) – Station Code

  • phase (str, optional) – Teleseismic Phase, defaults to None

Returns:

Latitude (dec deg), Longitude (dec deg), Elevation (m)

Return type:

Tuple[float, float, float]

get_data(network: str, station: str, phase: str, evt_time: UTCDateTime, tag: str = 'rf', pol: str = 'v') RFStream[source]#

Returns a RFStream holding all the requested data.

Note

Wildcards are allowed for all parameters.

Parameters:
  • network (str) – network code, e.g., IU

  • station (str) – station code, e.g., HRV

  • phase (str) – Teleseismic phase

  • evt_time (UTCDateTime, optional) – Origin Time of the Event

  • tag (str, optional) – Data tag (e.g., ‘rf’). Defaults to rf.

  • pol (str, optional) – RF Polarisation. Defaults to v.

Returns:

a RFStream holding the requested data.

Return type:

RFStream

walk(tag: str, network: str, station: str, phase: str, pol: str = 'v') Iterable[RFTrace][source]#

Iterate over all receiver functions with the given properties.

Parameters:
  • tag (str) – data tag

  • network (str) – Network code

  • station (str) – Station code

  • phase (str) – Teleseismic phase

  • pol (str, optional) – RF-Polarisation, defaults to ‘v’

Returns:

Iterator

Return type:

Iterable[RFTrace]

Yield:

one RFTrace per receiver function.

Note

Does not accept wildcards.

class pyglimer.database.rfh5.RFDataBase(path: str, mode: str = 'a', compression: str = 'gzip3')[source]#

Bases: object

Base class to handle the hdf5 files that contain receiver functions.

Access an hdf5 file holding receiver functions. The resulting file can be accessed using all functionalities of h5py (for example as a dict).

Parameters:
  • path (str) – Full path to the file

  • mode (str, optional) – Mode to access the file. Options are: ‘a’ for all, ‘w’ for write, ‘r+’ for writing in an already existing file, or ‘r’ for read-only, defaults to ‘a’.

  • compression (str, optional) – The compression algorithm and compression level that the arrays should be saved with. ‘gzip3’ tends to perform well; else you could choose ‘gzipx’, where x is a digit between 1 and 9 (i.e., 9 is the highest compression), or None for fastest performance, defaults to ‘gzip3’.

Warning

Access only through a context manager (see below):

>>> with RFDataBase('myfile.h5') as rfdb:
>>>     type(rfdb)  # This is a DBHandler
<class 'pyglimer.database.rfh5.DBHandler'>

Example:

>>> with RFDataBase(
            '/path/to/db/XN.NEP06.h5') as rfdb:
>>>     # find the available tags for existing db
>>>     print(list(rfdb.keys()))
['rf', 'rfstack']
>>>     # Get Data from all times and tag rf, phase P
>>>     st = rfdb.get_data(
>>>         'XN', 'NEP06', 'P', '*', 'rf')
>>> print(st.count())
250
pyglimer.database.rfh5.all_traces_recursive(group: Group, stream: RFStream, pattern: str) RFStream[source]#

Recursively appends all traces in an h5py group to the input stream. In addition, this will check whether the data matches a certain pattern.

Parameters:
  • group (h5py.Group) – group to search through

  • stream (Stream) – Stream to append the traces to

  • pattern (str) – pattern for the path in the hdf5 file, see fnmatch for details.

Returns:

Stream with appended traces

Return type:

Stream

pyglimer.database.rfh5.convert_header_to_hdf5(dataset: Dataset, header: Stats)[source]#

Converts a Stats object and adds it to the provided hdf5 dataset.

Parameters:
  • dataset (h5py.Dataset) – the dataset that the header should be added to

  • header (Stats) – The trace’s header

pyglimer.database.rfh5.read_hdf5_header(dataset: Dataset) Stats[source]#

Takes an hdf5 dataset as input and returns the header of the Trace.

Parameters:

dataset (h5py.Dataset) – The dataset to be read from

Returns:

The trace’s header

Return type:

Stats