pyglimer.database#
pyglimer.database.stations#
Database management and overview for the PyGLImER database.
- copyright:
The PyGLImER development team (makus@gfz-potsdam.de).
- license:
GNU Lesser General Public License, Version 3 (https://www.gnu.org/copyleft/lesser.html)
- author:
Peter Makus (makus@gfz-potsdam.de)
Created: Friday, 12th February 2020 03:24:30 pm Last Modified: Tuesday, 25th October 2022 03:50:51 pm
- pyglimer.database.stations.redownload_missing_statxmls(clients, phase, statloc, rawdir, verbose=True)[source]#
Fairly primitive program that walks through the existing raw waveforms in rawdir, looks for missing station XMLs in statloc, and afterwards attempts a redownload.
- Parameters:
clients (list) – List of clients (see obspy documentation for ~obspy.Client).
phase (str) – Either “P” or “S”, defines in which folder to look for mseeds.
statloc (str) – Folder in which the station XMLs are saved.
rawdir (str) – Uppermost directory, in which the raw mseeds are saved (i.e. the directory above the phase division).
verbose (bool, optional) – Show some extra information, by default True
pyglimer.database.raw#
HDF5 based data format to save raw waveforms and station XMLs. Works similarly to the data format saving receiver functions.
- copyright:
The PyGLImER development team (makus@gfz-potsdam.de).
- license:
GNU Lesser General Public License, Version 3 (https://www.gnu.org/copyleft/lesser.html)
- author:
Peter Makus (makus@gfz-potsdam.de)
Created: Tuesday, 6th September 2022 10:37:12 am Last Modified: Thursday, 27th October 2022 03:07:39 pm
- class pyglimer.database.raw.DBHandler(path, mode, compression)[source]#
Bases:
File
The actual file handler of the hdf5 receiver function files.
Warning
Should not be accessed directly. Access RawDatabase instead.
Child object of
h5py.File
and inherits all its attributes and functions, in addition to functions that are particularly useful for raw waveform data.
Create a new file object.
See the h5py user guide for a detailed explanation of the options.
- name
Name of the file on disk, or file-like object. Note: for files created with the ‘core’ driver, HDF5 still requires this be non-empty.
- mode
r: Readonly, file must exist (default)
r+: Read/write, file must exist
w: Create file, truncate if exists
w- or x: Create file, fail if exists
a: Read/write if exists, create otherwise
- driver
Name of the driver to use. Legal values are None (default, recommended), ‘core’, ‘sec2’, ‘direct’, ‘stdio’, ‘mpio’, ‘ros3’.
- libver
Library version bounds. Supported values: ‘earliest’, ‘v108’, ‘v110’, ‘v112’ and ‘latest’. The ‘v108’, ‘v110’ and ‘v112’ options can only be specified with the HDF5 1.10.2 library or later.
- userblock_size
Desired size of user block. Only allowed when creating a new file (mode w, w- or x).
- swmr
Open the file in SWMR read mode. Only used when mode = ‘r’.
- rdcc_nbytes
Total size of the raw data chunk cache in bytes. The default size is 1024**2 (1 MB) per dataset.
- rdcc_w0
The chunk preemption policy for all datasets. This must be between 0 and 1 inclusive and indicates the weighting according to which chunks which have been fully read or written are penalized when determining which chunks to flush from cache. A value of 0 means fully read or written chunks are treated no differently than other chunks (the preemption is strictly LRU) while a value of 1 means fully read or written chunks are always preempted before other chunks. If your application only reads or writes data once, this can be safely set to 1. Otherwise, this should be set lower depending on how often you re-read or re-write the same data. The default value is 0.75.
- rdcc_nslots
The number of chunk slots in the raw data chunk cache for this file. Increasing this value reduces the number of cache collisions, but slightly increases the memory used. Due to the hashing strategy, this value should ideally be a prime number. As a rule of thumb, this value should be at least 10 times the number of chunks that can fit in rdcc_nbytes bytes. For maximum performance, this value should be set approximately 100 times that number of chunks. The default value is 521.
- track_order
Track dataset/group/attribute creation order under root group if True. If None use global default h5.get_config().track_order.
- fs_strategy
The file space handling strategy to be used. Only allowed when creating a new file (mode w, w- or x). Defined as:
“fsm”: FSM, Aggregators, VFD
“page”: Paged FSM, VFD
“aggregate”: Aggregators, VFD
“none”: VFD
If None use HDF5 defaults.
- fs_page_size
File space page size in bytes. Only used when fs_strategy=”page”. If None use the HDF5 default (4096 bytes).
- fs_persist
A boolean value to indicate whether free space should be persistent or not. Only allowed when creating a new file. The default value is False.
- fs_threshold
The smallest free-space section size that the free space manager will track. Only allowed when creating a new file. The default value is 1.
- page_buf_size
Page buffer size in bytes. Only allowed for HDF5 files created with fs_strategy=”page”. Must be a power of two value and greater or equal than the file space page size when creating the file. It is not used by default.
- min_meta_keep
Minimum percentage of metadata to keep in the page buffer before allowing pages containing metadata to be evicted. Applicable only if page_buf_size is set. Default value is zero.
- min_raw_keep
Minimum percentage of raw data to keep in the page buffer before allowing pages containing raw data to be evicted. Applicable only if page_buf_size is set. Default value is zero.
- locking
The file locking behavior. Defined as:
False (or “false”): Disable file locking
True (or “true”): Enable file locking
“best-effort”: Enable file locking but ignore some errors
None: Use HDF5 defaults
Warning: The HDF5_USE_FILE_LOCKING environment variable can override this parameter. Only available with HDF5 >= 1.12.1 or 1.10.x >= 1.10.7.
- alignment_threshold
Together with alignment_interval, this property ensures that any file object greater than or equal in size to the alignment threshold (in bytes) will be aligned on an address which is a multiple of the alignment interval.
- alignment_interval
This property should be used in conjunction with alignment_threshold. See the description above. For more details, see https://portal.hdfgroup.org/display/HDF5/H5P_SET_ALIGNMENT
- Additional keywords
Passed on to the selected file driver.
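The prime-number rule of thumb for rdcc_nslots above can be sketched in a few lines. The helper below is illustrative, not part of h5py; note that the default value 521 is itself prime:

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test, sufficient for small cache sizes."""
    if n < 2:
        return False
    f = 2
    while f * f <= n:
        if n % f == 0:
            return False
        f += 1
    return True

def suggest_rdcc_nslots(n_chunks_in_cache: int, factor: int = 100) -> int:
    """Smallest prime >= factor * (number of chunks fitting in rdcc_nbytes).

    Follows the documented rule of thumb: at least 10x, ideally ~100x the
    chunk count, and preferably prime to reduce hash collisions.
    """
    n = factor * n_chunks_in_cache
    while not is_prime(n):
        n += 1
    return n

# e.g. if 10 chunks fit in the chunk cache, pick a prime near 1000
print(suggest_rdcc_nslots(10))  # 1009
```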
- add_response(inv: Inventory, tag: str = 'response')[source]#
Add the response information (i.e., stationxml).
- Parameters:
inv (Inventory) – inventory containing information of only one station
tag (str, optional) – tag to save under, defaults to ‘response’
Note
The Inventory inv should only contain information of one station.
- add_waveform(data: Union[Trace, Stream], evt_id: Union[UTCDateTime, str], tag: str = 'raw')[source]#
Add raw waveform data to the hdf5 file. The data can later be accessed using the get_data() method.
- Parameters:
data (Trace or Stream) – Data to save.
evt_id (UTCDateTime or str) – Event identifier. Defined by the origin time of the event rounded using utc_save_str().
tag (str, optional) – The tag that the data should be saved under. Defaults to ‘raw’.
- Raises:
TypeError – for wrong data type.
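The evt_id keys follow the year + Julian day + time pattern visible in the database trees below (e.g. 2009261T120020.8). A stdlib sketch of how such an identifier could be built from a UTC origin time; the exact rounding performed by utc_save_str() is an assumption and may differ:

```python
from datetime import datetime, timezone

def utc_to_save_str(origin: datetime) -> str:
    """Build an event ID like '2009261T120020.8' from a UTC origin time.

    Format assumption: year + Julian day + 'T' + HHMMSS + '.' + tenths of a
    second, mirroring the keys shown in the database tree.
    """
    tenths = round(origin.microsecond / 1e5)  # round to 0.1 s
    sec = origin.second
    if tenths == 10:  # carry over when rounding up to a full second
        tenths = 0
        sec += 1      # (minute roll-over ignored in this sketch)
    return f"{origin:%Y}{origin:%j}T{origin:%H%M}{sec:02d}.{tenths}"

origin = datetime(2009, 9, 18, 12, 0, 20, 800000, tzinfo=timezone.utc)
print(utc_to_save_str(origin))  # 2009261T120020.8
```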
- content_dict()[source]#
Returns a dictionary with the following structure. Not actively used.
dict
├── raw
│   └── YP
│       └── NE11
│           ├── 2009261T120020.8
│           │   ├── BHE (1500)
│           │   ├── BHN (1500)
│           │   └── BHZ (1500)
:           :
│           └── 2009261T184125.8
│               ├── BHE (1500)
│               ├── BHN (1500)
│               └── BHZ (1500)
└── response
    └── YP
        └── NE11 (27156)
- get_data(network: str, station: str, evt_id: UTCDateTime, tag: str = 'raw') Stream [source]#
Returns an obspy Stream holding the requested data and all existing channels.
Note
Wildcards are allowed for all parameters.
- Parameters:
network (str) – network code, e.g., IU
station (str) – station code, e.g., HRV
evt_id (UTCDateTime or str) – Origin Time of the corresponding event
tag (str, optional) – Data tag (e.g., ‘raw’). Defaults to raw.
- Returns:
a Stream holding the requested data.
- Return type:
Stream
- get_response(network: str, station: str, tag: str = 'response') Inventory [source]#
Get the Response information for the queried station.
- Parameters:
network (str) – Network Code
station (str) – Station Code
tag (str, optional) – Tag under which the response is saved, defaults to ‘response’
- Returns:
Obspy Inventory
- Return type:
Inventory
- print_tree()[source]#
Print a tree with the following structure:
dict
├── raw
│   └── YP
│       └── NE11
│           ├── 2009261T120020.8
│           │   ├── BHE (1500)
│           │   ├── BHN (1500)
│           │   └── BHZ (1500)
:           :
│           └── 2009261T184125.8
│               ├── BHE (1500)
│               ├── BHN (1500)
│               └── BHZ (1500)
└── response
    └── YP
        └── NE11 (27156)
- walk(tag: str, network: str, station: str) Iterable[Stream] [source]#
Iterate over all Streams with the given properties (i.e., all events).
- Parameters:
tag (str) – data tag
network (str) – Network code
station (str) – Station code
- Returns:
Iterator
- Return type:
Iterable[Stream]
- Yield:
one Stream per event.
Note
Does not accept wildcards.
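The iteration pattern behind walk() can be illustrated on a plain nested mapping that mimics the tag/network/station/event layout of the HDF5 file. The toy data and function below are illustrative, not the library's internals:

```python
from typing import Dict, Iterable, List

# Toy stand-in for the HDF5 hierarchy: tag -> network -> station -> event -> channels
DB: Dict = {
    "raw": {
        "YP": {
            "NE11": {
                "2009261T120020.8": ["BHE", "BHN", "BHZ"],
                "2009261T184125.8": ["BHE", "BHN", "BHZ"],
            }
        }
    }
}

def walk(tag: str, network: str, station: str) -> Iterable[List[str]]:
    """Yield one 'stream' (here: a channel list) per event.

    Like DBHandler.walk, this takes exact codes -- no wildcards.
    """
    for evt_id, channels in DB[tag][network][station].items():
        yield channels

events = list(walk("raw", "YP", "NE11"))
print(len(events))  # 2
```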
- class pyglimer.database.raw.RawDatabase(path: str, mode: str = 'a', compression: str = 'gzip3')[source]#
Bases:
object
Base class to handle the hdf5 files that contain raw data for receiver function computation.
Access an hdf5 file holding raw waveforms. The resulting file can be accessed using all functionalities of h5py (for example as a dict).
- Parameters:
path (str) – Full path to the file
mode (str, optional) – Mode to access the file. Options are: ‘a’ for all, ‘w’ for write, ‘r+’ for writing in an already existing file, or ‘r’ for read-only, defaults to ‘a’.
compression (str, optional) – The compression algorithm and compression level that the arrays should be saved with. ‘gzip3’ tends to perform well; else you could choose ‘gzipx’, where x is a digit between 1 and 9 (i.e., 9 is the highest compression), or None for fastest performance, defaults to ‘gzip3’.
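A compression argument like ‘gzip3’ could be split into algorithm and level as sketched below; this is an assumption about the parsing, not PyGLImER's actual code:

```python
from typing import Optional, Tuple

def parse_compression(arg: Optional[str]) -> Tuple[Optional[str], Optional[int]]:
    """Split a compression argument like 'gzip3' into (algorithm, level).

    None disables compression entirely (fastest). Only gzip levels 1-9 are
    accepted in this sketch.
    """
    if arg is None:
        return None, None
    algo, level = arg[:-1], int(arg[-1])
    if algo != "gzip" or not 1 <= level <= 9:
        raise ValueError(f"unsupported compression: {arg!r}")
    return algo, level

print(parse_compression("gzip3"))  # ('gzip', 3)
print(parse_compression(None))    # (None, None)
```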
Warning
Access only through a context manager (see below):
>>> with RawDatabase('myfile.h5') as rdb:
>>>     type(rdb)  # This is a DBHandler
<class 'pyglimer.database.raw.DBHandler'>
Example:
>>> with RawDatabase(
>>>         '/path/to/db/XN.NEP06.h5') as rdb:
>>>     # find the available tags for existing db
>>>     print(list(rdb.keys()))
['raw', 'weird']
>>>     # Get Data from all times
>>>     st = rdb.get_data(
>>>         'XN', 'NEP06', '*')
>>>     print(st.count())
250
- pyglimer.database.raw.all_traces_recursive(group: Group, stream: Stream, pattern: str) Stream [source]#
Recursively appends all traces in an h5py group to the input stream. In addition, this will check whether the data matches a certain pattern.
- Parameters:
group (h5py._hl.group.Group) – group to search through
stream (Stream) – Stream to append the traces to
pattern (str) – pattern for the path in the hdf5 file, see fnmatch for details.
- Returns:
Stream with appended traces
- Return type:
Stream
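The wildcard matching here is plain fnmatch on the internal HDF5 paths, which is also what makes the wildcard queries in get_data() possible. A stdlib illustration; the path layout follows the database tree shown earlier, but the exact internal path format is an assumption:

```python
import fnmatch

# Hypothetical internal dataset paths: /tag/network/station/event/channel
paths = [
    "/raw/YP/NE11/2009261T120020.8/BHE",
    "/raw/YP/NE11/2009261T120020.8/BHZ",
    "/raw/YP/NE11/2009261T184125.8/BHZ",
    "/response/YP/NE11",
]

# Wildcard event time, fixed channel -- the kind of query a '*' evt_id triggers
pattern = "/raw/YP/NE11/*/BHZ"
matches = [p for p in paths if fnmatch.fnmatch(p, pattern)]
print(matches)
# ['/raw/YP/NE11/2009261T120020.8/BHZ', '/raw/YP/NE11/2009261T184125.8/BHZ']
```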
- pyglimer.database.raw.convert_header_to_hdf5(dataset: Dataset, header: Stats)[source]#
Convert a Stats object and add it to the provided hdf5 dataset.
- Parameters:
dataset (h5py.Dataset) – the dataset that the header should be added to
header (Stats) – The trace’s header
- pyglimer.database.raw.h5_tree(val, pre='')[source]#
Recursive function to print the structure of an HDF5 dataset.
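The same recursive pretty-printing can be sketched over nested dicts with the usual two-part prefix trick (h5_tree itself walks h5py groups; this stand-alone version is illustrative):

```python
def tree_lines(val: dict, pre: str = "") -> list:
    """Recursively render a nested dict as box-drawing tree lines."""
    lines = []
    items = list(val.items())
    for i, (key, child) in enumerate(items):
        last = i == len(items) - 1
        # branch marker for this entry, continuation prefix for its children
        lines.append(pre + ("└── " if last else "├── ") + str(key))
        if isinstance(child, dict) and child:
            lines.extend(tree_lines(child, pre + ("    " if last else "│   ")))
    return lines

print("\n".join(tree_lines({"raw": {"YP": {"NE11": {}}}, "response": {"YP": {}}})))
```

This prints the same shape as print_tree(): ├── raw, nested └── YP / └── NE11, then └── response with its own └── YP.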
- pyglimer.database.raw.mseed_to_hdf5(rawfolder: str, save_statxml: bool, statloc: Optional[str] = None)[source]#
Convert a given mseed database in rawfolder to an h5 based database.
- Parameters:
rawfolder (os.PathLike) – directory in which the folders for each event are stored
save_statxml (bool) – should stationxmls be read as well? If so, you need to provide statloc as well.
statloc (str, optional) – location of the station xmls, only needed if save_statxml is set to True, defaults to None
- pyglimer.database.raw.read_hdf5_header(dataset: Dataset) Stats [source]#
Takes an hdf5 dataset as input and returns the header of the Trace.
- Parameters:
dataset (h5py.Dataset) – The dataset to be read from
- Returns:
The trace’s header
- Return type:
Stats
- pyglimer.database.raw.save_raw_DB_single_station(network: str, station: str, saved: dict, st: Stream, rawloc: str, inv: Inventory)[source]#
A variation of the above function that opens the HDF5 file once, writes all traces, and closes it afterwards. Saves the raw waveform data in the desired format. The point of this function is mainly that the waveforms will be saved with the correct associations and at the correct locations.
We are specifically writing event-by-event streams, so that we don’t lose parts that are already downloaded!
- Parameters:
network (str) – Network code of the station
station (str) – Station code
saved (dict) – Dictionary holding information about the original streams to identify them afterwards.
st (Stream) – obspy stream holding all data (from various stations)
rawloc (str) – Parental directory (with phase) to save the files in.
inv (Inventory) – The inventory holding all the station information
- pyglimer.database.raw.statxml_to_hdf5(rawfolder: str, statloc: str)[source]#
Write a statxml database to h5 files.
- Parameters:
rawfolder (str or os.PathLike) – location that contains the h5 files
statloc (str or os.PathLike) – location where the stationxmls are stored
- pyglimer.database.raw.write_st(st: Stream, event: Event, outfolder: str, statxml: Inventory, resample: bool = True)[source]#
Write raw waveform data to an HDF5 file. This includes the corresponding (teleseismic) event and the station inventory (i.e., response information).
- Parameters:
st (Stream) – The stream holding the raw waveform data.
event (Event) – The seismic event associated to the recorded data.
outfolder (str) – Output folder to write the asdf file to.
statxml (Inventory) – The station inventory
resample (bool, optional) – Resample the data to 10Hz sampling rate? Defaults to True.
- pyglimer.database.raw.write_st_to_ds(ds: DBHandler, st: Stream, evt_id: str, resample: bool = True)[source]#
Write raw waveform data to an open HDF5 file handler under the given event identifier.
- Parameters:
ds (DBHandler) – The database handler to write to.
st (Stream) – The stream holding the raw waveform data.
evt_id (str) – Event identifier to save the waveforms under.
resample (bool, optional) – Resample the data to 10Hz sampling rate? Defaults to True.
pyglimer.database.rfh5#
- copyright:
The PyGLImER development team (makus@gfz-potsdam.de).
- license:
GNU Lesser General Public License, Version 3 (https://www.gnu.org/copyleft/lesser.html)
- author:
Peter Makus (makus@gfz-potsdam.de)
Created: Wednesday, 11th August 2021 03:20:09 pm Last Modified: Friday, 16th September 2022 02:47:40 pm
- class pyglimer.database.rfh5.DBHandler(path, mode, compression)[source]#
Bases:
File
The actual file handler of the hdf5 receiver function files.
Warning
Should not be accessed directly. Access RFDataBase instead.
Child object of
h5py.File
and inherits all its attributes and functions, in addition to functions that are particularly useful for receiver functions.
Create a new file object.
See the h5py user guide for a detailed explanation of the options.
- name
Name of the file on disk, or file-like object. Note: for files created with the ‘core’ driver, HDF5 still requires this be non-empty.
- mode
r: Readonly, file must exist (default)
r+: Read/write, file must exist
w: Create file, truncate if exists
w- or x: Create file, fail if exists
a: Read/write if exists, create otherwise
- driver
Name of the driver to use. Legal values are None (default, recommended), ‘core’, ‘sec2’, ‘direct’, ‘stdio’, ‘mpio’, ‘ros3’.
- libver
Library version bounds. Supported values: ‘earliest’, ‘v108’, ‘v110’, ‘v112’ and ‘latest’. The ‘v108’, ‘v110’ and ‘v112’ options can only be specified with the HDF5 1.10.2 library or later.
- userblock_size
Desired size of user block. Only allowed when creating a new file (mode w, w- or x).
- swmr
Open the file in SWMR read mode. Only used when mode = ‘r’.
- rdcc_nbytes
Total size of the raw data chunk cache in bytes. The default size is 1024**2 (1 MB) per dataset.
- rdcc_w0
The chunk preemption policy for all datasets. This must be between 0 and 1 inclusive and indicates the weighting according to which chunks which have been fully read or written are penalized when determining which chunks to flush from cache. A value of 0 means fully read or written chunks are treated no differently than other chunks (the preemption is strictly LRU) while a value of 1 means fully read or written chunks are always preempted before other chunks. If your application only reads or writes data once, this can be safely set to 1. Otherwise, this should be set lower depending on how often you re-read or re-write the same data. The default value is 0.75.
- rdcc_nslots
The number of chunk slots in the raw data chunk cache for this file. Increasing this value reduces the number of cache collisions, but slightly increases the memory used. Due to the hashing strategy, this value should ideally be a prime number. As a rule of thumb, this value should be at least 10 times the number of chunks that can fit in rdcc_nbytes bytes. For maximum performance, this value should be set approximately 100 times that number of chunks. The default value is 521.
- track_order
Track dataset/group/attribute creation order under root group if True. If None use global default h5.get_config().track_order.
- fs_strategy
The file space handling strategy to be used. Only allowed when creating a new file (mode w, w- or x). Defined as:
“fsm”: FSM, Aggregators, VFD
“page”: Paged FSM, VFD
“aggregate”: Aggregators, VFD
“none”: VFD
If None use HDF5 defaults.
- fs_page_size
File space page size in bytes. Only used when fs_strategy=”page”. If None use the HDF5 default (4096 bytes).
- fs_persist
A boolean value to indicate whether free space should be persistent or not. Only allowed when creating a new file. The default value is False.
- fs_threshold
The smallest free-space section size that the free space manager will track. Only allowed when creating a new file. The default value is 1.
- page_buf_size
Page buffer size in bytes. Only allowed for HDF5 files created with fs_strategy=”page”. Must be a power of two value and greater or equal than the file space page size when creating the file. It is not used by default.
- min_meta_keep
Minimum percentage of metadata to keep in the page buffer before allowing pages containing metadata to be evicted. Applicable only if page_buf_size is set. Default value is zero.
- min_raw_keep
Minimum percentage of raw data to keep in the page buffer before allowing pages containing raw data to be evicted. Applicable only if page_buf_size is set. Default value is zero.
- locking
The file locking behavior. Defined as:
False (or “false”): Disable file locking
True (or “true”): Enable file locking
“best-effort”: Enable file locking but ignore some errors
None: Use HDF5 defaults
Warning: The HDF5_USE_FILE_LOCKING environment variable can override this parameter. Only available with HDF5 >= 1.12.1 or 1.10.x >= 1.10.7.
- alignment_threshold
Together with alignment_interval, this property ensures that any file object greater than or equal in size to the alignment threshold (in bytes) will be aligned on an address which is a multiple of the alignment interval.
- alignment_interval
This property should be used in conjunction with alignment_threshold. See the description above. For more details, see https://portal.hdfgroup.org/display/HDF5/H5P_SET_ALIGNMENT
- Additional keywords
Passed on to the selected file driver.
- add_rf(data: RFTrace, tag: str = 'rf')[source]#
Add a receiver function to the hdf5 file. The data can later be accessed using the get_data() method.
- get_coords(network: str, station: str, phase: Optional[str] = None, tag: str = 'rf') Tuple[float, float, float] [source]#
Return the coordinates of the station.
- Parameters:
network (str) – Network Code.
station (str) – Station Code
phase (str, optional) – Teleseismic Phase, defaults to None
- Returns:
Latitude (dec deg), Longitude (dec deg), Elevation (m)
- Return type:
Tuple[float, float, float]
- get_data(network: str, station: str, phase: str, evt_time: UTCDateTime, tag: str = 'rf', pol: str = 'v') RFStream [source]#
Returns a RFStream holding all the requested data.
Note
Wildcards are allowed for all parameters.
- Parameters:
network (str) – network code, e.g., IU
station (str) – station code, e.g., HRV
phase (str) – Teleseismic phase
evt_time (UTCDateTime, optional) – Origin Time of the Event
tag (str, optional) – Data tag (e.g., ‘rf’). Defaults to rf.
pol (str, optional) – RF Polarisation. Defaults to v.
- Returns:
a RFStream holding the requested data.
- Return type:
RFStream
- walk(tag: str, network: str, station: str, phase: str, pol: str = 'v') Iterable[RFTrace] [source]#
Iterate over all receiver functions with the given properties.
- Parameters:
tag (str) – data tag
network (str) – Network code
station (str) – Station code
phase (str) – Teleseismic phase
pol (str, optional) – RF-Polarisation, defaults to ‘v’
- Returns:
Iterator
- Return type:
Iterable[RFTrace]
- Yield:
one RFTrace per receiver function.
Note
Does not accept wildcards.
- class pyglimer.database.rfh5.RFDataBase(path: str, mode: str = 'a', compression: str = 'gzip3')[source]#
Bases:
object
Base class to handle the hdf5 files that contain receiver functions.
Access an hdf5 file holding receiver functions. The resulting file can be accessed using all functionalities of h5py (for example as a dict).
- Parameters:
path (str) – Full path to the file
mode (str, optional) – Mode to access the file. Options are: ‘a’ for all, ‘w’ for write, ‘r+’ for writing in an already existing file, or ‘r’ for read-only, defaults to ‘a’.
compression (str, optional) – The compression algorithm and compression level that the arrays should be saved with. ‘gzip3’ tends to perform well; else you could choose ‘gzipx’, where x is a digit between 1 and 9 (i.e., 9 is the highest compression), or None for fastest performance, defaults to ‘gzip3’.
Warning
Access only through a context manager (see below):
>>> with RFDataBase('myfile.h5') as rfdb:
>>>     type(rfdb)  # This is a DBHandler
<class 'pyglimer.database.rfh5.DBHandler'>
Example:
>>> with RFDataBase(
>>>         '/path/to/db/XN.NEP06.h5') as rfdb:
>>>     # find the available tags for existing db
>>>     print(list(rfdb.keys()))
['rf', 'rfstack']
>>>     # Get Data from all times and tag rf, phase P
>>>     st = rfdb.get_data(
>>>         'XN', 'NEP06', 'P', '*', 'rf')
>>>     print(st.count())
250
- pyglimer.database.rfh5.all_traces_recursive(group: Group, stream: RFStream, pattern: str) RFStream [source]#
Recursively appends all traces in an h5py group to the input stream. In addition, this will check whether the data matches a certain pattern.
- Parameters:
group (h5py._hl.group.Group) – group to search through
stream (Stream) – Stream to append the traces to
pattern (str) – pattern for the path in the hdf5 file, see fnmatch for details.
- Returns:
Stream with appended traces
- Return type:
Stream