reslib.data package
Subpackages
Submodules
reslib.data.cache module
This module contains the DataFrameCache class for reading and writing cached datasets to disk.
Copyright: 2019 by Maclean Gaulin.
License: MIT, see LICENSE for more details.
class reslib.data.cache.DataFrameCache(override_filename=None, delete_cache=False)
Bases: object
Base class for caching intermediate files.
By default, the cache is written with pandas to_csv and read back with read_csv. Default write args: sep='\t', index=False. Default read args: sep='\t'.
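The default arguments above amount to a tab-separated round trip through pandas. The minimal sketch below performs that round trip in memory. The column names and the StringIO buffer standing in for the cache file are illustrative assumptions, not part of reslib.

```python
import io

import pandas as pd

# Default arguments as listed for DataFrameCache: tab-separated, no index column
WRITE_ARGS = {"sep": "\t", "index": False}
READ_ARGS = {"sep": "\t"}

df = pd.DataFrame({"firm": ["A", "B"], "at": [1.5, 2.25]})

# An in-memory buffer stands in for the cache file on disk
buf = io.StringIO()
df.to_csv(buf, **WRITE_ARGS)
buf.seek(0)

restored = pd.read_csv(buf, **READ_ARGS)
print(restored)
```

Because of index=False, the file contains no extra unnamed index column, so the frame reads back with the same columns. Column types are re-inferred on read, so a zero-padded identifier such as "001004" would come back as the integer 1004 unless read_args includes a suitable dtype entry.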
Suggested subclassing:

    from reslib.data.cache import DataFrameCache

    class CompustatFUNDA(DataFrameCache):
        override_directory = '~/project/data/comp/'
        filename = 'funda'

        def make_dataset(self):
            # Download FUNDA and return it as a pandas DataFrame
            pass
property data
Property accessor for the underlying DataFrame. Loads the cached DataFrame into memory, calling make_dataset() if no cache is available.
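The lazy-loading behaviour of data can be pictured with the simplified sketch below. SketchCache is a hypothetical stand-in, not reslib's implementation. It assumes that path points at a single file, that is_cached just checks whether that file exists, and that a freshly built frame is written to the cache. The docs above only say that make_dataset() is called when no cache is available.

```python
import os
import tempfile

import pandas as pd


class SketchCache:
    """Hypothetical, simplified stand-in for DataFrameCache (not reslib's code)."""

    read_args = {"sep": "\t"}
    write_args = {"sep": "\t", "index": False}

    def __init__(self, path):
        self.path = path
        self.df = None

    @property
    def is_cached(self):
        # Assumption: the cache "exists" when the file is present on disk
        return os.path.exists(self.path)

    def make_dataset(self):
        # A subclass would build the real dataset here (e.g. download it)
        return pd.DataFrame({"x": [1, 2, 3]})

    @property
    def data(self):
        if self.df is None:
            if self.is_cached:
                self.df = pd.read_csv(self.path, **self.read_args)
            else:
                # Assumption: a newly built dataset is written to the cache
                self.df = self.make_dataset()
                self.df.to_csv(self.path, **self.write_args)
        return self.df


with tempfile.TemporaryDirectory() as tmp:
    cache = SketchCache(os.path.join(tmp, "example.tsv"))
    first = cache.data  # no file yet: make_dataset() runs, result is written
    reloaded = SketchCache(cache.path).data  # file exists now: read from disk
    print(reloaded["x"].tolist())
```

Keeping the loaded frame on self.df means repeated accesses to data do not touch the disk again.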
df = None
DataFrame holding the data.
filename = None
Override filename used to name the dataset.
property is_cached
Boolean indicating whether a cached file exists at path.
override_directory = None
Override directory in which to store the dataset.
path = None
Full path to the dataset.
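How override_directory and filename combine into path is not spelled out above. The sketch below shows one plausible way to do it. The helper build_cache_path, its default_directory fallback, and the absence of a file extension are all hypothetical. The "~" expansion is included because the subclassing example uses a directory of the form '~/project/data/comp/'.

```python
import os


def build_cache_path(filename, override_directory=None, default_directory="."):
    """Hypothetical helper: join a cache directory and a filename into a full path."""
    directory = override_directory if override_directory is not None else default_directory
    # Expand "~" so a directory like '~/project/data/comp/' resolves under $HOME
    return os.path.join(os.path.expanduser(directory), filename)


print(build_cache_path("funda", "~/project/data/comp/"))
```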
read(read_args=None)
Read df from the cache, returning the 'cleaned' df.
Calls _pre_read_hook() before reading and _post_read_hook(read_df) after.
- Parameters: read_args (dict) – Dictionary of read args passed to the read function, overriding those specified in self.read_args.
- Returns: DataFrame that has been passed through _post_read_hook(df).
- Return type: pandas.DataFrame
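The call order described for read() (pre-hook, read, post-hook) and the per-call override of read_args can be sketched as follows. ReadSketch is hypothetical. Key-by-key dict merging is an assumed reading of "overriding those specified in self.read_args", and the whitespace-stripping post-hook is only an example of a "cleaning" step.

```python
import io

import pandas as pd


class ReadSketch:
    """Hypothetical sketch of read(): merge args, call hooks, read the file."""

    read_args = {"sep": "\t"}

    def __init__(self, source):
        self.source = source  # a path or file-like object
        self.calls = []  # records hook order, for illustration only

    def _pre_read_hook(self):
        self.calls.append("pre")

    def _post_read_hook(self, df):
        self.calls.append("post")
        # Example "cleaning" step: strip whitespace from column names
        df.columns = [c.strip() for c in df.columns]
        return df

    def read(self, read_args=None):
        # Per-call read_args override the class-level defaults key by key
        args = {**self.read_args, **(read_args or {})}
        self._pre_read_hook()
        df = pd.read_csv(self.source, **args)
        return self._post_read_hook(df)


reader = ReadSketch(io.StringIO("a ,b\n1,2\n"))
df = reader.read(read_args={"sep": ","})  # per-call sep overrides the tab default
print(list(df.columns), reader.calls)
```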
read_args = {'sep': '\t'}
write(df, overwrite_cache=False, write_args=None)
Write df to the cache, returning the 'cleaned' df.
- Parameters:
  df (pandas.DataFrame) – DataFrame to write to disk, using self.write_args overridden by any write_args provided.
  write_args (dict) – Dictionary of write args that override self.write_args.
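A sketch of write() as a plain function follows. write_cache is hypothetical. The write_args merge mirrors the one assumed for read(). The overwrite_cache behaviour is a pure assumption, because the parameter is not documented above: the sketch leaves an existing cache file untouched unless overwrite_cache=True.

```python
import os
import tempfile

import pandas as pd


def write_cache(df, path, overwrite_cache=False, write_args=None,
                default_write_args=None):
    """Hypothetical sketch of write(); the overwrite_cache behaviour is assumed."""
    defaults = default_write_args or {"sep": "\t", "index": False}
    # Per-call write_args override the defaults key by key
    args = {**defaults, **(write_args or {})}
    # Assumption: an existing cache file is kept unless overwrite_cache=True
    if os.path.exists(path) and not overwrite_cache:
        return df
    df.to_csv(path, **args)
    return df


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "cache.tsv")
    write_cache(pd.DataFrame({"a": [1]}), target)
    print(pd.read_csv(target, sep="\t"))
```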
write_args = {'index': False, 'sep': '\t'}
class reslib.data.cache.ReadWriteArgCopyToDescendants
Bases: type
Metaclass that makes read_args and write_args inherit from (merge with) the parent class's values without requiring any super().__init__() code. I know about the dangers of mutable class attributes, but doubt they will matter much here: this is intended for use by academic researchers, not massively parallel projects. Citation: https://stackoverflow.com/a/42036304/1959876
Example:

    from reslib.data.cache import ReadWriteArgCopyToDescendants

    class Gramma(metaclass=ReadWriteArgCopyToDescendants):
        read_args = {'sep': ' '}  # Let's say we just want read_args at first

    class Mom(Gramma):
        read_args = {'parse_dates': ['datadate']}

    assert Mom().read_args == {'sep': ' ', 'parse_dates': ['datadate']}
    assert Mom().write_args == {}

    class Kid(Mom):
        write_args = {'sep': ','}

    assert Kid().read_args == {'sep': ' ', 'parse_dates': ['datadate']}
    assert Kid().write_args == {'sep': ','}
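The merging behaviour shown in the example can be reproduced with a small metaclass. ReadWriteArgMergeSketch is a hypothetical reimplementation written for illustration, not the reslib source. It merges each parent's dicts at class-creation time and then applies the class's own values on top.

```python
_MERGED_ATTRS = ("read_args", "write_args")


class ReadWriteArgMergeSketch(type):
    """Hypothetical metaclass in the spirit of ReadWriteArgCopyToDescendants."""

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        for attr in _MERGED_ATTRS:
            merged = {}
            # Reverse the bases so the first-listed base wins, as in normal lookup
            for base in reversed(bases):
                merged.update(getattr(base, attr, {}))
            # The class's own values override anything inherited
            merged.update(namespace.get(attr, {}))
            # A fresh dict per class, so parents are never mutated
            setattr(cls, attr, merged)
        return cls


class Gramma(metaclass=ReadWriteArgMergeSketch):
    read_args = {"sep": " "}


class Mom(Gramma):
    read_args = {"parse_dates": ["datadate"]}


print(Mom.read_args, Mom.write_args)
```

Each class gets its own merged dict, so changing Mom.read_args does not leak into Gramma. Instances still share their class's dict, which is the mutable-attribute caveat mentioned above. On Python 3.6+, __init_subclass__ could perform the same merge without a metaclass.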
reslib.data.dataframe module
This module provides a wrapper around the pandas DataFrame class that adds convenience functions, such as Stata-style column indexing.
Copyright: 2019 by Maclean Gaulin.
License: MIT, see LICENSE for more details.
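This page does not document the wrapper's own API for Stata-style column indexing. The sketch below only illustrates what such indexing usually means: expanding a Stata varlist with wildcards (at*) and ranges of adjacent columns (sale-ni) into column names. The function stata_columns is hypothetical, and it assumes column names are strings that contain no "-".

```python
from fnmatch import fnmatchcase

import pandas as pd


def stata_columns(df, varlist):
    """Hypothetical helper: select columns using a Stata-style varlist.

    Supports wildcards ("at*", "x?") and ranges of adjacent columns ("a-c").
    Raises ValueError if a range endpoint is not a column name.
    """
    cols = list(df.columns)
    selected = []
    for token in varlist.split():
        if "-" in token:
            # A range selects every column from start to end, in column order
            start, end = token.split("-", 1)
            i, j = cols.index(start), cols.index(end)
            matches = cols[i:j + 1]
        else:
            # Shell-style wildcard matching, case-sensitive like Stata
            matches = [c for c in cols if fnmatchcase(c, token)]
        # Keep the first occurrence of each column, preserving order
        for c in matches:
            if c not in selected:
                selected.append(c)
    return df[selected]


frame = pd.DataFrame(columns=["gvkey", "at", "at_lag", "sale", "ni"])
print(list(stata_columns(frame, "gvkey at*").columns))
```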