Quickstart#

Welcome to refyre, an open source Python library dedicated to mass filesystem management / operations with a focus on advancing fields like artificial intelligence.
Refyre is an AI-fused Python package that provides two high level features:
Easy large scale filesystem manipulations
Efficient, code-less directory structuring and restructuring
Enhance your favorite Python packages such as Pandas, NumPy, Spark, and other data manipulation tools to quickly structure scattered data.
Features#
Filesystem agnostic data handshakes
Kickstart loading entire repositories & setting up virtual environments in a single command, your way
Perform mass operations on files such as copying, moving, zipping, POST-ing, in 1 line of code
Homebrew structured data such as Pandas DataFrames, and image datasets in a snap of your fingers (< 30 lines)
Refactor, organize, and analyze periodic research experiments with zero lines of code
Kickstart!#
Simply provide refyre with an “input specification”, telling it what directories to focus on
sample_input_spec.txt
'''
Suppose you have a directory structure
a/
a1.txt
a2.txt
...
b/
c/
c1.txt
c2.txt
d.txt
d2.txt
...
You seek to analyze the a files and the c files
'''
[dir="a"|name="a_var"]
[dir="b"]
[dir="c"|pattern="g!c?.txt"|name="c_var"] #Glob patterns start with 'g!', regex with 'r!', no need for just normal pattern matching
Have refyre analyze the directory with the following:
#Main analysis line
ref = Refyre(input_specs = ['sample_input_spec.txt'])
#Now, have a bit of fun!
a_var = ref["a_var"]
c_var = ref["c_var"]
print(len(a_var)) #Number of files
#Move all the files to another directory, copy works the same way
a_var = a_var.move('dir2') #.copy() ...
#Get all the files in a List[Pathlib.Path] objects
all_a_var = a_var.vals()
#Automatically zip a copy of all the files
zipped_c_var = c_var.zip()
print(len(zipped_c_var)) #1, the zipped c_var files
#Get all the parents dirs
c_var_parent_dirs = c_var.dirs()
print(type(c_var)) #refyre.cluster.FileCluster (this is what each variable type is)
#Do mass file management operations such as delete(), filter()
all_a_var_and_c_vars = FileCluster(values = []) #Values are strings of filepaths you want to do operations on
all_a_var_and_c_vars = a_var + c_var
filtered_c = all_a_var_and_c_vars.filter(lambda p : p.name.startswith('c'))
#Delete all files
filtered_c.delete()
#Automatically account for any modifications by variables
print(len(all_a_var_and_c_vars))
And finally, after any analysis, you can use the variables to generate specs
Let’s say you want to generate directories & data in the format specified by output_spec.txt:
'''
Sample output spec, creates
directories d & e, and ports the data
from a_var and c_var into it.
'''
[dir="d"|name="a_var"]
[dir="e"|name="c_var"]
One line.
ref.create_spec('output_spec.txt')
Alternatively, this entire process (minus the in-between analysis) can be done through our CLI.
refyre -i input.txt -o output.txt