.. _sec-make_inputs:

##################
Making Input Files
##################

Input files are required for creating a USGS slab model. An input file can be created from any of the existing databases within the repository (e.g., *0418database*, *0921database*) or, if new seismic data is being added, from a newly created database. If creating an input file from an existing database, proceed to step 2 below. If adding new data, start at step 1. See :ref:`subsec-new_database` below for adding new catalog data. The querying process is illustrated in the figure below.

.. _slab2_query_flowchart:

.. figure:: figures/slab2_query_flowchart.png

.. _subsec-new_database:

Creating a New Database
=======================

Input files are created using existing databases provided in the USGS Slab Models repository. Existing databases can be found in *data/databases/*. To make an input file with new data, a new database will need to be created. To query new data from the Preliminary Determined Epicenters (PDE) or Global Centroid Moment Tensor (GCMT) catalogs, see the catalog query sections below.

Querying the PDE Catalog
------------------------

A method for automatically downloading data from ComCat is provided in the *comcat_query.py* module. To use it, first navigate to *input/catalog_query/* and activate the Python virtual environment with the command ``conda activate slab2env``. Next, run the command:

``python comcat_query.py --intervalno [intervalno] --directory [directory]``

where:

- ``--intervalno`` : The query interval number. The ComCat catalog will be split into 11 intervals when querying the *entire* catalog. To speed up the query, run each interval in a separate terminal window. To query only a specific range of the catalog, use any value greater than 11. Automated running of this script uses a value of 100.
- ``--directory`` : The directory where output files will be stored. Automated running of this script uses the name *pde_query*.
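For example, one interval of a full-catalog query might be launched as follows (the argument values here are illustrative, not required):

```shell
# Activate the environment, then query interval 3 of the full ComCat
# catalog, writing results under a directory named pde_query
conda activate slab2env
python comcat_query.py --intervalno 3 --directory pde_query
```

To cover the whole catalog, repeat the command for each of the 11 intervals, ideally in separate terminal windows.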
Optional arguments:

- ``--previous`` : If querying only part of the catalog, use this argument to point to the previous query; all events since that query will be queried.
- ``--starttime`` : To specify a query start date, use this option instead of ``previous``. Accepts dates in ISO format (*YYYY-MM-DD*).
- ``--finishtime`` : Option to specify a finish date. Defaults to the current date if not used. Accepts dates in ISO format (*YYYY-MM-DD*).
- ``--verbose`` : Prints debugging notes if passed.

Queried data will be saved to the specified ``directory`` in a file named *[directory]_[intervalno].csv*.

After a query is run, the PDE data should be consolidated into a new file containing the information needed for generating a slab model. To do so, run ``python concat_comcat.py`` with the following options:

- ``--manyortwo`` : Accepts values of *1* or *2*:

  - *1* : Merge all files in ``database`` and convert them to the USGS slab model input format.
  - *2* : Add the most recent query to an existing query.

- ``--outfile`` : Name of the output file to save to.
- ``--database`` : If option *1* is selected, the database containing the PDE query files to be merged.
- ``--oldfile`` : If option *2* is selected, the name of the existing catalog.
- ``--newfile`` : If option *2* is selected, the name of the catalog to be added.
- ``--newcat`` : If option *2* is selected, the name of the new catalog to save results to.
- ``--verbose`` : Prints debugging notes if passed.

The output file will be saved to the ``outfile`` passed above and is ready to be associated with a relevant GCMT query. Note that to get a complete catalog of earthquakes for a new database, data from the last query must be added to the new query using option *2* for ``manyortwo``.

Querying the GCMT Catalog
-------------------------

To query the GCMT catalog, navigate to *input/catalog_query/* and activate the Python virtual environment using the command ``conda activate slab2env``.
Next, run the module using:

``python make_columns.py --database [database] --oldcatalog [oldcatalog] --newcatalog [newcatalog]``

where:

- ``--database`` : Folder where unformatted GCMT files are stored.
- ``--oldcatalog`` : Path to the previous GCMT query file.
- ``--newcatalog`` : Destination for the output file.
- ``--verbose`` : Prints debugging notes if passed.

Associating PDE and GCMT Data
-----------------------------

To move the newly queried data into a database that can be used for making a USGS slab model, the new PDE and GCMT data must be combined into a single file. This step also sorts the data by date and removes any duplicate events. To associate two queries, navigate to *input/catalog_query/* and activate the Python virtual environment using the command ``conda activate slab2env``. Then run the module using:

``python pde_gcmt_associate.py --gcmtfile [gcmtfile] --ccatfile [ccatfile] --asscfile [asscfile] --verbose``

where:

- ``--gcmtfile`` : GCMT file to be associated.
- ``--ccatfile`` : PDE file to be associated.
- ``--asscfile`` : Name of the output associated data file.
- ``--verbose`` : Prints debugging notes if passed.

Creating a New Database
-----------------------

The next step in migrating newly queried data into the proper format for making USGS slab models is to move the data into a new database. To do so, navigate to *input/catalog_query/* and activate the Python virtual environment using the command ``conda activate slab2env``. Then run the module using:

``python make_database.py --eqfile [eqfile]``

where:

- ``--eqfile`` : Path to the associated PDE and GCMT data to be migrated into a database.

A new database will be created in *data/databases/MMYYdatabase/*, named with the current month and year. The associated PDE and GCMT data file will be moved into this directory and renamed *All_EQ_MMDDYY.csv*. Other types of input data will be copied from the most recent database into this new database.
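Put together, the association and database-creation steps above might look like the following (all file names here are hypothetical placeholders):

```shell
# Illustrative sequence: associate the queried PDE and GCMT catalogs,
# then migrate the associated file into a new MMYYdatabase directory
conda activate slab2env
python pde_gcmt_associate.py --gcmtfile gcmt_query.csv --ccatfile pde_query.csv \
    --asscfile associated_events.csv --verbose
python make_database.py --eqfile associated_events.csv
```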
Supplementary input data files can be added and removed as needed, but they must be formatted consistently with other files of the same data type. Earthquake hypocenter data (from the *All_EQ_MMDDYY.csv* file) are required for making a slab model; other data types are optional.

Making a New Input File
=======================

Input files contain data organized by date and slab region, and are read when creating a new slab model. There are two options for creating new input files.

Make All Inputs
---------------

One option is to create input files for all known slab regions with default parameters at once. To do so, navigate to *input/create_input/* and activate the Python virtual environment with ``conda activate slab2env``. To run the module, use the command:

``python make_all_inputs.py --database [database]``

where:

- ``--database`` : Name of the database to use to make input files (*MMYYdatabase*).

New input files will be saved to *input/input_files/MM-YY/*, named with the month and year of the database used.

Make Individual Inputs
----------------------

To create input files one at a time and apply custom filters, use this method. Navigate to *input/create_input/* and activate the Python virtual environment with ``conda activate slab2env``.
To run the module, use the command:

``python make_input.py --database [database] --slab_polygon [slab_polygon]``

where:

- ``--database`` : Name of the database to use to make input files (*MMYYdatabase*); may also be specified as:

  - ``--database_r`` : Name of the database to use with regional preference.
  - ``--database_g`` : Name of the database to use with global preference.

- ``--slab_polygon`` : Name of the slab region used to constrain data boundaries; may instead be specified as:

  - ``--bounds`` : Manually specify the data boundaries as *lonmin lonmax latmin latmax*.

- ``--seis`` : Filter by seismogenic zone depth (single integer value).

Optional arguments:

- ``--start-time`` : Start time of data to use, in ISO format (*YYYY-MM-DD*).
- ``--end-time`` : End time of data to use, in ISO format (*YYYY-MM-DD*).
- ``--mag-range`` : Minimum and maximum magnitudes to constrain earthquake data by (accepts 2 arguments).
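As an illustration, a single input file constrained by manual bounds and a date/magnitude window might be built as follows (the database name, bounds, and filter values are placeholders, not recommendations):

```shell
# Illustrative run: build one input file from the 0921database,
# limited to a longitude/latitude box, a time window, and M 4.0-9.0 events
conda activate slab2env
python make_input.py --database 0921database \
    --bounds 120 150 -10 20 \
    --start-time 2000-01-01 --end-time 2021-09-01 \
    --mag-range 4.0 9.0
```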