mirror of
https://github.com/AllenDowney/AstronomicalData.git
synced 2025-12-07 13:20:46 -08:00
1641 lines
64 KiB
Plaintext
{
|
||
"cells": [
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"# Chapter 1"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"*Astronomical Data in Python* is an introduction to tools and practices for working with astronomical data. Topics covered include:\n",
|
||
"\n",
|
||
"* Writing queries that select and download data from a database.\n",
|
||
"\n",
|
||
"* Using data stored in an Astropy `Table` or Pandas `DataFrame`.\n",
|
||
"\n",
|
||
"* Working with coordinates and other quantities with units.\n",
|
||
"\n",
|
||
"* Storing data in various formats.\n",
|
||
"\n",
|
||
"* Performing database join operations that combine data from multiple tables.\n",
|
||
"\n",
|
||
"* Visualizing data and preparing publication-quality figures."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"As a running example, we will replicate part of the analysis in a recent paper, \"[Off the beaten path: Gaia reveals GD-1 stars outside of the main stream](https://arxiv.org/abs/1805.00425)\" by Adrian M. Price-Whelan and Ana Bonaca.\n",
|
||
"\n",
|
||
"As the abstract explains, \"Using data from the Gaia second data release combined with Pan-STARRS photometry, we present a sample of highly-probable members of the longest cold stream in the Milky Way, GD-1.\"\n",
|
||
"\n",
|
||
"GD-1 is a [stellar stream](https://en.wikipedia.org/wiki/List_of_stellar_streams), which is \"an association of stars orbiting a galaxy that was once a globular cluster or dwarf galaxy that has now been torn apart and stretched out along its orbit by tidal forces.\""
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"[This article in *Science* magazine](https://www.sciencemag.org/news/2018/10/streams-stars-reveal-galaxy-s-violent-history-and-perhaps-its-unseen-dark-matter) explains some of the background, including the process that led to the paper and a discussion of the scientific implications:\n",
|
||
"\n",
|
||
"* \"The streams are particularly useful for ... galactic archaeology --- rewinding the cosmic clock to reconstruct the assembly of the Milky Way.\"\n",
|
||
"\n",
|
||
"* \"They also are being used as exquisitely sensitive scales to measure the galaxy's mass.\"\n",
|
||
"\n",
|
||
"* \"... the streams are well-positioned to reveal the presence of dark matter ... because the streams are so fragile, theorists say, collisions with marauding clumps of dark matter could leave telltale scars, potential clues to its nature.\""
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Data\n",
|
||
"\n",
|
||
"The datasets we will work with are:\n",
|
||
" \n",
|
||
"* [Gaia](https://en.wikipedia.org/wiki/Gaia_(spacecraft)), which is \"a space observatory of the European Space Agency (ESA), launched in 2013 ... designed for astrometry: measuring the positions, distances and motions of stars with unprecedented precision\", and\n",
|
||
"\n",
|
||
"* [Pan-STARRS](https://en.wikipedia.org/wiki/Pan-STARRS), the Panoramic Survey Telescope and Rapid Response System, which is a survey designed to monitor the sky for transient objects, producing a catalog with accurate astrometry and photometry of detected sources.\n",
|
||
"\n",
|
||
"Both of these datasets are very large, which can make them challenging to work with. It might not be possible, or practical, to download the entire dataset.\n",
|
||
"One of the goals of this workshop is to provide tools for working with large datasets."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Prerequisites\n",
|
||
"\n",
|
||
"These notebooks are meant for people who are familiar with basic Python, but not necessarily the libraries we will use, like Astropy or Pandas. If you are familiar with Python lists and dictionaries, and you know how to write a function that takes parameters and returns a value, you know enough Python to get started.\n",
|
||
"\n",
|
||
"We assume that you have some familiarity with operating systems, like the ability to use a command-line interface. But we don't assume you have any prior experience with databases.\n",
|
||
"\n",
|
||
"We assume that you are familiar with astronomy at the undergraduate level, but we will not assume specialized knowledge of the datasets or analysis methods we'll use. "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Outline\n",
|
||
"\n",
|
||
"The first lesson demonstrates the steps for selecting and downloading data from the Gaia Database:\n",
|
||
"\n",
|
||
"1. First we'll make a connection to the Gaia server,\n",
|
||
"\n",
|
||
"2. We will explore information about the database and the tables it contains,\n",
|
||
"\n",
|
||
"3. We will write a query and send it to the server, and finally\n",
|
||
"\n",
|
||
"4. We will download the response from the server.\n",
|
||
"\n",
|
||
"After completing this lesson, you should be able to:\n",
|
||
"\n",
|
||
"* Compose a basic query in ADQL.\n",
|
||
"\n",
|
||
"* Use queries to explore a database and its tables.\n",
|
||
"\n",
|
||
"* Use queries to download data.\n",
|
||
"\n",
|
||
"* Develop, test, and debug a query incrementally."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Query Language\n",
|
||
"\n",
|
||
"In order to select data from a database, you have to compose a query, which is like a program written in a \"query language\".\n",
|
||
"The query language we'll use is ADQL, which stands for \"Astronomical Data Query Language\".\n",
|
||
"\n",
|
||
"ADQL is a dialect of [SQL](https://en.wikipedia.org/wiki/SQL) (Structured Query Language), which is by far the most commonly used query language. Almost everything you will learn about ADQL also works in SQL.\n",
|
||
"\n",
|
||
"[The reference manual for ADQL is here](http://www.ivoa.net/documents/ADQL/20180112/PR-ADQL-2.1-20180112.html).\n",
|
||
"But you might find it easier to learn from [this ADQL Cookbook](https://www.gaia.ac.uk/data/gaia-data-release-1/adql-cookbook)."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Installing libraries\n",
|
||
"\n",
|
||
"The library we'll use to get Gaia data is [Astroquery](https://astroquery.readthedocs.io/en/latest/).\n",
|
||
"\n",
|
||
"If you are running this notebook on Colab, you can run the following cell to install Astroquery and the other libraries we'll use.\n",
|
||
"\n",
|
||
"If you are running this notebook on your own computer, you might have to install these libraries yourself. \n",
|
||
"\n",
|
||
"If you are using this notebook as part of a Carpentries workshop, you should have received setup instructions.\n",
|
||
"\n",
|
||
"TODO: Add a link to the instructions.\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 1,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# If we're running on Colab, install libraries\n",
|
||
"\n",
|
||
"import sys\n",
|
||
"IN_COLAB = 'google.colab' in sys.modules\n",
|
||
"\n",
|
||
"if IN_COLAB:\n",
|
||
"    !pip install astroquery astro-gala pyia"
|
||
]
|
||
},
|
||
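If you are running on your own computer, one way to see which of these libraries still need to be installed is to ask Python whether it can find them. The sketch below is not part of Astroquery: `missing_packages` is a hypothetical helper, and the import names `astroquery`, `gala`, and `pyia` are assumed to correspond to the pip packages in the cell above.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names: the pip package astro-gala is imported as `gala`.
print(missing_packages(['astroquery', 'gala', 'pyia']))
```

An empty list means all three libraries can be imported; any names that are printed still need to be installed.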
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Connecting to Gaia\n",
|
||
"\n",
|
||
"Astroquery provides `Gaia`, which is an [object that represents a connection to the Gaia database](https://astroquery.readthedocs.io/en/latest/gaia/gaia.html).\n",
|
||
"\n",
|
||
"We can connect to the Gaia database like this:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 2,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Created TAP+ (v1.2.1) - Connection:\n",
|
||
"\tHost: gea.esac.esa.int\n",
|
||
"\tUse HTTPS: True\n",
|
||
"\tPort: 443\n",
|
||
"\tSSL Port: 443\n",
|
||
"Created TAP+ (v1.2.1) - Connection:\n",
|
||
"\tHost: geadata.esac.esa.int\n",
|
||
"\tUse HTTPS: True\n",
|
||
"\tPort: 443\n",
|
||
"\tSSL Port: 443\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"from astroquery.gaia import Gaia"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"**Optional detail**\n",
|
||
"\n",
|
||
"> Running this import statement has the effect of creating a [TAP+](http://www.ivoa.net/documents/TAP/) connection; TAP stands for \"Table Access Protocol\". It is a network protocol for sending queries to the database and getting back the results. The output shows two connections, one to `gea.esac.esa.int` and one to `geadata.esac.esa.int`; we're not sure why it creates two."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Databases and Tables\n",
|
||
"\n",
|
||
"What is a database, anyway? Most generally, it can be any collection of data, but when we are talking about ADQL or SQL:\n",
|
||
"\n",
|
||
"* A database is a collection of one or more named tables.\n",
|
||
"\n",
|
||
"* Each table is a 2-D array with one or more named columns of data.\n",
|
||
"\n",
|
||
"We can use `Gaia.load_tables` to get the names of the tables in the Gaia database. With the option `only_names=True`, it loads information about the tables, called the \"metadata\", not the data itself."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 3,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"INFO: Retrieving tables... [astroquery.utils.tap.core]\n",
|
||
"INFO: Parsing tables... [astroquery.utils.tap.core]\n",
|
||
"INFO: Done. [astroquery.utils.tap.core]\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"tables = Gaia.load_tables(only_names=True)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 4,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"external.external.apassdr9\n",
|
||
"external.external.gaiadr2_geometric_distance\n",
|
||
"external.external.galex_ais\n",
|
||
"external.external.ravedr5_com\n",
|
||
"external.external.ravedr5_dr5\n",
|
||
"external.external.ravedr5_gra\n",
|
||
"external.external.ravedr5_on\n",
|
||
"external.external.sdssdr13_photoprimary\n",
|
||
"external.external.skymapperdr1_master\n",
|
||
"external.external.tmass_xsc\n",
|
||
"public.public.hipparcos\n",
|
||
"public.public.hipparcos_newreduction\n",
|
||
"public.public.hubble_sc\n",
|
||
"public.public.igsl_source\n",
|
||
"public.public.igsl_source_catalog_ids\n",
|
||
"public.public.tycho2\n",
|
||
"public.public.dual\n",
|
||
"tap_config.tap_config.coord_sys\n",
|
||
"tap_config.tap_config.properties\n",
|
||
"tap_schema.tap_schema.columns\n",
|
||
"tap_schema.tap_schema.key_columns\n",
|
||
"tap_schema.tap_schema.keys\n",
|
||
"tap_schema.tap_schema.schemas\n",
|
||
"tap_schema.tap_schema.tables\n",
|
||
"gaiadr1.gaiadr1.aux_qso_icrf2_match\n",
|
||
"gaiadr1.gaiadr1.ext_phot_zero_point\n",
|
||
"gaiadr1.gaiadr1.allwise_best_neighbour\n",
|
||
"gaiadr1.gaiadr1.allwise_neighbourhood\n",
|
||
"gaiadr1.gaiadr1.gsc23_best_neighbour\n",
|
||
"gaiadr1.gaiadr1.gsc23_neighbourhood\n",
|
||
"gaiadr1.gaiadr1.ppmxl_best_neighbour\n",
|
||
"gaiadr1.gaiadr1.ppmxl_neighbourhood\n",
|
||
"gaiadr1.gaiadr1.sdss_dr9_best_neighbour\n",
|
||
"gaiadr1.gaiadr1.sdss_dr9_neighbourhood\n",
|
||
"gaiadr1.gaiadr1.tmass_best_neighbour\n",
|
||
"gaiadr1.gaiadr1.tmass_neighbourhood\n",
|
||
"gaiadr1.gaiadr1.ucac4_best_neighbour\n",
|
||
"gaiadr1.gaiadr1.ucac4_neighbourhood\n",
|
||
"gaiadr1.gaiadr1.urat1_best_neighbour\n",
|
||
"gaiadr1.gaiadr1.urat1_neighbourhood\n",
|
||
"gaiadr1.gaiadr1.cepheid\n",
|
||
"gaiadr1.gaiadr1.phot_variable_time_series_gfov\n",
|
||
"gaiadr1.gaiadr1.phot_variable_time_series_gfov_statistical_parameters\n",
|
||
"gaiadr1.gaiadr1.rrlyrae\n",
|
||
"gaiadr1.gaiadr1.variable_summary\n",
|
||
"gaiadr1.gaiadr1.allwise_original_valid\n",
|
||
"gaiadr1.gaiadr1.gsc23_original_valid\n",
|
||
"gaiadr1.gaiadr1.ppmxl_original_valid\n",
|
||
"gaiadr1.gaiadr1.sdssdr9_original_valid\n",
|
||
"gaiadr1.gaiadr1.tmass_original_valid\n",
|
||
"gaiadr1.gaiadr1.ucac4_original_valid\n",
|
||
"gaiadr1.gaiadr1.urat1_original_valid\n",
|
||
"gaiadr1.gaiadr1.gaia_source\n",
|
||
"gaiadr1.gaiadr1.tgas_source\n",
|
||
"gaiadr2.gaiadr2.aux_allwise_agn_gdr2_cross_id\n",
|
||
"gaiadr2.gaiadr2.aux_iers_gdr2_cross_id\n",
|
||
"gaiadr2.gaiadr2.aux_sso_orbit_residuals\n",
|
||
"gaiadr2.gaiadr2.aux_sso_orbits\n",
|
||
"gaiadr2.gaiadr2.dr1_neighbourhood\n",
|
||
"gaiadr2.gaiadr2.allwise_best_neighbour\n",
|
||
"gaiadr2.gaiadr2.allwise_neighbourhood\n",
|
||
"gaiadr2.gaiadr2.apassdr9_best_neighbour\n",
|
||
"gaiadr2.gaiadr2.apassdr9_neighbourhood\n",
|
||
"gaiadr2.gaiadr2.gsc23_best_neighbour\n",
|
||
"gaiadr2.gaiadr2.gsc23_neighbourhood\n",
|
||
"gaiadr2.gaiadr2.hipparcos2_best_neighbour\n",
|
||
"gaiadr2.gaiadr2.hipparcos2_neighbourhood\n",
|
||
"gaiadr2.gaiadr2.panstarrs1_best_neighbour\n",
|
||
"gaiadr2.gaiadr2.panstarrs1_neighbourhood\n",
|
||
"gaiadr2.gaiadr2.ppmxl_best_neighbour\n",
|
||
"gaiadr2.gaiadr2.ppmxl_neighbourhood\n",
|
||
"gaiadr2.gaiadr2.ravedr5_best_neighbour\n",
|
||
"gaiadr2.gaiadr2.ravedr5_neighbourhood\n",
|
||
"gaiadr2.gaiadr2.sdssdr9_best_neighbour\n",
|
||
"gaiadr2.gaiadr2.sdssdr9_neighbourhood\n",
|
||
"gaiadr2.gaiadr2.tmass_best_neighbour\n",
|
||
"gaiadr2.gaiadr2.tmass_neighbourhood\n",
|
||
"gaiadr2.gaiadr2.tycho2_best_neighbour\n",
|
||
"gaiadr2.gaiadr2.tycho2_neighbourhood\n",
|
||
"gaiadr2.gaiadr2.urat1_best_neighbour\n",
|
||
"gaiadr2.gaiadr2.urat1_neighbourhood\n",
|
||
"gaiadr2.gaiadr2.sso_observation\n",
|
||
"gaiadr2.gaiadr2.sso_source\n",
|
||
"gaiadr2.gaiadr2.vari_cepheid\n",
|
||
"gaiadr2.gaiadr2.vari_classifier_class_definition\n",
|
||
"gaiadr2.gaiadr2.vari_classifier_definition\n",
|
||
"gaiadr2.gaiadr2.vari_classifier_result\n",
|
||
"gaiadr2.gaiadr2.vari_long_period_variable\n",
|
||
"gaiadr2.gaiadr2.vari_rotation_modulation\n",
|
||
"gaiadr2.gaiadr2.vari_rrlyrae\n",
|
||
"gaiadr2.gaiadr2.vari_short_timescale\n",
|
||
"gaiadr2.gaiadr2.vari_time_series_statistics\n",
|
||
"gaiadr2.gaiadr2.panstarrs1_original_valid\n",
|
||
"gaiadr2.gaiadr2.gaia_source\n",
|
||
"gaiadr2.gaiadr2.ruwe\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"for table in tables:\n",
|
||
"    print(table.get_qualified_name())"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"So that's a lot of tables. The ones we'll use are:\n",
|
||
"\n",
|
||
"* `gaiadr2.gaia_source`, which contains Gaia data from [data release 2](https://www.cosmos.esa.int/web/gaia/data-release-2),\n",
|
||
"\n",
|
||
"* `gaiadr2.panstarrs1_original_valid`, which contains the photometry data we'll use from Pan-STARRS, and\n",
|
||
"\n",
|
||
"* `gaiadr2.panstarrs1_best_neighbour`, which we'll use to cross-match each star observed by Gaia with the same star observed by Pan-STARRS.\n",
|
||
"\n",
|
||
"We can use `load_table` (not `load_tables`) to get the metadata for a single table. The name of this function is misleading, because it only downloads metadata. "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 5,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Retrieving table 'gaiadr2.gaia_source'\n",
|
||
"Parsing table 'gaiadr2.gaia_source'...\n",
|
||
"Done.\n"
|
||
]
|
||
},
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"<astroquery.utils.tap.model.taptable.TapTableMeta at 0x7f922376e0a0>"
|
||
]
|
||
},
|
||
"execution_count": 5,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"meta = Gaia.load_table('gaiadr2.gaia_source')\n",
|
||
"meta"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Jupyter shows that the result is an object of type `TapTableMeta`, but it does not display the contents.\n",
|
||
"\n",
|
||
"To see the metadata, we have to print the object."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 6,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"TAP Table name: gaiadr2.gaiadr2.gaia_source\n",
|
||
"Description: This table has an entry for every Gaia observed source as listed in the\n",
|
||
"Main Database accumulating catalogue version from which the catalogue\n",
|
||
"release has been generated. It contains the basic source parameters,\n",
|
||
"that is only final data (no epoch data) and no spectra (neither final\n",
|
||
"nor epoch).\n",
|
||
"Num. columns: 96\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"print(meta)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Notice one gotcha: in the list of table names, this table appears as `gaiadr2.gaiadr2.gaia_source`, but when we load the metadata, we refer to it as `gaiadr2.gaia_source`.\n",
|
||
"\n",
|
||
"**Exercise:** Go back and try\n",
|
||
"\n",
|
||
"```\n",
|
||
"meta = Gaia.load_table('gaiadr2.gaiadr2.gaia_source')\n",
|
||
"```\n",
|
||
"\n",
|
||
"What happens? Is the error message helpful? If you had not made this error deliberately, would you have been able to figure it out?"
|
||
]
|
||
},
|
||
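One way to avoid this gotcha is to convert a qualified name from the list into the shorter form before passing it to `load_table`. The following is a minimal sketch, assuming the qualified name always has the form `schema.schema.table` seen in the output above; `short_table_name` is a hypothetical helper, not part of Astroquery.

```python
def short_table_name(qualified_name):
    """Drop a duplicated leading schema, so 'a.a.t' becomes 'a.t'."""
    parts = qualified_name.split('.')
    # Only strip the first part when it repeats the second one.
    if len(parts) >= 3 and parts[0] == parts[1]:
        parts = parts[1:]
    return '.'.join(parts)


print(short_table_name('gaiadr2.gaiadr2.gaia_source'))
```

Names that are already in the short form pass through unchanged, so the helper is safe to apply to either style.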
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Columns\n",
|
||
"\n",
|
||
"The following loop prints the names of the columns in the table."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 7,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"solution_id\n",
|
||
"designation\n",
|
||
"source_id\n",
|
||
"random_index\n",
|
||
"ref_epoch\n",
|
||
"ra\n",
|
||
"ra_error\n",
|
||
"dec\n",
|
||
"dec_error\n",
|
||
"parallax\n",
|
||
"parallax_error\n",
|
||
"parallax_over_error\n",
|
||
"pmra\n",
|
||
"pmra_error\n",
|
||
"pmdec\n",
|
||
"pmdec_error\n",
|
||
"ra_dec_corr\n",
|
||
"ra_parallax_corr\n",
|
||
"ra_pmra_corr\n",
|
||
"ra_pmdec_corr\n",
|
||
"dec_parallax_corr\n",
|
||
"dec_pmra_corr\n",
|
||
"dec_pmdec_corr\n",
|
||
"parallax_pmra_corr\n",
|
||
"parallax_pmdec_corr\n",
|
||
"pmra_pmdec_corr\n",
|
||
"astrometric_n_obs_al\n",
|
||
"astrometric_n_obs_ac\n",
|
||
"astrometric_n_good_obs_al\n",
|
||
"astrometric_n_bad_obs_al\n",
|
||
"astrometric_gof_al\n",
|
||
"astrometric_chi2_al\n",
|
||
"astrometric_excess_noise\n",
|
||
"astrometric_excess_noise_sig\n",
|
||
"astrometric_params_solved\n",
|
||
"astrometric_primary_flag\n",
|
||
"astrometric_weight_al\n",
|
||
"astrometric_pseudo_colour\n",
|
||
"astrometric_pseudo_colour_error\n",
|
||
"mean_varpi_factor_al\n",
|
||
"astrometric_matched_observations\n",
|
||
"visibility_periods_used\n",
|
||
"astrometric_sigma5d_max\n",
|
||
"frame_rotator_object_type\n",
|
||
"matched_observations\n",
|
||
"duplicated_source\n",
|
||
"phot_g_n_obs\n",
|
||
"phot_g_mean_flux\n",
|
||
"phot_g_mean_flux_error\n",
|
||
"phot_g_mean_flux_over_error\n",
|
||
"phot_g_mean_mag\n",
|
||
"phot_bp_n_obs\n",
|
||
"phot_bp_mean_flux\n",
|
||
"phot_bp_mean_flux_error\n",
|
||
"phot_bp_mean_flux_over_error\n",
|
||
"phot_bp_mean_mag\n",
|
||
"phot_rp_n_obs\n",
|
||
"phot_rp_mean_flux\n",
|
||
"phot_rp_mean_flux_error\n",
|
||
"phot_rp_mean_flux_over_error\n",
|
||
"phot_rp_mean_mag\n",
|
||
"phot_bp_rp_excess_factor\n",
|
||
"phot_proc_mode\n",
|
||
"bp_rp\n",
|
||
"bp_g\n",
|
||
"g_rp\n",
|
||
"radial_velocity\n",
|
||
"radial_velocity_error\n",
|
||
"rv_nb_transits\n",
|
||
"rv_template_teff\n",
|
||
"rv_template_logg\n",
|
||
"rv_template_fe_h\n",
|
||
"phot_variable_flag\n",
|
||
"l\n",
|
||
"b\n",
|
||
"ecl_lon\n",
|
||
"ecl_lat\n",
|
||
"priam_flags\n",
|
||
"teff_val\n",
|
||
"teff_percentile_lower\n",
|
||
"teff_percentile_upper\n",
|
||
"a_g_val\n",
|
||
"a_g_percentile_lower\n",
|
||
"a_g_percentile_upper\n",
|
||
"e_bp_min_rp_val\n",
|
||
"e_bp_min_rp_percentile_lower\n",
|
||
"e_bp_min_rp_percentile_upper\n",
|
||
"flame_flags\n",
|
||
"radius_val\n",
|
||
"radius_percentile_lower\n",
|
||
"radius_percentile_upper\n",
|
||
"lum_val\n",
|
||
"lum_percentile_lower\n",
|
||
"lum_percentile_upper\n",
|
||
"datalink_url\n",
|
||
"epoch_photometry_url\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"for column in meta.columns:\n",
|
||
"    print(column.name)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"You can probably guess what many of these columns are by looking at the names, but you should resist the temptation to guess.\n",
|
||
"To find out what the columns mean, [read the documentation](https://gea.esac.esa.int/archive/documentation/GDR2/Gaia_archive/chap_datamodel/sec_dm_main_tables/ssec_dm_gaia_source.html).\n",
|
||
"\n",
|
||
"If you want to know what can go wrong when you don't read the documentation, [you might like this article](https://www.vox.com/future-perfect/2019/6/4/18650969/married-women-miserable-fake-paul-dolan-happiness)."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"**Exercise:** One of the other tables we'll use is `gaiadr2.gaiadr2.panstarrs1_original_valid`. Use `load_table` to get the metadata for this table. How many columns are there and what are their names?\n",
|
||
"\n",
|
||
"Hint: Remember the gotcha we mentioned earlier."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 8,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Retrieving table 'gaiadr2.panstarrs1_original_valid'\n",
|
||
"Parsing table 'gaiadr2.panstarrs1_original_valid'...\n",
|
||
"Done.\n",
|
||
"TAP Table name: gaiadr2.gaiadr2.panstarrs1_original_valid\n",
|
||
"Description: The Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) is\n",
|
||
"a system for wide-field astronomical imaging developed and operated by\n",
|
||
"the Institute for Astronomy at the University of Hawaii. Pan-STARRS1\n",
|
||
"(PS1) is the first part of Pan-STARRS to be completed and is the basis\n",
|
||
"for Data Release 1 (DR1). The PS1 survey used a 1.8 meter telescope and\n",
|
||
"its 1.4 Gigapixel camera to image the sky in five broadband filters (g,\n",
|
||
"r, i, z, y).\n",
|
||
"\n",
|
||
"The current table contains a filtered subsample of the 10 723 304 629\n",
|
||
"entries listed in the original ObjectThin table.\n",
|
||
"We used only ObjectThin and MeanObject tables to extract\n",
|
||
"panstarrs1OriginalValid table, this means that objects detected only in\n",
|
||
"stack images are not included here. The main reason for us to avoid the\n",
|
||
"use of objects detected in stack images is that their astrometry is not\n",
|
||
"as good as the mean objects astrometry: “The stack positions (raStack,\n",
|
||
"decStack) have considerably larger systematic astrometric errors than\n",
|
||
"the mean epoch positions (raMean, decMean).” The astrometry for the\n",
|
||
"MeanObject positions uses Gaia DR1 as a reference catalog, while the\n",
|
||
"stack positions use 2MASS as a reference catalog.\n",
|
||
"\n",
|
||
"In details, we filtered out all objects where:\n",
|
||
"\n",
|
||
"- nDetections = 1\n",
|
||
"\n",
|
||
"- no good quality data in Pan-STARRS, objInfoFlag 33554432 not set\n",
|
||
"\n",
|
||
"- mean astrometry could not be measured, objInfoFlag 524288 set\n",
|
||
"\n",
|
||
"- stack position used for mean astrometry, objInfoFlag 1048576 set\n",
|
||
"\n",
|
||
"- error on all magnitudes equal to 0 or to -999;\n",
|
||
"\n",
|
||
"- all magnitudes set to -999;\n",
|
||
"\n",
|
||
"- error on RA or DEC greater than 1 arcsec.\n",
|
||
"\n",
|
||
"The number of objects in panstarrs1OriginalValid is 2 264 263 282.\n",
|
||
"\n",
|
||
"The panstarrs1OriginalValid table contains only a subset of the columns\n",
|
||
"available in the combined ObjectThin and MeanObject tables. A\n",
|
||
"description of the original ObjectThin and MeanObjects tables can be\n",
|
||
"found at:\n",
|
||
"https://outerspace.stsci.edu/display/PANSTARRS/PS1+Database+object+and+detection+tables\n",
|
||
"\n",
|
||
"Download:\n",
|
||
"http://mastweb.stsci.edu/ps1casjobs/home.aspx\n",
|
||
"Documentation:\n",
|
||
"https://outerspace.stsci.edu/display/PANSTARRS\n",
|
||
"http://pswww.ifa.hawaii.edu/pswww/\n",
|
||
"References:\n",
|
||
"The Pan-STARRS1 Surveys, Chambers, K.C., et al. 2016, arXiv:1612.05560\n",
|
||
"Pan-STARRS Data Processing System, Magnier, E. A., et al. 2016,\n",
|
||
"arXiv:1612.05240\n",
|
||
"Pan-STARRS Pixel Processing: Detrending, Warping, Stacking, Waters, C.\n",
|
||
"Z., et al. 2016, arXiv:1612.05245\n",
|
||
"Pan-STARRS Pixel Analysis: Source Detection and Characterization,\n",
|
||
"Magnier, E. A., et al. 2016, arXiv:1612.05244\n",
|
||
"Pan-STARRS Photometric and Astrometric Calibration, Magnier, E. A., et\n",
|
||
"al. 2016, arXiv:1612.05242\n",
|
||
"The Pan-STARRS1 Database and Data Products, Flewelling, H. A., et al.\n",
|
||
"2016, arXiv:1612.05243\n",
|
||
"\n",
|
||
"Catalogue curator:\n",
|
||
"SSDC - ASI Space Science Data Center\n",
|
||
"https://www.ssdc.asi.it/\n",
|
||
"Num. columns: 26\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"# Solution\n",
|
||
"\n",
|
||
"meta2 = Gaia.load_table('gaiadr2.panstarrs1_original_valid')\n",
|
||
"print(meta2)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 9,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"obj_name\n",
|
||
"obj_id\n",
|
||
"ra\n",
|
||
"dec\n",
|
||
"ra_error\n",
|
||
"dec_error\n",
|
||
"epoch_mean\n",
|
||
"g_mean_psf_mag\n",
|
||
"g_mean_psf_mag_error\n",
|
||
"g_flags\n",
|
||
"r_mean_psf_mag\n",
|
||
"r_mean_psf_mag_error\n",
|
||
"r_flags\n",
|
||
"i_mean_psf_mag\n",
|
||
"i_mean_psf_mag_error\n",
|
||
"i_flags\n",
|
||
"z_mean_psf_mag\n",
|
||
"z_mean_psf_mag_error\n",
|
||
"z_flags\n",
|
||
"y_mean_psf_mag\n",
|
||
"y_mean_psf_mag_error\n",
|
||
"y_flags\n",
|
||
"n_detections\n",
|
||
"zone_id\n",
|
||
"obj_info_flag\n",
|
||
"quality_flag\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"# Solution\n",
|
||
"\n",
|
||
"for column in meta2.columns:\n",
|
||
"    print(column.name)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Writing queries\n",
|
||
"\n",
|
||
"By now you might be wondering how we actually download the data. With tables this big, you generally don't. Instead, you use queries to select only the data you want.\n",
|
||
"\n",
|
||
"A query is a string written in a query language like SQL; for the Gaia database, the query language is a dialect of SQL called ADQL.\n",
|
||
"\n",
|
||
"Here's an example of an ADQL query."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 10,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"query1 = \"\"\"SELECT \n",
|
||
"TOP 10\n",
|
||
"source_id, ref_epoch, ra, dec, parallax \n",
|
||
"FROM gaiadr2.gaia_source\"\"\""
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"**Python note:** We use a [triple-quoted string](https://docs.python.org/3/tutorial/introduction.html#strings) here so we can include line breaks in the query, which makes it easier to read.\n",
|
||
"\n",
|
||
"The words in uppercase are ADQL keywords:\n",
|
||
"\n",
|
||
"* `SELECT` indicates that we are selecting data (as opposed to adding or modifying data).\n",
|
||
"\n",
|
||
"* `TOP` indicates that we only want the first 10 rows of the table, which is useful for testing a query before asking for all of the data.\n",
|
||
"\n",
|
||
"* `FROM` specifies which table we want data from.\n",
|
||
"\n",
|
||
"The third line is a list of column names, indicating which columns we want. \n",
|
||
"\n",
|
||
"In this example, the keywords are capitalized and the column names are lowercase. This is a common style, but it is not required. In ADQL and SQL, keywords and unquoted column names are not case-sensitive."
|
||
]
|
||
},
|
||
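Because ADQL is a dialect of SQL, you can try out the same query structure offline. The following sketch uses Python's built-in `sqlite3` module with a small in-memory table; the local table name `gaia_source` and its three rows (copied from the query results shown later in this notebook) are stand-ins, not a connection to Gaia. One difference to note: SQLite does not support `TOP`, so the sketch uses `LIMIT` at the end of the query instead.

```python
import sqlite3

# Build a tiny in-memory database with one table, a stand-in for gaiadr2.gaia_source.
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE gaia_source (source_id INTEGER, ra REAL, dec REAL, parallax REAL)'
)
rows = [
    (4530738361793769600, 281.56725362448725, 20.40682117430378, 0.9785380604519425),
    (4530752651135081216, 281.0861565355257, 20.523350496351846, 0.2674800612552977),
    (4530743343951405568, 281.37114418299177, 20.474147574053124, -0.43911323550176806),
]
conn.executemany('INSERT INTO gaia_source VALUES (?, ?, ?, ?)', rows)

# Same shape as query1, with LIMIT in place of TOP. The keywords are written
# in lowercase here to show that they are not case-sensitive.
query = """select source_id, ra, dec
from gaia_source
limit 2"""

results = conn.execute(query).fetchall()
for row in results:
    print(row)
```

The query returns only the three requested columns and only two of the three rows, which is the same kind of selection the ADQL query asks the Gaia server to perform.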
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"To run this query, we use the `Gaia` object, which represents our connection to the Gaia database, and invoke `launch_job`:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 11,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"<astroquery.utils.tap.model.job.Job at 0x7f9222e9cb20>"
|
||
]
|
||
},
|
||
"execution_count": 11,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"job1 = Gaia.launch_job(query1)\n",
|
||
"job1"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"The result is an object that represents the job running on a Gaia server.\n",
|
||
"\n",
|
||
"If you print it, it displays metadata for the forthcoming table."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 12,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"<Table length=10>\n",
|
||
" name dtype unit description \n",
|
||
"--------- ------- ---- ------------------------------------------------------------------\n",
|
||
"source_id int64 Unique source identifier (unique within a particular Data Release)\n",
|
||
"ref_epoch float64 yr Reference epoch\n",
|
||
" ra float64 deg Right ascension\n",
|
||
" dec float64 deg Declination\n",
|
||
" parallax float64 mas Parallax\n",
|
||
"Jobid: None\n",
|
||
"Phase: COMPLETED\n",
|
||
"Owner: None\n",
|
||
"Output file: sync_20201005090721.xml.gz\n",
|
||
"Results: None\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"print(job1)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Don't worry about `Results: None`. That does not actually mean there are no results.\n",
|
||
"\n",
|
||
"`Phase: COMPLETED` indicates that the job is complete, so we can get the results like this:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 13,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"astropy.table.table.Table"
|
||
]
|
||
},
|
||
"execution_count": 13,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"results1 = job1.get_results()\n",
|
||
"type(results1)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"**Optional detail:** Why does `table` appear more than once? In `astropy.table.table.Table`, `astropy` is the package, the first `table` is a subpackage, the second `table` is a module within it, and `Table` is the class. Most of the time we only care about the last one. It's like the Linnaean name for the western lowland gorilla, which is *Gorilla gorilla gorilla*."
|
||
]
|
||
},
|
||
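You can see the same naming pattern with a class from Python's standard library. This sketch does not use Astropy; `full_name` is a hypothetical helper that joins the module where a class is defined with the name of the class.

```python
from email.message import Message


def full_name(cls):
    """Return the dotted module path followed by the class name."""
    return f'{cls.__module__}.{cls.__qualname__}'


print(full_name(Message))
```

Here `email` is the package, `message` is a module inside it, and `Message` is the class, so the module name and the class name repeat in much the same way.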
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"The result is an [Astropy Table](https://docs.astropy.org/en/stable/table/), which is similar to a table in an SQL database except:\n",
|
||
"\n",
|
||
"* SQL databases are stored on disk drives, so they are persistent; that is, they \"survive\" even if you turn off the computer. An Astropy `Table` is stored in memory; it disappears when you turn off the computer (or shut down this Jupyter notebook).\n",
|
||
"\n",
|
||
"* SQL databases are designed to process queries. An Astropy `Table` can perform some query-like operations, like selecting columns and rows. But these operations use Python syntax, not SQL.\n",
|
||
"\n",
|
||
"Jupyter knows how to display the contents of a `Table`."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 14,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<i>Table length=10</i>\n",
|
||
"<table id=\"table140265627585264\" class=\"table-striped table-bordered table-condensed\">\n",
|
||
"<thead><tr><th>source_id</th><th>ref_epoch</th><th>ra</th><th>dec</th><th>parallax</th></tr></thead>\n",
|
||
"<thead><tr><th></th><th>yr</th><th>deg</th><th>deg</th><th>mas</th></tr></thead>\n",
|
||
"<thead><tr><th>int64</th><th>float64</th><th>float64</th><th>float64</th><th>float64</th></tr></thead>\n",
|
||
"<tr><td>4530738361793769600</td><td>2015.5</td><td>281.56725362448725</td><td>20.40682117430378</td><td>0.9785380604519425</td></tr>\n",
|
||
"<tr><td>4530752651135081216</td><td>2015.5</td><td>281.0861565355257</td><td>20.523350496351846</td><td>0.2674800612552977</td></tr>\n",
|
||
"<tr><td>4530743343951405568</td><td>2015.5</td><td>281.37114418299177</td><td>20.474147574053124</td><td>-0.43911323550176806</td></tr>\n",
|
||
"<tr><td>4530755060627162368</td><td>2015.5</td><td>281.2676236268299</td><td>20.558523922346158</td><td>1.1422630184554958</td></tr>\n",
|
||
"<tr><td>4530746844341315968</td><td>2015.5</td><td>281.1370431749541</td><td>20.377852388898184</td><td>1.0092247424630945</td></tr>\n",
|
||
"<tr><td>4530768456615026432</td><td>2015.5</td><td>281.8720921436347</td><td>20.31829694530366</td><td>-0.06900136127674149</td></tr>\n",
|
||
"<tr><td>4530763513119137280</td><td>2015.5</td><td>281.9211808864116</td><td>20.20956829578524</td><td>0.1266016679823622</td></tr>\n",
|
||
"<tr><td>4530736364618539264</td><td>2015.5</td><td>281.4913475613274</td><td>20.346579041327693</td><td>0.3894019486060072</td></tr>\n",
|
||
"<tr><td>4530735952305177728</td><td>2015.5</td><td>281.4085549165704</td><td>20.311030903719928</td><td>0.2041189982608354</td></tr>\n",
|
||
"<tr><td>4530751281056022656</td><td>2015.5</td><td>281.0585328377638</td><td>20.460309556214753</td><td>0.10294642821734962</td></tr>\n",
|
||
"</table>"
|
||
],
|
||
"text/plain": [
|
||
"<Table length=10>\n",
|
||
" source_id ref_epoch ... dec parallax \n",
|
||
" yr ... deg mas \n",
|
||
" int64 float64 ... float64 float64 \n",
|
||
"------------------- --------- ... ------------------ --------------------\n",
|
||
"4530738361793769600 2015.5 ... 20.40682117430378 0.9785380604519425\n",
|
||
"4530752651135081216 2015.5 ... 20.523350496351846 0.2674800612552977\n",
|
||
"4530743343951405568 2015.5 ... 20.474147574053124 -0.43911323550176806\n",
|
||
"4530755060627162368 2015.5 ... 20.558523922346158 1.1422630184554958\n",
|
||
"4530746844341315968 2015.5 ... 20.377852388898184 1.0092247424630945\n",
|
||
"4530768456615026432 2015.5 ... 20.31829694530366 -0.06900136127674149\n",
|
||
"4530763513119137280 2015.5 ... 20.20956829578524 0.1266016679823622\n",
|
||
"4530736364618539264 2015.5 ... 20.346579041327693 0.3894019486060072\n",
|
||
"4530735952305177728 2015.5 ... 20.311030903719928 0.2041189982608354\n",
|
||
"4530751281056022656 2015.5 ... 20.460309556214753 0.10294642821734962"
|
||
]
|
||
},
|
||
"execution_count": 14,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"results1"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Each column has a name, units, and a data type.\n",
|
||
"\n",
|
||
"For example, the units of `ra` and `dec` are degrees, and their data type is `float64`, which is a 64-bit floating-point number, used to store measurements with a fraction part.\n",
|
||
"\n",
|
||
"This information comes from the Gaia database, and has been stored in the Astropy `Table` by Astroquery."
|
||
]
|
||
},
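{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of how to read this metadata in Python, the example below builds a small stand-in `Table` by hand, so it runs without contacting the server. The name `toy` and its values are made up for illustration; with real results you would apply the same expressions to `results1`.\n",
"\n",
"```python\n",
"from astropy.table import Column, Table\n",
"\n",
"# Hypothetical stand-in for a query result; the values are illustrative only.\n",
"toy = Table([\n",
"    Column([281.57, 281.09, 281.37], name='ra', unit='deg'),\n",
"    Column([0.98, 0.27, -0.44], name='parallax', unit='mas'),\n",
"])\n",
"\n",
"# Each column carries a name, a unit, and a data type.\n",
"for name in toy.colnames:\n",
"    print(name, toy[name].unit, toy[name].dtype)\n",
"\n",
"# Row selection uses Python syntax (a boolean mask), not SQL.\n",
"small_parallax = toy[toy['parallax'] < 0.5]\n",
"```\n",
"\n",
"Here `small_parallax` is a new `Table` that contains only the rows that pass the mask."
]
},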
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"**Exercise:** Read [the documentation of this table](https://gea.esac.esa.int/archive/documentation/GDR2/Gaia_archive/chap_datamodel/sec_dm_main_tables/ssec_dm_gaia_source.html) and choose a column that looks interesting to you. Add the column name to the query and run it again. What are the units of the column you selected? What is its data type?"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Asynchronous queries\n",
|
||
"\n",
|
||
"`launch_job` asks the server to run the job \"synchronously\", which normally means it runs immediately. But synchronous jobs are limited to 2000 rows. For queries that return more rows, you should run them \"asynchronously\", which means they might take longer to get started.\n",
|
||
"\n",
|
||
"If you are not sure how many rows a query will return, you can use the SQL command `COUNT` to find out how many rows are in the result without actually returning them. We'll see an example of this later.\n",
|
||
"\n",
|
||
"The results of an asynchronous query are stored in a file on the server, so you can start a query and come back later to get the results.\n",
|
||
"\n",
|
||
"For anonymous users, files are kept for three days.\n",
|
||
"\n",
|
||
"As an example, let's try a query that's similar to `query1`, with two changes:\n",
|
||
"\n",
|
||
"* It selects the first 3000 rows, so it is bigger than we should run synchronously.\n",
|
||
"\n",
|
||
"* It uses a new keyword, `WHERE`."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 15,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"query2 = \"\"\"SELECT TOP 3000\n",
|
||
"source_id, ref_epoch, ra, dec, parallax\n",
|
||
"FROM gaiadr2.gaia_source\n",
|
||
"WHERE parallax < 1\n",
|
||
"\"\"\""
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"A `WHERE` clause indicates which rows we want; in this case, the query selects only rows \"where\" `parallax` is less than 1. This has the effect of selecting stars with relatively low parallax, which are farther away. We'll use this clause to exclude nearby stars that are unlikely to be part of GD-1.\n",
|
||
"\n",
|
||
"`WHERE` is one of the most common clauses in ADQL/SQL, and one of the most useful, because it allows us to select only the rows we need from the database.\n",
|
||
"\n",
|
||
"We use `launch_job_async` to submit an asynchronous query."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 16,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"INFO: Query finished. [astroquery.utils.tap.core]\n",
|
||
"<Table length=3000>\n",
|
||
" name dtype unit description \n",
|
||
"--------- ------- ---- ------------------------------------------------------------------\n",
|
||
"source_id int64 Unique source identifier (unique within a particular Data Release)\n",
|
||
"ref_epoch float64 yr Reference epoch\n",
|
||
" ra float64 deg Right ascension\n",
|
||
" dec float64 deg Declination\n",
|
||
" parallax float64 mas Parallax\n",
|
||
"Jobid: 1601903242219O\n",
|
||
"Phase: COMPLETED\n",
|
||
"Owner: None\n",
|
||
"Output file: async_20201005090722.vot\n",
|
||
"Results: None\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"job2 = Gaia.launch_job_async(query2)\n",
|
||
"print(job2)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"And here are the results."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 17,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<i>Table length=3000</i>\n",
|
||
"<table id=\"table140265625141056\" class=\"table-striped table-bordered table-condensed\">\n",
|
||
"<thead><tr><th>source_id</th><th>ref_epoch</th><th>ra</th><th>dec</th><th>parallax</th></tr></thead>\n",
|
||
"<thead><tr><th></th><th>yr</th><th>deg</th><th>deg</th><th>mas</th></tr></thead>\n",
|
||
"<thead><tr><th>int64</th><th>float64</th><th>float64</th><th>float64</th><th>float64</th></tr></thead>\n",
|
||
"<tr><td>4530738361793769600</td><td>2015.5</td><td>281.56725362448725</td><td>20.40682117430378</td><td>0.9785380604519425</td></tr>\n",
|
||
"<tr><td>4530752651135081216</td><td>2015.5</td><td>281.0861565355257</td><td>20.523350496351846</td><td>0.2674800612552977</td></tr>\n",
|
||
"<tr><td>4530743343951405568</td><td>2015.5</td><td>281.37114418299177</td><td>20.474147574053124</td><td>-0.43911323550176806</td></tr>\n",
|
||
"<tr><td>4530768456615026432</td><td>2015.5</td><td>281.8720921436347</td><td>20.31829694530366</td><td>-0.06900136127674149</td></tr>\n",
|
||
"<tr><td>4530763513119137280</td><td>2015.5</td><td>281.9211808864116</td><td>20.20956829578524</td><td>0.1266016679823622</td></tr>\n",
|
||
"<tr><td>4530736364618539264</td><td>2015.5</td><td>281.4913475613274</td><td>20.346579041327693</td><td>0.3894019486060072</td></tr>\n",
|
||
"<tr><td>4530735952305177728</td><td>2015.5</td><td>281.4085549165704</td><td>20.311030903719928</td><td>0.2041189982608354</td></tr>\n",
|
||
"<tr><td>4530751281056022656</td><td>2015.5</td><td>281.0585328377638</td><td>20.460309556214753</td><td>0.10294642821734962</td></tr>\n",
|
||
"<tr><td>4530740938774409344</td><td>2015.5</td><td>281.3762569536416</td><td>20.436140058941206</td><td>0.9242670062090182</td></tr>\n",
|
||
"<tr><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
|
||
"<tr><td>4467710915011802624</td><td>2015.5</td><td>269.9680969307347</td><td>1.1429085038160882</td><td>0.42361471245557913</td></tr>\n",
|
||
"<tr><td>4467706551328679552</td><td>2015.5</td><td>270.033164589881</td><td>1.0565747323689927</td><td>0.922888231734588</td></tr>\n",
|
||
"<tr><td>4467712255037300096</td><td>2015.5</td><td>270.7724717923047</td><td>0.6581664892880896</td><td>-2.669179465293931</td></tr>\n",
|
||
"<tr><td>4467735001181761792</td><td>2015.5</td><td>270.3628606248308</td><td>0.8947079323599124</td><td>0.6117399163086398</td></tr>\n",
|
||
"<tr><td>4467737101421916672</td><td>2015.5</td><td>270.5110834661444</td><td>0.9806225910160181</td><td>-0.39818224846127004</td></tr>\n",
|
||
"<tr><td>4467707547757327488</td><td>2015.5</td><td>269.88746280594927</td><td>1.0212759940136962</td><td>0.7741412301054209</td></tr>\n",
|
||
"<tr><td>4467732772094573056</td><td>2015.5</td><td>270.55997182760126</td><td>0.9037072088489417</td><td>-1.7920417800164183</td></tr>\n",
|
||
"<tr><td>4467732355491087744</td><td>2015.5</td><td>270.6730790702491</td><td>0.9197224705139885</td><td>-0.3464446494840354</td></tr>\n",
|
||
"<tr><td>4467717099766944512</td><td>2015.5</td><td>270.57667173120825</td><td>0.726277659009568</td><td>0.05443955111134051</td></tr>\n",
|
||
"<tr><td>4467719058265781248</td><td>2015.5</td><td>270.7248052971514</td><td>0.8205551921782785</td><td>0.3733943917490343</td></tr>\n",
|
||
"</table>"
|
||
],
|
||
"text/plain": [
|
||
"<Table length=3000>\n",
|
||
" source_id ref_epoch ... dec parallax \n",
|
||
" yr ... deg mas \n",
|
||
" int64 float64 ... float64 float64 \n",
|
||
"------------------- --------- ... ------------------ --------------------\n",
|
||
"4530738361793769600 2015.5 ... 20.40682117430378 0.9785380604519425\n",
|
||
"4530752651135081216 2015.5 ... 20.523350496351846 0.2674800612552977\n",
|
||
"4530743343951405568 2015.5 ... 20.474147574053124 -0.43911323550176806\n",
|
||
"4530768456615026432 2015.5 ... 20.31829694530366 -0.06900136127674149\n",
|
||
"4530763513119137280 2015.5 ... 20.20956829578524 0.1266016679823622\n",
|
||
"4530736364618539264 2015.5 ... 20.346579041327693 0.3894019486060072\n",
|
||
"4530735952305177728 2015.5 ... 20.311030903719928 0.2041189982608354\n",
|
||
"4530751281056022656 2015.5 ... 20.460309556214753 0.10294642821734962\n",
|
||
"4530740938774409344 2015.5 ... 20.436140058941206 0.9242670062090182\n",
|
||
" ... ... ... ... ...\n",
|
||
"4467710915011802624 2015.5 ... 1.1429085038160882 0.42361471245557913\n",
|
||
"4467706551328679552 2015.5 ... 1.0565747323689927 0.922888231734588\n",
|
||
"4467712255037300096 2015.5 ... 0.6581664892880896 -2.669179465293931\n",
|
||
"4467735001181761792 2015.5 ... 0.8947079323599124 0.6117399163086398\n",
|
||
"4467737101421916672 2015.5 ... 0.9806225910160181 -0.39818224846127004\n",
|
||
"4467707547757327488 2015.5 ... 1.0212759940136962 0.7741412301054209\n",
|
||
"4467732772094573056 2015.5 ... 0.9037072088489417 -1.7920417800164183\n",
|
||
"4467732355491087744 2015.5 ... 0.9197224705139885 -0.3464446494840354\n",
|
||
"4467717099766944512 2015.5 ... 0.726277659009568 0.05443955111134051\n",
|
||
"4467719058265781248 2015.5 ... 0.8205551921782785 0.3733943917490343"
|
||
]
|
||
},
|
||
"execution_count": 17,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"results2 = job2.get_results()\n",
|
||
"results2"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"You might notice that some values of `parallax` are negative. As [this FAQ explains](https://www.cosmos.esa.int/web/gaia/archive-tips#negative%20parallax), \"Negative parallaxes are caused by errors in the observations.\" Negative parallaxes have \"no physical meaning,\" but they can be a \"useful diagnostic on the quality of the astrometric solution.\"\n",
|
||
"\n",
|
||
"Later we will see an example where we use `parallax` and `parallax_error` to identify stars where the distance estimate is likely to be inaccurate."
|
||
]
|
||
},
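{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see why a small parallax corresponds to a large distance, recall that a parallax of 1 arcsecond corresponds to a distance of 1 parsec, so for a parallax measured in milliarcseconds the distance in parsecs is 1000 divided by the parallax. The sketch below applies that relation. The function names are hypothetical, and simple inversion is only a rough estimate when the measurement error is large compared to the parallax.\n",
"\n",
"```python\n",
"def distance_pc(parallax_mas):\n",
"    \"\"\"Estimate distance in parsecs from a parallax in milliarcseconds.\n",
"\n",
"    Returns None for zero or negative parallax, which cannot be\n",
"    inverted into a meaningful distance.\n",
"    \"\"\"\n",
"    if parallax_mas <= 0:\n",
"        return None\n",
"    return 1000 / parallax_mas\n",
"\n",
"\n",
"def parallax_snr(parallax_mas, parallax_error_mas):\n",
"    \"\"\"Return the parallax divided by its standard error.\"\"\"\n",
"    return parallax_mas / parallax_error_mas\n",
"\n",
"\n",
"# A parallax of 1 mas, the cutoff used in `query2`, corresponds to 1000 pc.\n",
"print(distance_pc(1.0))\n",
"\n",
"# A negative parallax yields no distance estimate.\n",
"print(distance_pc(-0.44))\n",
"```\n",
"\n",
"A low value of `parallax_snr` is one sign that the distance estimate for a star should not be trusted."
]
},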
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"**Exercise:** The clauses in a query have to be in the right order. Go back and change the order of the clauses in `query2` and run it again. \n",
|
||
"\n",
|
||
"The query should fail, but notice that you don't get much useful debugging information. \n",
|
||
"\n",
|
||
"For this reason, developing and debugging ADQL queries can be really hard. A few suggestions that might help:\n",
|
||
"\n",
|
||
"* Whenever possible, start with a working query, either an example you find online or a query you have used in the past.\n",
|
||
"\n",
|
||
"* Make small changes and test each change before you continue.\n",
|
||
"\n",
|
||
"* While you are debugging, use `TOP` to limit the number of rows in the result. That will make each attempt run faster, which reduces your testing time. \n",
|
||
"\n",
|
||
"* Launching test queries synchronously might make them start faster, too."
|
||
]
|
||
},
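{
"cell_type": "markdown",
"metadata": {},
"source": [
"The suggestions above can be partly automated. The following sketch defines two hypothetical helpers (they are not part of Astroquery): `limit_rows` rewrites the `TOP` clause of a query, and `count_query` builds a query that returns only the number of matching rows. Both assume simple queries that begin with `SELECT`, like the ones in this notebook.\n",
"\n",
"```python\n",
"import re\n",
"\n",
"\n",
"def limit_rows(query, n):\n",
"    \"\"\"Return `query` with its SELECT limited to the first `n` rows.\"\"\"\n",
"    # Match SELECT at the start, plus an existing TOP clause if there is one.\n",
"    pattern = re.compile(r'^\\s*SELECT(\\s+TOP\\s+\\d+)?', flags=re.IGNORECASE)\n",
"    return pattern.sub(f'SELECT TOP {n}', query, count=1)\n",
"\n",
"\n",
"def count_query(table, where):\n",
"    \"\"\"Build a query that counts the rows of `table` that satisfy `where`.\"\"\"\n",
"    return f'SELECT COUNT(*)\\nFROM {table}\\nWHERE {where}\\n'\n",
"\n",
"\n",
"big_query = \"\"\"SELECT TOP 3000\n",
"source_id, ref_epoch, ra, dec, parallax\n",
"FROM gaiadr2.gaia_source\n",
"WHERE parallax < 1\n",
"\"\"\"\n",
"\n",
"# A smaller version of the same query for quick tests.\n",
"print(limit_rows(big_query, 10))\n",
"\n",
"# A query that reports how many rows match, without returning them.\n",
"print(count_query('gaiadr2.gaia_source', 'parallax < 1'))\n",
"```\n",
"\n",
"Neither helper contacts the server; each one only builds a string, which you can inspect before you submit it."
]
},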
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Operators\n",
|
||
"\n",
|
||
"In a `WHERE` clause, you can use any of the [SQL comparison operators](https://www.w3schools.com/sql/sql_operators.asp); here are the most common ones:\n",
|
||
"\n",
|
||
"| Symbol | Operation\n",
|
||
"|--------| :---\n",
|
||
"| `>` | greater than\n",
|
||
"| `<` | less than\n",
|
||
"| `>=` | greater than or equal\n",
|
||
"| `<=` | less than or equal\n",
|
||
"| `=` | equal\n",
|
||
"| `!=` or `<>` | not equal\n",
|
||
"\n",
|
||
"Most of these are the same as Python, but some are not. In particular, notice that the equality operator is `=`, not `==`.\n",
|
||
"Be careful to keep your Python out of your ADQL!\n",
|
||
"\n",
|
||
"You can combine comparisons using the logical operators:\n",
|
||
"\n",
|
||
"* AND: true if both comparisons are true\n",
|
||
"* OR: true if either or both comparisons are true\n",
|
||
"\n",
|
||
"Finally, you can use `NOT` to invert the result of a comparison. "
|
||
]
|
||
},
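{
"cell_type": "markdown",
"metadata": {},
"source": [
"To experiment with the logical operators without sending queries to the Gaia server, one option is Python's built-in `sqlite3` module. This is only a stand-in: SQLite is a different SQL dialect from ADQL (for example, it has no `TOP` keyword), and the table `stars` and its values are made up for illustration. The logical operators, however, behave as described above.\n",
"\n",
"```python\n",
"import sqlite3\n",
"\n",
"# Hypothetical toy table standing in for gaiadr2.gaia_source.\n",
"conn = sqlite3.connect(':memory:')\n",
"conn.execute('CREATE TABLE stars (source_id INTEGER, parallax REAL, bp_rp REAL)')\n",
"conn.executemany(\n",
"    'INSERT INTO stars VALUES (?, ?, ?)',\n",
"    [(1, 0.3, 0.5), (2, 1.4, 1.1), (3, -0.2, 2.6), (4, 0.8, -0.9)],\n",
")\n",
"\n",
"\n",
"def select_ids(where):\n",
"    \"\"\"Return the sorted source_id values of the rows that satisfy `where`.\"\"\"\n",
"    rows = conn.execute(f'SELECT source_id FROM stars WHERE {where}')\n",
"    return sorted(row[0] for row in rows)\n",
"\n",
"\n",
"both = select_ids('parallax < 1 AND bp_rp > 0')    # both comparisons true\n",
"either = select_ids('parallax < 1 OR bp_rp > 0')   # at least one true\n",
"inverted = select_ids('NOT (parallax < 1)')        # comparison inverted\n",
"\n",
"print(both, either, inverted)\n",
"```\n",
"\n",
"With these values, `both` is `[1, 3]`, `either` is `[1, 2, 3, 4]`, and `inverted` is `[2]`."
]
},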
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"**Exercise:** [Read about SQL operators here](https://www.w3schools.com/sql/sql_operators.asp) and then modify the previous query to select rows where `bp_rp` is between `-0.75` and `2`.\n",
|
||
"\n",
|
||
"You can [read about this variable here](https://gea.esac.esa.int/archive/documentation/GDR2/Gaia_archive/chap_datamodel/sec_dm_main_tables/ssec_dm_gaia_source.html)."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 18,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Solution\n",
|
||
"\n",
|
||
"# This is what most people will probably do\n",
|
||
"\n",
|
||
"query = \"\"\"SELECT TOP 10\n",
|
||
"source_id, ref_epoch, ra, dec, parallax\n",
|
||
"FROM gaiadr2.gaia_source\n",
|
||
"WHERE parallax < 1 \n",
|
||
" AND bp_rp > -0.75 AND bp_rp < 2\n",
|
||
"\"\"\""
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 19,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Solution\n",
|
||
"\n",
|
||
"# But if someone notices the BETWEEN operator, \n",
|
||
"# they might do this\n",
|
||
"\n",
|
||
"query = \"\"\"SELECT TOP 10\n",
|
||
"source_id, ref_epoch, ra, dec, parallax\n",
|
||
"FROM gaiadr2.gaia_source\n",
|
||
"WHERE parallax < 1 \n",
|
||
" AND bp_rp BETWEEN -0.75 AND 2\n",
|
||
"\"\"\""
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"This [Hertzsprung-Russell diagram](https://sci.esa.int/web/gaia/-/60198-gaia-hertzsprung-russell-diagram) shows the BP-RP color and luminosity of stars in the Gaia catalog.\n",
|
||
"\n",
|
||
"Selecting stars with `bp_rp` less than 2 excludes many [class M dwarf stars](https://xkcd.com/2360/), which have low temperature and low luminosity. A star like that at GD-1's distance would be hard to detect, so if it is detected, it is more likely to be in the foreground."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Cleaning up\n",
|
||
"\n",
|
||
"Asynchronous jobs have a `jobid`."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 20,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"(None, '1601903242219O')"
|
||
]
|
||
},
|
||
"execution_count": 20,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"job1.jobid, job2.jobid"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"You can use the `jobid` to remove a job from the server."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 21,
|
||
"metadata": {
|
||
"scrolled": true
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Removed jobs: '['1601903242219O']'.\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"Gaia.remove_jobs([job2.jobid])"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"If you don't remove a job from the server, it will be removed eventually, so don't feel too bad if you don't clean up after yourself."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Formatting queries\n",
|
||
"\n",
|
||
"So far the queries have been string \"literals\", meaning that the entire string is part of the program.\n",
|
||
"But writing queries yourself can be slow, repetitive, and error-prone.\n",
|
||
"\n",
|
||
"It is often a good idea to write Python code that assembles a query for you. One useful tool for that is the [string `format` method](https://www.w3schools.com/python/ref_string_format.asp).\n",
|
||
"\n",
|
||
"As an example, we'll divide the previous query into two parts: a list of column names and a \"base\" for the query that contains everything except the column names.\n",
|
||
"\n",
|
||
"Here's the list of columns we'll select. "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 22,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"columns = 'source_id, ra, dec, pmra, pmdec, parallax, parallax_error, radial_velocity'"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"And here's the base; it's a string that contains at least one format specifier in curly brackets (braces)."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 23,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"query3_base = \"\"\"SELECT TOP 10 \n",
|
||
"{columns}\n",
|
||
"FROM gaiadr2.gaia_source\n",
|
||
"WHERE parallax < 1\n",
|
||
" AND bp_rp BETWEEN -0.75 AND 2\n",
|
||
"\"\"\""
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"This base query contains one format specifier, `{columns}`, which is a placeholder for the list of column names we will provide.\n",
|
||
"\n",
|
||
"To assemble the query, we invoke `format` on the base string and provide a keyword argument that assigns a value to `columns`."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 24,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"query3 = query3_base.format(columns=columns)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"The result is a string with line breaks. If you display it, the line breaks appear as `\\n`."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 25,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"'SELECT TOP 10 \\nsource_id, ra, dec, pmra, pmdec, parallax, parallax_error, radial_velocity\\nFROM gaiadr2.gaia_source\\nWHERE parallax < 1\\n AND bp_rp BETWEEN -0.75 AND 2\\n'"
|
||
]
|
||
},
|
||
"execution_count": 25,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"query3"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"But if you print it, the line breaks appear as... line breaks."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 26,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"SELECT TOP 10 \n",
|
||
"source_id, ra, dec, pmra, pmdec, parallax, parallax_error, radial_velocity\n",
|
||
"FROM gaiadr2.gaia_source\n",
|
||
"WHERE parallax < 1\n",
|
||
" AND bp_rp BETWEEN -0.75 AND 2\n",
|
||
"\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"print(query3)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Notice that the format specifier has been replaced with the value of `columns`.\n",
|
||
"\n",
|
||
"Let's run it and see if it works:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 27,
|
||
"metadata": {
|
||
"scrolled": true
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"<Table length=10>\n",
|
||
" name dtype unit description n_bad\n",
|
||
"--------------- ------- -------- ------------------------------------------------------------------ -----\n",
|
||
" source_id int64 Unique source identifier (unique within a particular Data Release) 0\n",
|
||
" ra float64 deg Right ascension 0\n",
|
||
" dec float64 deg Declination 0\n",
|
||
" pmra float64 mas / yr Proper motion in right ascension direction 0\n",
|
||
" pmdec float64 mas / yr Proper motion in declination direction 0\n",
|
||
" parallax float64 mas Parallax 0\n",
|
||
" parallax_error float64 mas Standard error of parallax 0\n",
|
||
"radial_velocity float64 km / s Radial velocity 10\n",
|
||
"Jobid: None\n",
|
||
"Phase: COMPLETED\n",
|
||
"Owner: None\n",
|
||
"Output file: sync_20201005090726.xml.gz\n",
|
||
"Results: None\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"job3 = Gaia.launch_job(query3)\n",
|
||
"print(job3)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 28,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<i>Table length=10</i>\n",
|
||
"<table id=\"table140265627700432\" class=\"table-striped table-bordered table-condensed\">\n",
|
||
"<thead><tr><th>source_id</th><th>ra</th><th>dec</th><th>pmra</th><th>pmdec</th><th>parallax</th><th>parallax_error</th><th>radial_velocity</th></tr></thead>\n",
|
||
"<thead><tr><th></th><th>deg</th><th>deg</th><th>mas / yr</th><th>mas / yr</th><th>mas</th><th>mas</th><th>km / s</th></tr></thead>\n",
|
||
"<thead><tr><th>int64</th><th>float64</th><th>float64</th><th>float64</th><th>float64</th><th>float64</th><th>float64</th><th>float64</th></tr></thead>\n",
|
||
"<tr><td>4467710915011802624</td><td>269.9680969307347</td><td>1.1429085038160882</td><td>2.0233280236600626</td><td>-2.5692427875510266</td><td>0.42361471245557913</td><td>0.470352406647465</td><td>--</td></tr>\n",
|
||
"<tr><td>4467706551328679552</td><td>270.033164589881</td><td>1.0565747323689927</td><td>-3.414829591355289</td><td>-3.8437215857495737</td><td>0.922888231734588</td><td>0.927008559859825</td><td>--</td></tr>\n",
|
||
"<tr><td>4467712255037300096</td><td>270.7724717923047</td><td>0.6581664892880896</td><td>-3.5620173752896025</td><td>-6.595792323153987</td><td>-2.669179465293931</td><td>0.9719742773203504</td><td>--</td></tr>\n",
|
||
"<tr><td>4467735001181761792</td><td>270.3628606248308</td><td>0.8947079323599124</td><td>2.1307079926489205</td><td>0.8826727710910712</td><td>0.6117399163086398</td><td>0.509812721702093</td><td>--</td></tr>\n",
|
||
"<tr><td>4467737101421916672</td><td>270.5110834661444</td><td>0.9806225910160181</td><td>0.17532366511560785</td><td>-5.113270239706202</td><td>-0.39818224846127004</td><td>0.7549581886719651</td><td>--</td></tr>\n",
|
||
"<tr><td>4467707547757327488</td><td>269.88746280594927</td><td>1.0212759940136962</td><td>-2.6382230817672987</td><td>-3.707776532049287</td><td>0.7741412301054209</td><td>0.3022057897812064</td><td>--</td></tr>\n",
|
||
"<tr><td>4467732355491087744</td><td>270.6730790702491</td><td>0.9197224705139885</td><td>-2.2735991502653037</td><td>-11.864952855984358</td><td>-0.3464446494840354</td><td>0.4937921513912002</td><td>--</td></tr>\n",
|
||
"<tr><td>4467717099766944512</td><td>270.57667173120825</td><td>0.726277659009568</td><td>-3.4598362614808367</td><td>-4.601426893365921</td><td>0.05443955111134051</td><td>0.8867339293525688</td><td>--</td></tr>\n",
|
||
"<tr><td>4467719058265781248</td><td>270.7248052971514</td><td>0.8205551921782785</td><td>-3.255079498426542</td><td>-9.249285069111085</td><td>0.3733943917490343</td><td>0.390952370410666</td><td>--</td></tr>\n",
|
||
"<tr><td>4467722326741572352</td><td>270.87431291888504</td><td>0.8595565975869158</td><td>0.10696398351859826</td><td>1.2035993780158853</td><td>-0.11850943432864373</td><td>0.1660452431882023</td><td>--</td></tr>\n",
|
||
"</table>"
|
||
],
|
||
"text/plain": [
|
||
"<Table length=10>\n",
|
||
" source_id ra ... parallax_error radial_velocity\n",
|
||
" deg ... mas km / s \n",
|
||
" int64 float64 ... float64 float64 \n",
|
||
"------------------- ------------------ ... ------------------ ---------------\n",
|
||
"4467710915011802624 269.9680969307347 ... 0.470352406647465 --\n",
|
||
"4467706551328679552 270.033164589881 ... 0.927008559859825 --\n",
|
||
"4467712255037300096 270.7724717923047 ... 0.9719742773203504 --\n",
|
||
"4467735001181761792 270.3628606248308 ... 0.509812721702093 --\n",
|
||
"4467737101421916672 270.5110834661444 ... 0.7549581886719651 --\n",
|
||
"4467707547757327488 269.88746280594927 ... 0.3022057897812064 --\n",
|
||
"4467732355491087744 270.6730790702491 ... 0.4937921513912002 --\n",
|
||
"4467717099766944512 270.57667173120825 ... 0.8867339293525688 --\n",
|
||
"4467719058265781248 270.7248052971514 ... 0.390952370410666 --\n",
|
||
"4467722326741572352 270.87431291888504 ... 0.1660452431882023 --"
|
||
]
|
||
},
|
||
"execution_count": 28,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"results3 = job3.get_results()\n",
|
||
"results3"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Good so far."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"**Exercise:** This query always selects sources with `parallax` less than 1. But suppose you want to take that upper bound as an input.\n",
|
||
"\n",
|
||
"Modify `query3_base` to replace `1` with a format specifier like `{max_parallax}`. Now, when you call `format`, add a keyword argument that assigns a value to `max_parallax`, and confirm that the format specifier gets replaced with the value you provide."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 29,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Solution\n",
|
||
"\n",
|
||
"query4_base = \"\"\"SELECT TOP 10\n",
|
||
"{columns}\n",
|
||
"FROM gaiadr2.gaia_source\n",
|
||
"WHERE parallax < {max_parallax} AND \n",
|
||
"bp_rp BETWEEN -0.75 AND 2\n",
|
||
"\"\"\""
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 30,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"SELECT TOP 10\n",
|
||
"source_id, ra, dec, pmra, pmdec, parallax, parallax_error, radial_velocity\n",
|
||
"FROM gaiadr2.gaia_source\n",
|
||
"WHERE parallax < 0.5 AND \n",
|
||
"bp_rp BETWEEN -0.75 AND 2\n",
|
||
"\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"# Solution\n",
|
||
"\n",
|
||
"query4 = query4_base.format(columns=columns,\n",
|
||
" max_parallax=0.5)\n",
|
||
"print(query4)"
|
||
]
|
||
},
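{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same idea extends to several parameters at once. As a sketch, the hypothetical function `build_query` below keeps the column names in a Python list, joins them with commas, and fills in every format specifier in one call. The names `QUERY_BASE` and `build_query` and the default values are assumptions chosen to match the queries in this notebook.\n",
"\n",
"```python\n",
"QUERY_BASE = \"\"\"SELECT TOP {top}\n",
"{columns}\n",
"FROM gaiadr2.gaia_source\n",
"WHERE parallax < {max_parallax}\n",
"  AND bp_rp BETWEEN {bp_rp_min} AND {bp_rp_max}\n",
"\"\"\"\n",
"\n",
"\n",
"def build_query(column_list, top=10, max_parallax=1,\n",
"                bp_rp_min=-0.75, bp_rp_max=2):\n",
"    \"\"\"Fill in QUERY_BASE from a list of column names and numeric bounds.\"\"\"\n",
"    return QUERY_BASE.format(\n",
"        columns=', '.join(column_list),\n",
"        top=top,\n",
"        max_parallax=max_parallax,\n",
"        bp_rp_min=bp_rp_min,\n",
"        bp_rp_max=bp_rp_max,\n",
"    )\n",
"\n",
"\n",
"example = build_query(['source_id', 'ra', 'dec', 'parallax'],\n",
"                      max_parallax=0.5)\n",
"print(example)\n",
"```\n",
"\n",
"Keeping the columns in a list makes it easy to add, remove, or reorder them without editing the query text."
]
},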
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"**Style note:** You might notice that the variable names in this notebook are numbered, like `query1`, `query2`, etc. \n",
|
||
"\n",
|
||
"The advantage of this style is that it isolates each section of the notebook from the others, so if you go back and run the cells out of order, it's less likely that you will get unexpected interactions.\n",
|
||
"\n",
|
||
"A drawback of this style is that it can be a nuisance to update the notebook if you add, remove, or reorder a section.\n",
|
||
"\n",
|
||
"What do you think of this choice? Are there alternatives you prefer?"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Summary\n",
|
||
"\n",
|
||
"This notebook demonstrates the following steps:\n",
|
||
"\n",
|
||
"1. Making a connection to the Gaia server,\n",
|
||
"\n",
|
||
"2. Exploring information about the database and the tables it contains,\n",
|
||
"\n",
|
||
"3. Writing a query and sending it to the server, and finally\n",
|
||
"\n",
|
||
"4. Downloading the response from the server as an Astropy `Table`."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Best practices\n",
|
||
"\n",
|
||
"* If you can't download an entire dataset (or it's not practical) use queries to select the data you need.\n",
|
||
"\n",
|
||
"* Read the metadata and the documentation to make sure you understand the tables, their columns, and what they mean.\n",
|
||
"\n",
|
||
"* Develop queries incrementally: start with something simple, test it, and add a little bit at a time.\n",
|
||
"\n",
|
||
"* Use ADQL features like `TOP` and `COUNT` to test before you run a query that might return a lot of data.\n",
|
||
"\n",
|
||
"* If you know your query will return fewer than 2000 rows, you can run it synchronously, which might complete faster (but it doesn't seem to make much difference). If it might return more than 2000 rows, you should run it asynchronously.\n",
|
||
"\n",
|
||
"* ADQL and SQL are not case-sensitive, so you don't have to capitalize the keywords, but you should.\n",
|
||
"\n",
|
||
"* ADQL and SQL don't require you to break a query into multiple lines, but you should.\n"
|
||
]
|
||
},
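{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to encode the rule about synchronous and asynchronous jobs is a small wrapper. This is a sketch, not part of Astroquery: `run_query`, `SYNC_ROW_LIMIT`, and the `expected_rows` parameter are hypothetical names, and the 2000-row limit is the one quoted in the section on asynchronous queries. It uses only `launch_job`, `launch_job_async`, and `get_results`, which appear earlier in this notebook.\n",
"\n",
"```python\n",
"# Row limit for synchronous jobs, as quoted earlier in this notebook.\n",
"SYNC_ROW_LIMIT = 2000\n",
"\n",
"\n",
"def run_query(query, expected_rows=None):\n",
"    \"\"\"Run an ADQL query on the Gaia server and return the results.\n",
"\n",
"    If `expected_rows` is known and within the synchronous limit, the\n",
"    job runs synchronously; otherwise it runs asynchronously.\n",
"    \"\"\"\n",
"    # Imported inside the function so the sketch can be defined first\n",
"    # and Astroquery is only loaded when a query is actually run.\n",
"    from astroquery.gaia import Gaia\n",
"\n",
"    if expected_rows is not None and expected_rows <= SYNC_ROW_LIMIT:\n",
"        job = Gaia.launch_job(query)\n",
"    else:\n",
"        job = Gaia.launch_job_async(query)\n",
"    return job.get_results()\n",
"```\n",
"\n",
"For example, `run_query(query3, expected_rows=10)` would run synchronously, while calling `run_query` without `expected_rows` falls back to an asynchronous job."
]
},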
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Jupyter notebooks can be good for developing and testing code, but they have some drawbacks. In particular, if you run the cells out of order, you might find that variables don't have the values you expect.\n",
|
||
"\n",
|
||
"There are a few things you can do to mitigate these problems:\n",
|
||
"\n",
|
||
"* Make each section of the notebook self-contained. Try not to use the same variable name in more than one section.\n",
|
||
"\n",
|
||
"* Keep notebooks short. Look for places where you can break your analysis into phases with one notebook per phase."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "raw",
|
||
"metadata": {},
|
||
"source": []
|
||
}
|
||
],
|
||
"metadata": {
|
||
"kernelspec": {
|
||
"display_name": "Python 3",
|
||
"language": "python",
|
||
"name": "python3"
|
||
},
|
||
"language_info": {
|
||
"codemirror_mode": {
|
||
"name": "ipython",
|
||
"version": 3
|
||
},
|
||
"file_extension": ".py",
|
||
"mimetype": "text/x-python",
|
||
"name": "python",
|
||
"nbconvert_exporter": "python",
|
||
"pygments_lexer": "ipython3",
|
||
"version": "3.8.5"
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 2
|
||
}
|