Gene ID mapping

With this tool you can map gene symbols in an Excel/CSV/TSV table to human gene identifiers.
The output will contain columns with gene IDs and the respective official gene symbol. It can be directly used as input for the GOAT online tool (it'll only use rows where mapping was successful).

Your input data

drag&drop a gene list file or click to open a file dialog

Importantly, your input gene list / dataset / gene table must be prepared in a format that is compatible with this tool.

File format: either CSV, TSV, or Excel (.xlsx file, data on the first sheet)
Required column: symbol (column name must match exactly)
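
Before uploading, the column requirement above can be checked with a short script. This is only a sketch: the helper name `has_symbol_column` is hypothetical, it covers CSV/TSV text only (not Excel), and it assumes that "match exactly" means a case-sensitive match on the header name.

```python
import csv
import io

def has_symbol_column(text: str, delimiter: str = "\t") -> bool:
    """Return True if the header row has a column named exactly 'symbol'.

    Assumption: the match is case-sensitive ('Symbol' does not count).
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, [])  # empty input yields an empty header
    return "symbol" in header

tsv_ok = "symbol\teffectsize\nGRIA1\t1.0\n"
tsv_bad = "Symbol\teffectsize\nGRIA1\t1.0\n"

print(has_symbol_column(tsv_ok))   # True
print(has_symbol_column(tsv_bad))  # False
```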

The documentation below shows an example table that one might use as input.

status:

no gene list loaded yet

Settings

Rows where the gene symbol column contains a delimiter (a semicolon, comma, or whitespace) are assumed to refer to multiple genes. If a row contains two or more unique gene symbols, how should it be mapped to a human gene identifier?

try to map only the first gene symbol

use the first gene symbol that can be successfully mapped

analogous to above, but skip if there exists a non-ambiguous row that contains the first (successfully mapped) gene symbol

skip rows with ambiguous symbols altogether

Results

No results yet; load your gene list and press START...

Settings documentation

To illustrate the problem of ambiguous genes/symbols and the solutions offered by the options above, consider this example table:

symbol	effectsize	note
GRIA1	1.0	protein group maps to exactly 1 gene
GRIA2	1.0	protein group maps to exactly 1 gene
GRIA1;GRIA2	1.5	ambiguous, one might want to use only the first entry ('leading' gene)
GRIA3;GRIA4	1.5	ambiguous, but this row contributes a new gene (GRIA3)
tr|A8K0K0|A8K0K0_HUMAN;GRIA2;GRIA3	2.0	ambiguous, but the first entry has no gene symbol, only an accession

With option 1, "try to map only the first gene symbol"
the output will contain a gene ID for all rows except the last (respectively, GRIA1, GRIA2, GRIA1, GRIA3, -).
With option 2, "use the first gene symbol that can be successfully mapped"
all rows will be mapped to a gene ID (respectively, GRIA1, GRIA2, GRIA1, GRIA3, GRIA2).
With option 3, "analogous to above, but skip if there exists a non-ambiguous row that contains the first (successfully mapped) gene symbol"
rows 3 and 5 are skipped because there exist unambiguous entries for GRIA1 and GRIA2 (respectively, GRIA1, GRIA2, -, GRIA3, -). This approach favors rows that are unambiguous and supplements this only with ambiguous rows that contribute new information (genes).
With option 4, "skip rows with ambiguous symbols altogether"
only the first 2 rows are mapped (respectively, GRIA1, GRIA2, -, -, -).
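
The four options described above can be sketched in Python as follows. This is an illustration of the described behavior, not the tool's actual implementation: `KNOWN`, `split_symbols`, and `map_rows` are hypothetical names, and the small `KNOWN` set stands in for the HGNC lookup table. Mapped rows are reported by gene symbol, and unmapped rows as `None`.

```python
import re

# Toy stand-in for the HGNC lookup table (assumption, for illustration only).
KNOWN = {"GRIA1", "GRIA2", "GRIA3", "GRIA4"}

def split_symbols(cell):
    """Split a cell on semicolons, commas, or whitespace; keep unique symbols in order."""
    parts = [p for p in re.split(r"[;,\s]+", cell.strip()) if p]
    return list(dict.fromkeys(parts))

def map_rows(cells, option):
    """Map each row to one gene symbol (or None) according to option 1-4."""
    split = [split_symbols(c) for c in cells]
    # Genes that are covered by an unambiguous (single-symbol) row.
    unambiguous = {s[0] for s in split if len(s) == 1 and s[0] in KNOWN}
    result = []
    for symbols in split:
        if not symbols:
            result.append(None)
        elif len(symbols) == 1:
            # Unambiguous rows are handled the same way under every option.
            result.append(symbols[0] if symbols[0] in KNOWN else None)
        elif option == 4:
            result.append(None)  # skip ambiguous rows altogether
        elif option == 1:
            result.append(symbols[0] if symbols[0] in KNOWN else None)
        else:
            # Options 2 and 3: first symbol that can be mapped.
            gene = next((s for s in symbols if s in KNOWN), None)
            if option == 3 and gene in unambiguous:
                gene = None  # an unambiguous row already covers this gene
            result.append(gene)
    return result

cells = [
    "GRIA1",
    "GRIA2",
    "GRIA1;GRIA2",
    "GRIA3;GRIA4",
    "tr|A8K0K0|A8K0K0_HUMAN;GRIA2;GRIA3",
]
for option in (1, 2, 3, 4):
    print(option, map_rows(cells, option))
```

Running this on the example table reproduces the per-row results listed for each option above.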

When choosing the option most appropriate for your dataset, keep in mind that the gene set analysis in GOAT online will retain only 1 row per unique gene. If multiple rows/entries are available for a gene, the one with the lowest/best p-value is retained. If there are no gene p-values in your data/table, the best absolute effect size is retained (across multiple entries for the same gene).
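
To see why this matters, the retention rule can be sketched as below. The function name `keep_best_row_per_gene` and the dictionary keys are hypothetical, and the sketch assumes that the "best" absolute effect size means the largest one; GOAT online's own implementation may differ in detail.

```python
def keep_best_row_per_gene(rows):
    """Keep one row per gene: lowest p-value if every row has one,
    otherwise the largest absolute effect size (assumed meaning of 'best')."""
    has_pvalues = bool(rows) and all("pvalue" in r for r in rows)

    def is_better(a, b):
        if has_pvalues:
            return a["pvalue"] < b["pvalue"]
        return abs(a["effectsize"]) > abs(b["effectsize"])

    best = {}
    for row in rows:
        gene = row["gene"]
        if gene is None:
            continue  # rows that could not be mapped are not used
        if gene not in best or is_better(row, best[gene]):
            best[gene] = row
    return list(best.values())

# The example table after mapping with option 2 (no p-values available).
mapped = [
    {"gene": "GRIA1", "effectsize": 1.0},
    {"gene": "GRIA2", "effectsize": 1.0},
    {"gene": "GRIA1", "effectsize": 1.5},
    {"gene": "GRIA3", "effectsize": 1.5},
    {"gene": "GRIA2", "effectsize": 2.0},
]
for row in keep_best_row_per_gene(mapped):
    print(row["gene"], row["effectsize"])
```

In this example, the values retained for GRIA1 and GRIA2 come from the ambiguous rows (effect sizes 1.5 and 2.0) rather than from their unambiguous rows, which is the kind of outcome that option 3 or 4 would avoid.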

What data is used / how is gene ID mapping done?

We created a lookup table of official gene symbols and aliases/synonyms based on information provided by HGNC; it is stored on this webserver. Any synonym that HGNC lists for more than one gene is considered ambiguous and is discarded.
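
As a rough illustration of this rule, a lookup table could be built along the following lines. The record layout, the function name `build_lookup`, and the placeholder gene names and IDs are all invented for this sketch. It also assumes that an official symbol takes precedence over an identical synonym of another gene, which is not stated above.

```python
from collections import defaultdict

def build_lookup(records):
    """records: iterable of (gene_id, official_symbol, synonyms).

    Returns a dict mapping symbol or synonym to gene ID, leaving out
    synonyms that are listed for more than one gene.
    """
    records = list(records)
    # Official symbols are always included.
    lookup = {symbol: gene_id for gene_id, symbol, _ in records}
    # Collect which genes each synonym is listed for.
    owners = defaultdict(set)
    for gene_id, _, synonyms in records:
        for synonym in synonyms:
            owners[synonym].add(gene_id)
    for synonym, gene_ids in owners.items():
        # Assumption: official symbols win over identical synonyms.
        if len(gene_ids) == 1 and synonym not in lookup:
            lookup[synonym] = next(iter(gene_ids))
    return lookup

# Placeholder records, not real HGNC data.
records = [
    ("ID:1", "GENEA", ["ALIAS1", "SHARED"]),
    ("ID:2", "GENEB", ["SHARED", "ALIAS2"]),
]
lookup = build_lookup(records)
print(lookup.get("ALIAS1"))  # ID:1
print(lookup.get("SHARED"))  # None, because it is listed for two genes
```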

This tool downloads the lookup table (mapping from symbols to gene IDs) to your computer and then proceeds with matching between your input table and HGNC gene information. So your table/data never leaves your computer and remains private at all times.