For files with "RefSeq" in the file name, coding sequences were parsed from GenBank format files (*genomic.gbff.gz) downloaded from NCBI (https://ftp.ncbi.nlm.nih.gov/genomes/refseq/invertebrate/). The AR number indicates the RefSeq annotation release. Coding sequences for Melipona quadrifasciata were parsed in the same way from a file downloaded from GenBank (https://ftp.ncbi.nlm.nih.gov/genomes/genbank/invertebrate/). The files for Cardiocondyla obscurior and Lasioglossum albipes are from Official Gene Sets. The definition lines for all sequences were formatted to show the transcript id, gene id and protein id, in that order.
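
The definition-line layout described above can be illustrated with a minimal Python sketch. It is not the script used to build these files. The single-space separator, the function names, the 60-column line width and the placeholder IDs are all assumptions made for illustration; only the field order (transcript id, gene id, protein id) comes from the description above.

```python
# Hypothetical helpers illustrating the definition-line layout.
# Assumption: fields are separated by a single space; the actual files
# may use a different separator.

FIELD_NAMES = ("transcript_id", "gene_id", "protein_id")


def format_defline(transcript_id, gene_id, protein_id, sep=" "):
    """Build a FASTA definition line: transcript id, gene id, protein id."""
    fields = (transcript_id, gene_id, protein_id)
    for field in fields:
        # Reject empty fields and fields that contain the separator,
        # since either would make the line ambiguous to parse back.
        if not field or sep in field:
            raise ValueError(f"invalid id field: {field!r}")
    return ">" + sep.join(fields)


def parse_defline(defline, sep=" "):
    """Split a definition line back into its three named id fields."""
    if not defline.startswith(">"):
        raise ValueError("definition line must start with '>'")
    parts = defline[1:].rstrip("\n").split(sep)
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(f"expected 3 fields, found {len(parts)}")
    return dict(zip(FIELD_NAMES, parts))


def fasta_record(transcript_id, gene_id, protein_id, sequence, width=60):
    """Return one FASTA record with the sequence wrapped at `width` columns."""
    header = format_defline(transcript_id, gene_id, protein_id)
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + lines) + "\n"


if __name__ == "__main__":
    # Placeholder identifiers, not real accessions.
    print(fasta_record("XM_000000001.1", "LOC000000001", "XP_000000001.1",
                       "ATGAAATAG"), end="")
```

With a layout like this, the gene id for any coding sequence can be read back from the second field of its definition line, for example with parse_defline(header)["gene_id"].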