Insertion sequences (IS) are small transposable elements commonly found in bacterial genomes. IS are amongst the more dynamic parts of the bacterial genome and can drive bacterial evolution in several ways: by changing gene function (interrupting or up-regulating genes), by mediating gene deletions, or by creating structural variation. The Shigella sonnei reference genome 53G has over 200 different IS insertion sites. The more variable IS include IS1, IS2, IS4, IS21, IS600, IS630, ISEc20 and ISSo4. The population genomics of S. sonnei has been studied previously; however, IS variation is often ignored because these regions are difficult to recover from Illumina short-read data.
This study investigates IS variation in a global collection of 133 S. sonnei isolates spanning three lineages. We developed a new bioinformatic tool, ISMapper (https://github.com/jhawkey/IS_mapper), which detects IS in short-read data. Using ISMapper, we screened all isolates for the presence of IS1, IS2, IS4, IS21, IS600, IS630, ISEc20 and ISSo4.
Across lineages, each IS displayed extensive variation in copy number between isolates, and more recent lineages had higher copy numbers. Each IS often showed an insertion pattern unique to each lineage, which could be used to identify the lineages. Additionally, we detected hotspots of IS insertion around the S. sonnei genome, including regions surrounding genes involved in type IV secretion or in vitamin B transport, as well as frequent interruptions of genes encoding phage integrase and tail proteins.
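As an illustration of the two summaries described above (copy number per isolate, and insertion sites unique to one lineage), the following is a minimal Python sketch. It assumes insertion calls have already been collected into simple (isolate, lineage, IS, position) records. The record layout, the function names and all values are hypothetical and chosen for illustration only; they are not ISMapper's output format or results from this study.

```python
from collections import Counter, defaultdict

# Hypothetical records: (isolate, lineage, IS name, insertion position).
# All values are invented for illustration; they are not study data.
HITS = [
    ("iso1", "I", "IS1", 1200),
    ("iso1", "I", "IS1", 58000),
    ("iso2", "II", "IS1", 1200),
    ("iso2", "II", "IS1", 91000),
    ("iso2", "II", "IS1", 130500),
    ("iso3", "II", "IS1", 91000),
]


def copy_numbers(hits):
    """Count insertion sites per (isolate, IS) pair."""
    return Counter((isolate, is_name) for isolate, _, is_name, _ in hits)


def lineage_specific_sites(hits):
    """Map each lineage to the (IS, position) sites seen in that lineage only."""
    lineages_by_site = defaultdict(set)
    for _, lineage, is_name, pos in hits:
        lineages_by_site[(is_name, pos)].add(lineage)

    specific = defaultdict(set)
    for site, lineages in lineages_by_site.items():
        if len(lineages) == 1:  # site observed in exactly one lineage
            specific[next(iter(lineages))].add(site)
    return dict(specific)


if __name__ == "__main__":
    print(copy_numbers(HITS))
    print(lineage_specific_sites(HITS))
```

In this toy input, the site at position 1200 occurs in both lineages and is therefore excluded from the lineage-specific sets, while the remaining sites are each assigned to a single lineage.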
These results show that IS are highly dynamic elements that contribute to the plasticity of the S. sonnei genome, and that examining IS offers insights into the evolution of pathogens.