
Merge VCFs

Introduction

Because Genetic Variation experiments now handle large volumes of data, OmicsBox includes a tool to merge different VCF (Variant Call Format) files. With this utility you can combine several VCF files into a single one. This is useful when you need to consolidate genetic variant information from multiple sources, such as different experiments or datasets, since a single merged VCF file makes genetic variation data easier to analyze and interpret.

Merge VCFs

The tool to merge VCFs can be found in the Genetic Variation Module of OmicsBox under VCF Tools → Merge VCFs. The wizard consists of two pages where you define the input and output options, including how the VCF files are merged (Figure 1, Figure 2).

Input

On the first page, you can select the input files and choose how to merge them.

  • VCF Files: select the VCF files to merge into a single VCF file.
  • How to merge files:
      • By sample: choose this option to generate a unified VCF file by combining the samples of several VCF files, each containing a different subset of samples.
      • By chromosome: choose this option to create a single VCF file from multiple VCF files that cover different chromosomes.

The By sample option is useful, for instance, when you have run a Variant Calling job on some samples and later obtained another VCF file containing a different set of samples from the same dataset.
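To make the difference between the two merge modes concrete, here is a minimal Python sketch over a simplified, hypothetical in-memory VCF model: a list of sample names plus records keyed by (CHROM, POS, REF, ALT). It is not OmicsBox's implementation; it ignores headers, INFO/FORMAT fields and multi-allelic normalization, and filling absent samples with `./.` assumes diploid genotypes.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical simplified VCF model (not a real library type):
# sample names plus records keyed by (CHROM, POS, REF, ALT),
# each record mapping sample name -> genotype string such as "0/1".
Key = Tuple[str, int, str, str]
Vcf = Tuple[List[str], Dict[Key, Dict[str, str]]]

MISSING_GT = "./."  # assumption: diploid missing genotype


def merge_by_sample(vcfs: List[Vcf]) -> Vcf:
    """Combine VCFs that hold different samples into one multi-sample VCF."""
    samples: List[str] = []
    sites: Dict[Key, Dict[str, str]] = {}
    for vcf_samples, vcf_records in vcfs:
        for sample in vcf_samples:
            if sample in samples:
                raise ValueError(f"sample {sample!r} appears in more than one input")
            samples.append(sample)
        for key, genotypes in vcf_records.items():
            # Records describing the same site are joined into one row.
            sites.setdefault(key, {}).update(genotypes)
    # Every row gets one genotype per sample; absent samples are marked missing.
    # sorted() orders chromosomes as plain text, a simplification of real contig order.
    merged = {
        key: {s: gts.get(s, MISSING_GT) for s in samples}
        for key, gts in sorted(sites.items())
    }
    return samples, merged


def merge_by_chromosome(vcfs: List[Vcf]) -> Vcf:
    """Concatenate VCFs that share the same samples but cover different chromosomes."""
    if not vcfs:
        return [], {}
    samples = vcfs[0][0]
    records: Dict[Key, Dict[str, str]] = {}
    seen_chroms: Set[str] = set()
    for vcf_samples, vcf_records in vcfs:
        if vcf_samples != samples:
            raise ValueError("all inputs must list the same samples in the same order")
        chroms = {key[0] for key in vcf_records}
        overlap = chroms & seen_chroms
        if overlap:
            raise ValueError(f"chromosomes {sorted(overlap)} appear in several inputs")
        seen_chroms |= chroms
        records.update(vcf_records)  # keeps the order in which files are given
    return samples, records
```

Merging by sample aligns records that describe the same site and pads samples that are absent from a file, whereas merging by chromosome simply concatenates records, which is why it requires identical sample columns. For reference, the command-line tool bcftools exposes these two operations as `bcftools merge` and `bcftools concat`.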


Figure 1. Input Page

Output

  • VCF File: specify the folder where the merged VCF file will be saved.


Figure 2. Output Page

Summary Report

Apart from the VCF file resulting from the merge of all the input VCF files, a Summary Report is generated. This report contains the following information:

  • Input Data: file names of all the VCF files used as input.
  • Summary Information:
      • Types of Variants: frequency of each type of variant.
      • Number of alleles in a variant: abundance of alleles per variant, i.e. how many alleles the variants have.
      • Statistics: total number of variants, total number of genotypes, number of heterozygotes and amount of missing data.
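The quantities listed in the report can be illustrated with the following Python sketch. It uses a hypothetical record layout (REF, list of ALT alleles, one genotype string per sample) and a simple length-based classification into SNP, MNP, insertion and deletion; OmicsBox's own categories and counting rules may differ, and symbolic alleles such as `<DEL>` or `*` are not handled.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical record layout: (REF, [ALT alleles], [genotype per sample]).
Record = Tuple[str, List[str], List[str]]


def variant_type(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair by allele length (simplified scheme)."""
    if len(ref) == len(alt):
        return "SNP" if len(ref) == 1 else "MNP"
    return "insertion" if len(alt) > len(ref) else "deletion"


def summarize(records: List[Record]) -> Dict[str, object]:
    types: Counter = Counter()                # counted once per ALT allele
    alleles_per_variant: Counter = Counter()  # REF + ALT alleles per record
    genotypes = heterozygotes = missing = 0
    for ref, alts, gts in records:
        for alt in alts:
            types[variant_type(ref, alt)] += 1
        alleles_per_variant[1 + len(alts)] += 1
        for gt in gts:
            genotypes += 1
            alleles = re.split(r"[/|]", gt)  # handles unphased "/" and phased "|"
            if "." in alleles:
                missing += 1                 # any missing allele counts as missing
            elif len(set(alleles)) > 1:
                heterozygotes += 1           # two different allele indices
    return {
        "types": dict(types),
        "alleles_per_variant": dict(alleles_per_variant),
        "variants": len(records),
        "genotypes": genotypes,
        "heterozygotes": heterozygotes,
        "missing": missing,
    }
```

Here the total number of genotypes is simply the number of sites multiplied by the number of samples, and a genotype with any missing allele (for example `./.` or `0/.`) is counted as missing rather than as heterozygous.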
