Extract VCFs
Introduction
A whole-genome VCF file can be very large and contain many samples, yet an analysis often needs only some of those samples or only certain chromosomes. For this reason, OmicsBox includes a tool to extract features from a VCF file.
Extract VCF
The tool to extract features from a VCF file can be found in the Genetic Variation Module of OmicsBox under VCF Tools → Extract from VCF. The wizard consists of two pages and allows you to define the input and output options as well as the options that control what is extracted from the VCF file (Figure 1, Figure 2).
Input
On the first page, you can select the input file and the information you want to extract.
- VCF File: select the VCF file from which to extract information.
- Features to extract by:
  - Samples: choose this option to generate a VCF file with the same number of variants as the original one but with only certain samples.
  - Chromosomes: select this option to create a VCF file with all the samples of the original file but with only the variants from the selected chromosomes.
- Features to extract: select the samples/chromosomes that you want to extract.
- Get Also Unselected Features: select this option if you want to obtain not only a VCF file with the selected features but also another one with the opposite subset of features.
The last option is useful when you have many chromosomes or samples and want to extract most of them. Because selecting all of them one by one can be tedious, you can instead select the features you are not interested in and check this option; the complementary file will then contain the features you want.
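The two extraction modes and the complementary output can be illustrated with a minimal Python sketch. This is not how OmicsBox implements the tool; it only mirrors the options described above on plain tab-separated VCF text. The function names `split_by_chromosome` and `split_by_sample` and the example data are hypothetical, and the sketch assumes a well-formed VCF whose `#CHROM` header line precedes the data lines. It does not recompute INFO fields such as allele counts.

```python
# Hypothetical helpers; each takes VCF lines without trailing newlines.

def split_by_chromosome(vcf_lines, chromosomes):
    """Return (selected, complementary) lists of VCF lines."""
    wanted = set(chromosomes)
    selected, complementary = [], []
    for line in vcf_lines:
        if line.startswith("#"):
            # Header lines are copied to both outputs.
            selected.append(line)
            complementary.append(line)
        elif line.split("\t", 1)[0] in wanted:
            selected.append(line)
        else:
            complementary.append(line)
    return selected, complementary


def split_by_sample(vcf_lines, samples):
    """Keep every variant but only the chosen sample columns."""
    wanted = set(samples)
    selected, complementary = [], []
    keep, drop = [], []
    for line in vcf_lines:
        if line.startswith("##"):
            selected.append(line)
            complementary.append(line)
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # The first 9 columns are fixed; sample columns start at index 9.
            sample_idx = range(9, len(fields))
            keep = [i for i in sample_idx if fields[i] in wanted]
            drop = [i for i in sample_idx if fields[i] not in wanted]
        fixed = fields[:9]
        selected.append("\t".join(fixed + [fields[i] for i in keep]))
        complementary.append("\t".join(fixed + [fields[i] for i in drop]))
    return selected, complementary


# Made-up example data: two variants, three samples.
EXAMPLE_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t1/1\t./.",
    "chr2\t200\t.\tC\tT\t.\tPASS\t.\tGT\t0/0\t0/1\t1/1",
]

# Selecting the single unwanted sample and reading the complementary
# output yields the file with all the remaining samples.
unwanted, wanted = split_by_sample(EXAMPLE_VCF, ["S2"])
```

In the last two lines, `wanted` holds the columns for S1 and S3 even though only S2 was selected, which is the shortcut that Get Also Unselected Features makes possible.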
Output
- VCF File: specify where to save the VCF file with the extracted features.
- Complementary VCF File: specify where to save the VCF file with the complementary set of features.
Summary Report
Apart from the VCF file(s), a Summary Report is generated. This report contains the following information:
- Input Data: file names of all the VCF files used as input.
- Summary Information:
  - Types of Variants: frequency of the different types of variants.
  - Number of alleles in a variant: distribution of the number of alleles per variant.
  - Statistics: total number of variants, total number of genotypes, number of heterozygotes, and amount of missing data.
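To make the report fields concrete, the following Python sketch computes comparable figures from VCF text. It is an illustration under stated assumptions, not the method OmicsBox uses: the `classify` and `summarize` helpers are hypothetical, the variant categories (SNP, MNP, INDEL, OTHER) are an assumed classification, a genotype is counted as missing if any of its alleles is `.`, and GT is expected to be the first FORMAT key.

```python
from collections import Counter


def classify(ref, alt):
    """Assumed, simplified variant typing based on allele lengths."""
    if alt.startswith("<") or alt == "*" or "[" in alt or "]" in alt:
        return "OTHER"  # symbolic, spanning-deletion or breakend alleles
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) == len(alt):
        return "MNP"
    return "INDEL"


def summarize(vcf_lines):
    """Return (variant types, alleles per variant, general statistics)."""
    types = Counter()
    allele_counts = Counter()
    stats = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        ref = fields[3]
        # An ALT of "." means that no alternative allele was observed.
        alts = [] if fields[4] == "." else fields[4].split(",")
        stats["variants"] += 1
        allele_counts[1 + len(alts)] += 1  # REF plus each ALT allele
        for alt in alts:
            types[classify(ref, alt)] += 1
        if len(fields) > 9 and fields[8].split(":")[0] == "GT":
            for sample in fields[9:]:
                # Treat phased (|) and unphased (/) genotypes alike.
                alleles = sample.split(":")[0].replace("|", "/").split("/")
                stats["genotypes"] += 1
                if "." in alleles:
                    stats["missing"] += 1
                elif len(set(alleles)) > 1:
                    stats["heterozygotes"] += 1
    return types, allele_counts, stats
```

Counting each ALT allele separately means a multi-allelic site can contribute to more than one variant type; a report that assigns one type per site would give different totals.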