malva.complexity module

malva.complexity.overlapping_windows(sequence, L)[source]

Returns overlapping windows of size L from sequence.

Parameters:
  • sequence – the nucleotide or protein sequence to scan over.

  • L – the length of the windows to yield.
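As a rough illustration of the sliding-window behaviour described above, the following minimal sketch yields every window of length L, advancing one position at a time. The name overlapping_windows_sketch is hypothetical and this is not the malva source. It assumes a step of one residue and that a sequence shorter than L yields no windows.

```python
def overlapping_windows_sketch(sequence, L):
    """Yield each length-L window of sequence, moving one position at a time.

    Assumption: step size is 1, and sequences shorter than L yield nothing.
    """
    for start in range(len(sequence) - L + 1):
        yield sequence[start:start + L]


if __name__ == "__main__":
    # A 6-residue sequence scanned with L=3 gives 6 - 3 + 1 = 4 windows.
    for window in overlapping_windows_sketch("ACGTAC", 3):
        print(window)
```

Using a generator keeps memory use constant regardless of sequence length, which matters when scanning whole chromosomes or large protein sets.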

malva.complexity.compute_rep_vector(sequence, N)[source]

Computes the repetition vector (as described in Wootton, 1993) of a given biopolymer sequence with N possible residues.

Parameters:
  • sequence – the nucleotide or protein sequence to generate a repetition vector for.

  • N – the total number of possible residues in the biopolymer that the sequence belongs to.
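The exact layout of the vector returned by compute_rep_vector is not documented here. One plausible reading, following the complexity state vector used in the cited work, is a list of N residue counts sorted in descending order and padded with zeros for residues that do not occur. The sketch below implements that assumption only; rep_vector_sketch is a hypothetical name, not part of malva.

```python
from collections import Counter


def rep_vector_sketch(sequence, N):
    """Return residue counts sorted from most to least frequent, padded to length N.

    Assumption: the repetition vector is the sorted count vector, with a zero
    entry for each of the N possible residues that is absent from sequence.
    """
    counts = sorted(Counter(sequence).values(), reverse=True)
    if len(counts) > N:
        raise ValueError("sequence contains more distinct residues than N")
    return counts + [0] * (N - len(counts))


if __name__ == "__main__":
    # "AACG" over a 4-letter alphabet: A occurs twice, C and G once, T never.
    print(rep_vector_sketch("AACG", 4))
```

Because the counts are sorted, two windows with the same composition pattern (for example "AACG" and "TTAC") map to the same vector, regardless of which residues are involved.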

malva.complexity.complexity(sequence, N)[source]

Computes the Shannon entropy of a given biopolymer sequence with N possible residues. See (Wootton, 1993) for details.

Parameters:
  • sequence – the nucleotide or protein sequence whose Shannon entropy is to be calculated.

  • N – the total number of possible residues in the biopolymer that the sequence belongs to.
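For orientation, Shannon entropy over residue frequencies can be sketched as below. The logarithm base is an assumption: taking logs to base N scales the result to the range 0 to 1, which is one natural use for the N parameter, but the documentation above does not confirm that malva does this. shannon_entropy_sketch is a hypothetical name.

```python
import math
from collections import Counter


def shannon_entropy_sketch(sequence, N):
    """Return -sum(p * log_N(p)) over the residue frequencies p of sequence.

    Assumption: logarithms are taken to base N, so a sequence that uses all N
    residues equally often scores close to 1 and a homopolymer scores 0.
    """
    length = len(sequence)
    if length == 0:
        return 0.0
    entropy = 0.0
    for count in Counter(sequence).values():
        p = count / length
        entropy -= p * math.log(p, N)
    return entropy


if __name__ == "__main__":
    print(shannon_entropy_sketch("AAAAAAAA", 4))  # homopolymer, lowest complexity
    print(shannon_entropy_sketch("ACGTACGT", 4))  # uniform composition, highest
```

A single repeated residue gives the minimum score, while an even mix of all N residues gives the maximum, so low values flag compositionally biased regions.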

malva.complexity.mask_low_complexity(seq_rec, maskchar='N', N=20, L=12)[source]

Masks low-complexity regions of a nucleic acid or amino acid sequence with a given mask character.

Parameters:
  • seq_rec – the sequence to mask, given as a string.

  • maskchar – Character to mask low-complexity residues with.

  • N – Number of possible residues in the sequence's alphabet (20 for amino acids, 4 for DNA).

  • L – Length of sliding window that reads the sequence.
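The parameters above suggest a sliding-window procedure: score each window of length L and replace low-scoring windows with maskchar. The sketch below shows one way this could work. The threshold parameter, the base-N entropy score, and the function names are all assumptions introduced for illustration; the documentation does not state which cutoff or scoring rule malva uses.

```python
import math
from collections import Counter


def _entropy(window, N):
    """Base-N Shannon entropy of a window (assumed scoring rule)."""
    length = len(window)
    return -sum(
        (count / length) * math.log(count / length, N)
        for count in Counter(window).values()
    )


def mask_low_complexity_sketch(seq, maskchar="N", N=20, L=12, threshold=0.5):
    """Mask every length-L window whose entropy falls below threshold.

    Assumptions: threshold is a hypothetical parameter, maskchar is a single
    character, and sequences shorter than L are returned unchanged.
    """
    masked = list(seq)
    for start in range(len(seq) - L + 1):
        if _entropy(seq[start:start + L], N) < threshold:
            # Overwrite the whole window; overlapping windows may re-mask it.
            masked[start:start + L] = [maskchar] * L
    return "".join(masked)


if __name__ == "__main__":
    print(mask_low_complexity_sketch("A" * 12, N=4))          # poly-A run
    print(mask_low_complexity_sketch("ACGTACGTACGT", N=4))    # mixed composition
```

Scoring is always done on the original sequence rather than the partially masked copy, so earlier masking does not influence the score of later windows.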