
Biopython AlignIO ValueError Says Strings Must Be Same Length?

Input FASTA-format text file: http://www.jcvi.org/cgi-bin/tigrfams/DownloadFile.cgi?file=/opt/www/www_tmp/tigrfams/fa_alignment_PF00205.txt

#!/usr/bin/python
from Bio import Align

Solution 1:

Pad the sequences that are too short and write the records to a temporary FASTA file. Then your alignment works as expected:

from Bio import AlignIO
from Bio import SeqIO
from Bio import Seq
import os

input_file = '/path/to/fa_alignment_PF00205.txt'
records = SeqIO.parse(input_file, 'fasta')
# materialize the generator as a list, otherwise it is
# exhausted after calculating maxlen
records = list(records)
maxlen = max(len(record.seq) for record in records)

# pad sequences so that they all have the same length
for record in records:
    if len(record.seq) != maxlen:
        sequence = str(record.seq).ljust(maxlen, '.')
        record.seq = Seq.Seq(sequence)
assert all(len(record.seq) == maxlen for record in records)

# write to temporary file and do alignment
output_file = '{}_padded.fasta'.format(os.path.splitext(input_file)[0])
with open(output_file, 'w') as f:
    SeqIO.write(records, f, 'fasta')
alignment = AlignIO.read(output_file, "fasta")
print(alignment)

This outputs:

SingleLetterAlphabet() alignment with 104 rows and 275 columns
TKAAIELIADHQ.......LTVLADLLVHRLQ..AVKELEALLA...QAL SP|A2VGF0.1/208-339
LQELASVINQHE...KV..MLFCGHGCR...Y..AVEEVMALAK...EDL SP|A3D4X6.1/190-319
IKKIAQAIEKAK...KP..VICAGGGVINS.N..ASEELLTLSR...KEL SP|A3DID9.1/192-327
IDEAAEAINKAE...RP..VILAGGGVSIA.G..ANKELFEFAT...QLL SP|A3DIY4.1/192-327
IEKAIELINSSQ...RP..FICSGGGVISS.E..ASEELIQFAE...KIL SP|A4XHS0.1/191-326
IKRAVEAIENSQ...RP..VICSGGGVIAS.R..ASDELKILVE...SEI SP|A4XIL5.1/194-328
VRQAARIIMESE...RP..VIYAGGGVRIS.G..AAPELLELSE...RAL SP|A5D4V9.1/192-327
LQALAQRILRAQ...RP..VIITGDEIVKS.D..ALQAAADFAS...LQL SP|A5ECG1.1/192-328
VEKAVELLWSAR...RV..LVISGRGAR...G..AGPELIGLLD...RAM SP|A5EDH4.1/198-324
IQKAARLIETAE...KP..VIIAGHGVNIS.G..ANEELKTLAE...KSL SP|A5FR34.1/193-328
LDALARDLDSAA...RV..TIYAGIGAR...G..AAARVVQLAG...EAL SP|A5FTR0.1/189-317
VADVAALLRAAR...RP..VIVAGGGVIHSG...AEERLATFAA...DAL SP|A5G0X6.1/217-351
IAEAVSALKGAK...RP..IIYTGGGLINS.GPESAELIVQLAK...RAL SP|A5G2E1.1/199-336
LKKAAEIINRAK...RP..LIYAGGGITLA.G..ASAELRALAA...ALL SP|A5GC69.1/192-327
CRDIVGKLLQSH...RP..VVLGGTGVRLS.R..TEQRLLALVE...DVF SP|A5W0I1.1/200-336
LDQAALKLAAAE...RP..MIIAGGGA..L.H..AAEQLAQLSA...AGL SP|A5W220.1/196-326
LQRAADILNTGH...KV..AILVGAGAL...Q..ATEQVIAIAE...RAL SP|A5W364.1/198-328
IRKAAEMLLAAK...RP..VVYSGGGVILG.G..GSEALTEIAK...SEM SP|A5W954.1/196-331
...
LTELQERLANAQ...RP..VVILGGSRWSD.A..AVQQFTRFAE...... SP|Q220C3.1/190-328

Solution 2:

Your problem is the last record of the FASTA file. Inspect the end of the file with tail -9 fa_alignment_PF00205.txt:

>SP|Q21VK8.1/229-357
LQAALAALAKAE...RP..LLVIGSQALVLSK..QAEHLAEAVARL.GIPV.YLSGMA..RGLLG.R..........DH.
...............PLQ..................MRHQRRQALRE..ADCVLLAG.VP...CDFRLD......YGKHV
RR..............S.AT.........L..IAA.N......................RSA.........KDARLNR..
.......K...PD.IAAIGDAG.......LFLQAL
>SP|Q220C3.1/190-328
LTELQERLANAQ...RP..VVILGGSRWSD.A..AVQQFTRFAEAF.SLPV.FCSFRR..QMLFS.A..........NH.
...............ACY...AG.DLGLG.A.....NQRLLARI.RQ..SDLILLLG.GR...MSEVPS......QGYEL
LGIPAPQQ...........D

The sequence with ID SP|Q220C3.1/190-328 has a different length than the other sequences: its record wraps onto only three lines, whereas the record before it fills four.
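Rather than reading the tail of the file by eye, the same check can be scripted. The following is only a minimal sketch in plain Python, not part of Biopython: the helper names read_fasta_lengths and find_length_outliers are made up for illustration, and the three toy records stand in for the real file. It assumes that the most common sequence length is the "correct" alignment width and reports every record that deviates from it.

```python
from collections import Counter
import io


def read_fasta_lengths(handle):
    """Return {record_id: total_sequence_length} for an open FASTA handle.

    Hypothetical helper: a deliberately simple parser that sums the
    lengths of all sequence lines following each '>' header.
    """
    lengths = {}
    current_id = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # use the first whitespace-separated token as the record ID
            current_id = line[1:].split()[0]
            lengths[current_id] = 0
        elif current_id is not None:
            lengths[current_id] += len(line)
    return lengths


def find_length_outliers(lengths):
    """Return (most common length, {id: length} of records that differ).

    Assumption: the most frequent length is the intended alignment width.
    """
    expected, _count = Counter(lengths.values()).most_common(1)[0]
    outliers = {rid: n for rid, n in lengths.items() if n != expected}
    return expected, outliers


# Toy data standing in for the real file: seq3 is too short.
demo = io.StringIO(">seq1\nACGT..AC\nGT\n>seq2\nACGTACGT\nAC\n>seq3\nACGT\n")
expected, outliers = find_length_outliers(read_fasta_lengths(demo))
print("expected length:", expected)
print("records with a different length:", outliers)
```

To run it on the real input, replace the io.StringIO object with open('/path/to/fa_alignment_PF00205.txt'). Any ID that appears in the reported dictionary is a record that would make AlignIO raise the ValueError.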
