Protein molecular mass calculator

Estimate the molecular mass of a protein from its amino acid sequence using average or monoisotopic residue masses. The calculator automatically cleans the sequence, counts residues, adds terminal water, and visualizes amino acid composition.

Protein sequence

Mass type

Output unit

Oligomer copy number

Decimal places

Sequence cleaning mode

Method: sum of amino acid residue masses plus one water molecule for the intact polypeptide termini. This is a theoretical mass and does not include post-translational modifications unless you add them manually outside this basic model.

Expert guide to protein molecular mass calculation

Protein molecular mass calculation means computing the theoretical molecular mass of a protein from its amino acid sequence. In practical biochemistry, proteomics, molecular biology, structural biology, and pharmaceutical development, this value is foundational. It influences how researchers interpret electrophoresis bands, mass spectrometry peaks, chromatographic retention, stoichiometry, molecular assembly, and biophysical assays. Many scientists use a shortcut of approximately 110 Da per amino acid residue, but an exact sequence-based calculation is more accurate and is preferred whenever precision matters.

A protein is built from amino acids linked by peptide bonds. If you sum the masses of the free amino acids directly, you would slightly overestimate the final molecular mass because each peptide bond forms with the loss of one molecule of water. For that reason, calculators typically use amino acid residue masses, not free amino acid masses. Residue masses already account for peptide bond formation, and then one water molecule is added back for the intact N- and C-termini of the full polypeptide chain. That is exactly the approach used in the calculator above.
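The water bookkeeping can be checked with a small sketch. It assumes that a free amino acid mass equals its residue mass plus one water, and it uses the average residue masses for glycine and alanine and the average water mass quoted in this guide. The function names are illustrative and are not taken from the calculator.

```python
# Average masses in Da, as quoted in this guide.
WATER = 18.01528
RESIDUE = {"G": 57.0519, "A": 71.0788}

def free_amino_acid_mass(code):
    # Assumption: a free amino acid is its residue plus one water molecule.
    return RESIDUE[code] + WATER

def peptide_mass(sequence):
    # Residue masses plus a single water for the N- and C-termini.
    return sum(RESIDUE[aa] for aa in sequence) + WATER

seq = "GAG"
naive = sum(free_amino_acid_mass(aa) for aa in seq)  # sums free amino acids
exact = peptide_mass(seq)                            # residue-based mass
overestimate = naive - exact                         # one water per peptide bond

print(f"residue-based mass: {exact:.4f} Da")
print(f"naive overestimate: {overestimate:.4f} Da")
```

For a chain of n residues the naive sum is too high by (n - 1) water molecules, one for each peptide bond formed.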

Why protein molecular mass matters

Knowing the theoretical molecular mass of a protein supports decisions throughout an experiment. If you run a recombinant protein on SDS-PAGE and expect a 28.4 kDa band, but observe a band near 35 kDa, that difference may suggest glycosylation, fusion tags, dimerization, incomplete reduction, anomalous electrophoretic mobility, or even degradation products. In mass spectrometry, comparing measured and theoretical intact mass can reveal post-translational modifications such as phosphorylation, oxidation, acetylation, or disulfide bond status. In structural biology, molecular mass helps estimate oligomeric state from SEC-MALS or native MS. In formulation and biopharma settings, accurate mass also underpins identity testing and quality control.

Sequence-based mass calculation is especially important when proteins are engineered. A single residue substitution can shift mass enough to be detected by modern mass spectrometers. An affinity tag may add anywhere from about 1 kDa for short peptide tags such as His-tag, FLAG, or HA, up to tens of kilodaltons for fusion partners such as GST or MBP. Protease cleavage sites alter the expected post-processing product. Signal peptides are often removed in mature secreted proteins, so the translated precursor mass may not match the mature extracellular protein mass. Because of these common real-world complications, the best workflow is to compute the exact sequence mass for the construct and mature form you truly expect to analyze.
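As a rough illustration of how engineering changes shift mass, the sketch below computes the effect of a single substitution and of a six-histidine tag from average residue masses. The helper names are hypothetical, and the tag is modeled as six extra residues within the chain, so no additional water is added.

```python
# Average residue masses in Da for the residues used in this example.
AVERAGE_RESIDUE_MASS = {"A": 71.0788, "V": 99.1326, "H": 137.1411}

def substitution_shift(old, new):
    """Mass change when one residue is replaced by another."""
    return AVERAGE_RESIDUE_MASS[new] - AVERAGE_RESIDUE_MASS[old]

def added_mass(extra_sequence):
    """Mass added by extra residues in an existing chain (no extra water)."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in extra_sequence)

ala_to_val = substitution_shift("A", "V")  # alanine replaced by valine
his6 = added_mass("HHHHHH")                # six-histidine tag

print(f"A to V shift: {ala_to_val:+.4f} Da")
print(f"His6 tag:     {his6:+.4f} Da")
```

Real constructs usually carry linker or cleavage-site residues next to the tag, so the full added sequence should be used rather than the tag alone.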

How the calculation works

The standard sequence-based method is simple in principle:

Clean the sequence and keep only valid one-letter amino acid codes.
Count each residue in the sequence.
Multiply each residue count by its residue mass.
Sum all residue masses.
Add the mass of one water molecule to represent the complete termini of the intact chain.
If the biologically relevant species is a homooligomer, multiply by the copy number.
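The steps above can be sketched in a few lines of Python. This is a minimal illustration and not the calculator's actual code: the function names are assumptions, the cleaning rule (uppercase, then silently drop anything that is not one of the 20 standard codes) is only one possible cleaning mode, and the masses are average residue masses in Da.

```python
from collections import Counter

# Average residue masses in Da for the 20 standard amino acids.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
AVERAGE_WATER = 18.01528

def clean_sequence(raw):
    """Step 1: uppercase and keep only valid one-letter codes."""
    return "".join(ch for ch in raw.upper() if ch in AVERAGE_RESIDUE_MASS)

def protein_mass(raw, copies=1):
    """Steps 2 to 6: count, weight, sum, add water, scale by copy number."""
    seq = clean_sequence(raw)
    if not seq:
        raise ValueError("no valid amino acid codes found")
    counts = Counter(seq)                                   # step 2
    residue_sum = sum(n * AVERAGE_RESIDUE_MASS[aa]          # steps 3 and 4
                      for aa, n in counts.items())
    monomer = residue_sum + AVERAGE_WATER                   # step 5
    return monomer * copies                                 # step 6

print(f"{protein_mass('g a-G'):.4f} Da")
```

Silently dropping unknown characters is convenient for pasted sequences with spaces or digits, but it also hides genuine errors such as ambiguous codes (B, X, Z), so a stricter mode that raises on unexpected letters may be preferable for analytical work.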

There are two common mass conventions. Average mass uses isotope-averaged atomic masses and is often suitable for general biochemical work. Monoisotopic mass uses the exact mass of the most abundant isotope of each element and is especially useful in high-resolution mass spectrometry. For typical sequences the average mass exceeds the monoisotopic mass by roughly 0.6 Da per 1000 Da, a gap that is meaningful in accurate MS work, especially for peptides and smaller proteins.

Average residue masses used in many calculators

The values below are representative average residue masses for amino acids within a polypeptide chain. They are not the masses of the free amino acids in solution. These residue values are what make direct protein mass calculation practical.

Amino acid	One-letter code	Average residue mass (Da)	Monoisotopic residue mass (Da)
Alanine	A	71.0788	71.03711
Arginine	R	156.1875	156.10111
Asparagine	N	114.1038	114.04293
Aspartic acid	D	115.0886	115.02694
Cysteine	C	103.1388	103.00919
Glutamic acid	E	129.1155	129.04259
Glutamine	Q	128.1307	128.05858
Glycine	G	57.0519	57.02146
Histidine	H	137.1411	137.05891
Isoleucine	I	113.1594	113.08406
Leucine	L	113.1594	113.08406
Lysine	K	128.1741	128.09496
Methionine	M	131.1926	131.04049
Phenylalanine	F	147.1766	147.06841
Proline	P	97.1167	97.05276
Serine	S	87.0782	87.03203
Threonine	T	101.1051	101.04768
Tryptophan	W	186.2132	186.07931
Tyrosine	Y	163.1760	163.06333
Valine	V	99.1326	99.06841

One water molecule is then added to the summed residue masses. Average water mass is commonly taken as 18.01528 Da, and monoisotopic water mass as 18.01056 Da. This final step gives the mass of the complete, unmodified polypeptide chain.
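To show how the two columns of the table are used, here is a hedged sketch that computes both the average and the monoisotopic mass of the same chain. The dictionary layout and the `chain_mass` function are illustrative choices; the numbers are the residue and water masses listed above.

```python
# code: (average, monoisotopic) residue mass in Da, from the table above.
RESIDUE_MASS = {
    "A": (71.0788, 71.03711),   "R": (156.1875, 156.10111),
    "N": (114.1038, 114.04293), "D": (115.0886, 115.02694),
    "C": (103.1388, 103.00919), "E": (129.1155, 129.04259),
    "Q": (128.1307, 128.05858), "G": (57.0519, 57.02146),
    "H": (137.1411, 137.05891), "I": (113.1594, 113.08406),
    "L": (113.1594, 113.08406), "K": (128.1741, 128.09496),
    "M": (131.1926, 131.04049), "F": (147.1766, 147.06841),
    "P": (97.1167, 97.05276),   "S": (87.0782, 87.03203),
    "T": (101.1051, 101.04768), "W": (186.2132, 186.07931),
    "Y": (163.1760, 163.06333), "V": (99.1326, 99.06841),
}
WATER = (18.01528, 18.01056)  # (average, monoisotopic)
COLUMN = {"average": 0, "monoisotopic": 1}

def chain_mass(sequence, mass_type="average"):
    """Mass of an unmodified chain: residue masses plus one water."""
    if mass_type not in COLUMN:
        raise ValueError("mass_type must be 'average' or 'monoisotopic'")
    col = COLUMN[mass_type]
    return sum(RESIDUE_MASS[aa][col] for aa in sequence) + WATER[col]

peptide = "GA"
avg = chain_mass(peptide, "average")
mono = chain_mass(peptide, "monoisotopic")
print(f"average {avg:.5f} Da, monoisotopic {mono:.5f} Da, gap {avg - mono:.5f} Da")
```

Keeping both values in one table makes it harder for the two mass types to drift out of sync when a residue is added or corrected.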

Rule-of-thumb versus exact calculation

A common shorthand estimate is:

Protein mass ≈ number of residues × 110 Da

This is often good enough for rough planning, but sequence composition matters. A glycine-rich protein can be lighter than expected, whereas proteins enriched in tryptophan, arginine, tyrosine, or methionine can be heavier than the 110 Da estimate suggests. The shortcut remains useful for quick mental math at any protein size, but exact sequence calculation is the better choice before spending time or money on analytical work.
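The effect of composition can be made concrete with two deliberately extreme, artificial sequences: 20 glycines and 20 tryptophans. The sketch below compares the 110 Da shortcut with the residue-based average mass; real proteins fall between these extremes, and the function names are illustrative.

```python
# Average masses in Da for the lightest and heaviest standard residues.
AVERAGE_RESIDUE_MASS = {"G": 57.0519, "W": 186.2132}
AVERAGE_WATER = 18.01528

def rule_of_thumb(n_residues):
    """Shortcut estimate: 110 Da per residue."""
    return n_residues * 110.0

def exact_mass(sequence):
    """Residue masses plus one water for the termini."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence) + AVERAGE_WATER

for seq in ("G" * 20, "W" * 20):
    estimate = rule_of_thumb(len(seq))
    exact = exact_mass(seq)
    error_pct = 100.0 * (estimate - exact) / exact
    print(f"{seq[0]} x {len(seq)}: shortcut {estimate:.1f} Da, "
          f"exact {exact:.1f} Da, error {error_pct:+.1f}%")
```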

Protein example	Approximate residue count	Typical molecular mass	Notes
Insulin (mature human)	51 aa total in A and B chains	About 5.8 kDa	Small hormone with disulfide bonds
Lysozyme (hen egg white)	129 aa	About 14.3 kDa	Classic enzyme used in teaching and structural biology
Green fluorescent protein	238 aa	About 26.9 kDa	Widely used reporter protein
Human serum albumin	585 aa	About 66.5 kDa	Major plasma protein
Hemoglobin tetramer	574 aa total across 4 subunits	About 64.5 kDa	Functional mass reflects oligomeric assembly
Immunoglobulin G	About 1320 aa total	About 150 kDa	Glycosylation adds mass and heterogeneity to the observed value

Important sources of discrepancy between theoretical and observed mass

Post-translational modifications: Phosphorylation, glycosylation, acetylation, methylation, ubiquitination, sulfation, lipidation, and oxidation can all shift mass.
Disulfide bonding: Each disulfide bond removes two hydrogen atoms, lowering the mass by about 2 Da per bond, though practical interpretation often focuses more on redox state and mobility effects.
Signal peptide removal: Secreted and membrane proteins are often processed after translation.
N-terminal methionine processing: The initiating methionine may be removed depending on the second residue and cellular context.
Affinity tags and cloning scars: Small additions from vectors frequently explain band shifts.
Proteolysis or truncation: Sample handling can generate smaller fragments than expected.
SDS-PAGE anomaly: Migration on gels is not always perfectly proportional to true molecular mass.
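Several of the items above translate into fixed mass shifts that can be added to the unmodified mass by hand. The sketch below uses commonly quoted monoisotopic shifts for a few simple modifications; these values are not part of the basic calculator model, the names are illustrative, and variable modifications such as glycosylation or ubiquitination need case-by-case treatment.

```python
# Commonly quoted monoisotopic mass shifts in Da (assumed values,
# not part of the residue-plus-water model described in this guide).
MONOISOTOPIC_SHIFT = {
    "phosphorylation": 79.96633,   # adds HPO3
    "acetylation": 42.01057,       # adds C2H2O
    "methylation": 14.01565,       # adds CH2
    "oxidation": 15.99491,         # adds O
    "disulfide_bond": -2.01565,    # loses 2 H per bond
}

def modified_mass(unmodified_mass, modifications):
    """Add shifts to a theoretical mass.

    `modifications` maps a modification name to the number of
    times it occurs on the molecule.
    """
    shift = sum(MONOISOTOPIC_SHIFT[name] * count
                for name, count in modifications.items())
    return unmodified_mass + shift

# Hypothetical example: two phosphorylations and one disulfide bond.
print(f"{modified_mass(1000.0, {'phosphorylation': 2, 'disulfide_bond': 1}):.5f} Da")
```

When an observed intact mass differs from theory, comparing the gap against such a table is a quick first check before considering truncation or tag-related explanations.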

How to use molecular mass in experimental workflows

In mass spectrometry, the calculated mass is the first reference for intact protein or peptide analysis: use the monoisotopic mass for peptides and small proteins whose isotope peaks are resolved, and the average mass for larger intact proteins, where the monoisotopic peak is usually too weak to observe. For electrospray ionization, observed m/z peaks correspond to different charge states of the same molecule, and deconvolution should land near the true intact mass if the sequence and modification state are correct. In SDS-PAGE, the exact mass helps annotate lanes and identify unexpected shifts. In size-exclusion chromatography, mass provides context, though shape and hydration influence elution strongly. In recombinant protein purification, knowing the exact expected mass for each construct version is essential when comparing uncleaved fusion, cleaved target, and degradation products.
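The relationship between intact mass and electrospray charge states can be sketched as follows. The example assumes positive-ion mode with charge carried only by added protons (about 1.007276 Da each) and ignores adducts; the function names and the 10 000 Da mass are hypothetical.

```python
PROTON_MASS = 1.007276  # Da, mass of one proton

def mz(neutral_mass, charge):
    """m/z of an ion carrying `charge` protons (positive-ion mode)."""
    return (neutral_mass + charge * PROTON_MASS) / charge

def neutral_mass_from_mz(mz_value, charge):
    """Invert the relation for a peak whose charge state is known."""
    return mz_value * charge - charge * PROTON_MASS

theoretical = 10000.0  # hypothetical intact mass in Da
for z in range(8, 13):
    print(f"z = {z:2d}: m/z = {mz(theoretical, z):.4f}")
```

Every peak in the charge-state series maps back to the same neutral mass, which is the basis of deconvolution.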

For oligomeric proteins, monomer mass is only part of the story. Many proteins function as dimers, trimers, tetramers, or larger assemblies. If a monomer is 32.1 kDa and the native species is a homotetramer, the assembly is approximately 128.4 kDa before considering ligand binding or PTMs. The calculator above includes an oligomer copy number field for that reason. This is useful when interpreting native gels, SEC-MALS, and quaternary structure measurements.
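The same arithmetic can be run in reverse to suggest an oligomeric state from a measured native mass. This is a minimal sketch: the measured value is hypothetical, the function names are illustrative, and rounding to the nearest integer ignores ligands, PTMs, and measurement error.

```python
def assembly_mass(monomer_kda, copies):
    """Mass of a homooligomer built from identical subunits."""
    return monomer_kda * copies

def estimate_copies(measured_kda, monomer_kda):
    """Nearest whole number of subunits consistent with a measured mass."""
    return round(measured_kda / monomer_kda)

monomer = 32.1                        # kDa, from the sequence calculation
tetramer = assembly_mass(monomer, 4)  # expected homotetramer mass
measured = 126.0                      # kDa, hypothetical SEC-MALS result

print(f"expected tetramer: {tetramer:.1f} kDa")
print(f"estimated copies from measurement: {estimate_copies(measured, monomer)}")
```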

Best practices for accurate protein mass calculation

Use the mature, experimentally relevant sequence, not only the translated ORF.
Decide whether average or monoisotopic mass fits your instrument and workflow.
Include tags, linkers, cleavage remnants, and engineered mutations.
Account for known post-translational modifications separately.
Check whether your protein forms disulfides, multimers, or processed fragments.
Compare calculated mass with multiple experimental methods when possible.

Authoritative references for protein mass and sequence analysis

For deeper study, consult authoritative educational and government resources. The National Human Genome Research Institute provides accessible molecular biology background at genome.gov. The National Center for Biotechnology Information offers sequence and protein resources through ncbi.nlm.nih.gov. For proteomics and mass spectrometry education, the University of Arizona mass spectrometry resource provides helpful academic material at proteomics.arizona.edu.

Final takeaway

Calculating protein molecular mass is one of the most useful first-pass analytical steps in protein science. It turns a raw sequence into an experimentally actionable expectation. The exact calculation is superior to rough estimation because proteins vary in amino acid composition, may assemble into oligomers, and often undergo processing. By using sequence-specific residue masses, adding terminal water, and selecting average or monoisotopic mode according to your application, you obtain a reliable theoretical mass that supports interpretation across electrophoresis, mass spectrometry, purification, and structural characterization.

Educational note: theoretical molecular mass is not a substitute for direct measurement when PTMs, proteolysis, or heterogeneous glycosylation are present. In those cases, use intact mass spectrometry or orthogonal biophysical methods to confirm the true molecular species.
