Tracing history of protein domain organization, functional loops, and dipeptides with chronologies and networks
Aziz, Muhammad Fayez
Permalink
https://hdl.handle.net/2142/115913
Description
- Title
- Tracing history of protein domain organization, functional loops, and dipeptides with chronologies and networks
- Author(s)
- Aziz, Muhammad Fayez
- Issue Date
- 2022-07-14
- Director of Research (if dissertation) or Advisor (if thesis)
- Rodriguez Zas, Sandra L
- Doctoral Committee Chair(s)
- Caetano-Anollés, Gustavo
- Committee Member(s)
- Bhalerao, Kaustubh
- Villamil, Maria B
- Department of Study
- Crop Sciences
- Discipline
- Crop Sciences
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Protein Domain Organization
- Elementary Loops
- Dipeptides
- Fold Superfamilies
- Fold Families
- Phylogeny
- Evolutionary Age
- Networks
- Connectivity
- Scale-Free
- Scale-Rich
- Modularity
- Hierarchical Modularity
- Protein Model
- Abstract
- Systems biology is a multidisciplinary, holistic, and rising field that uses computational and mathematical approaches to model and explore the makeup and functioning of biological systems. Analysis of high-throughput data with statistical approaches helps create in silico models that advance our understanding of complex biological phenomena, with illuminating findings in human development and disease. Also, drawing links between biochemistry and physiology determines how functions take place in biological interactions. We here focus on the mysterious origin and evolution of proteins in their various forms. A hierarchical classification of proteins based on homology of structural domains describes the extant shared-and-derived features of the sequence, structure, and function of protein modules. Modern advances in evolutionary genomics and systems biology enable the historical exploration of the structure, function, and organization of proteins. The history of a biological system can be traced through a series of time events, which define molecular chronologies (timelines) describing the rise of biological innovations. Overlaying these histories on time-varying networks provides a window into the past. Phylogenomic analyses allow construction of molecular timelines of evolving networks, which can be linked to the geological fossil record using a molecular clock of folds. These evolutionary timelines of domains (‘lock-and-key’ folding components of proteins) reveal the explosive emergence of multidomain proteins, yin-and-yang patterns in the development of an ‘elementary functionome’ of loops (threads uniting domains) and domains, and a late rise of translational proteins and dipeptides (pairs of amino acids making up domains). Growing networks of these primordial biological entities describe how parts associate with each other to form integrated systems, which are often structured by modularity, power-law behavior, or a modular hierarchy (i.e., both).
A ~2,000-year-old papyrus attributed to Empedocles from the ancient city of Panopolis in Upper Egypt, P. Strasb. Gr. Inv. 1665-6, recounts a ‘double tale’ of unification and change. This scribal transmission can astonishingly be interpreted as a description of biological evolution with network hierarchies ~2,400 years before Darwin. We discover that the double tale portrays a biphasic (hourglass) theory of module emergence. A first phase links parts weakly, which then associate variously. Later, these parts diversify and compete, and are often selected for performance. The emerging interactions constrain their associative structure, which causes parts to self-organize into modules with tight linkages. In a second phase, variants of the modules diversify and become parts for a new generative cycle of higher-level organization. This paradigm predicts the rise of hierarchical modularity in evolving networks at different timescales and complexity levels. Remarkably, phylogenomic analyses uncover this emergence in the evolving networks of protein domain organization, elementary functionome, and dipeptides. We use computational techniques to mine historical developments and observe how 1,937 functional loop prototypes and 400 structural elements of dipeptides define the function and modular behavior of 1,442, 1,475 and 6,162 protein domains. Evolutionary developments of protein domains describe macro-level structural evolution, as protein sizes may differ among species while their active sites remain similar. The crystal structures of protein architectures are highly conserved, and thus the patterns of historical interactions among them signify the co-option and recruitment of functional modules throughout protein history.
Timelines built using modern methods of phylogeny reconstruction and mapped over evolving networks of domain structures and architectures described the early emergence of domains involving sandwich, bundle, barrel, and other complex structures, and a late ‘big bang’ of domain combinations through evolutionary processes of fusion and fission. Visualizations as radial and waterfall layouts of timed networks of 6,162 protein domain and multi-domain superfamilies identified the development of domain clusters and significant hubs involved in determining affinity relationships of protein domains in the genomes. The network analyses made evident the scale-free and modular behaviors of protein domain organization as proteins evolve through unique events in molecular history. Similarly, the bipartite and waterfall visualizations of the bipartite and projected networks of 1,937 loop prototypes and associated protein fold families described the sharing topology and evolution of an elementary functionome. We investigated how loops impart a particular function to domains and how the diversity of domain function is represented by the variety of loops contained therein and shared with other domain structures. The bimodal networks and their monomodal network projections helped identify the hierarchical modularity of loop-domain, domain-loop, loop-loop, and domain-domain interactions that might be very ancient in nature and have been conserved throughout modern evolution. These networks hence helped to elicit the origin and evolution of structural domains and modular and non-modular loops through ‘waves’ of recruitment phases, promising strong genomic and structural evidence, with a potential to model the gradual evolution of proteins, as we illustrated with AlphaFold2-modelled P-loop hydrolases.
Lastly, we study the 400 dipeptides and associated protein domain classes and genomes to trace how the evolution of protein structure combines supersecondary structures and dipeptides into domains, developing complex translation machinery and enhancing structural complexity by establishing long-distance interactions in novel structural and architectural designs. The dipeptides define functional aspects of cells from translational machinery to protein binding and thus can play an extremely significant role in determining the evolutionary history of primordial protein structures and associated processes. The bipartite networks of domains and genomes with dipeptides, and the projected networks of dipeptides therein, portray the long-standing and conserved relationships among these symbiotic entities. Further investigation highlights structural features of evolutionary significance and describes how the structural constitution of domains has been defined by these dipeptides of archaic nature and has transformed the functional landscape of protein expression.
- Graduation Semester
- 2022-08
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2022 Muhammad Fayez Aziz
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois