We model the base compositional
structure of the human and Escherichia coli genomes. Three particular properties
are first quantified: (1) There is a significant tendency for any region
of either genome to have a strand-symmetric base composition. (2) The variation
in base composition from region to region, within each genome, is much
larger than expected under common homogeneous stochastic models. (3)
A given local base composition tends to persist over a scale of at least
kilobases (E. coli) or tens of kilobases (human). Multidomain stochastic
models from the literature are reviewed and sharpened. In particular, quantitative
measurements of the third property lead us to suggest a significant shift
in the style of domain models, in which the variation of A+T content with
position is modeled by a random walk with frequent small steps rather than
with large quantum jumps. As an application, we suggest a way to reduce
the amount of computation needed to assemble large sequences from the
sequences of randomly chosen fragments.
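
The contrast between a random walk with frequent small steps and a model with large jumps can be made concrete with a small simulation. The following is only an illustrative sketch, not the model fitted in this work: the function names, the step size, the bounds on A+T content, and the window length are all hypothetical choices. Drawing A versus T (and C versus G) with equal probability is likewise an assumption, made so that the simulated strand is compositionally symmetric in expectation.

```python
import random


def random_walk_at_content(n_windows, start=0.5, step=0.01, lo=0.3, hi=0.7, rng=None):
    """Return a per-window A+T probability that drifts by small +/- steps.

    All parameter values are illustrative assumptions.
    """
    rng = rng or random.Random(0)
    p = start
    profile = []
    for _ in range(n_windows):
        profile.append(p)
        p += rng.choice((-step, step))
        p = min(hi, max(lo, p))  # keep the walk inside [lo, hi]
    return profile


def generate_sequence(profile, window=1000, rng=None):
    """Emit `window` bases per profile entry, using that entry's A+T probability."""
    rng = rng or random.Random(1)
    bases = []
    for p_at in profile:
        for _ in range(window):
            if rng.random() < p_at:
                bases.append(rng.choice("AT"))  # A and T equally likely (assumed)
            else:
                bases.append(rng.choice("CG"))  # C and G equally likely (assumed)
    return "".join(bases)


def window_at_content(seq, window=1000):
    """Observed A+T fraction in consecutive non-overlapping windows."""
    fractions = []
    for i in range(0, len(seq) - window + 1, window):
        chunk = seq[i:i + window]
        fractions.append((chunk.count("A") + chunk.count("T")) / window)
    return fractions


def strand_asymmetry(seq):
    """Relative single-strand imbalance |A-T|/(A+T) and |C-G|/(C+G)."""
    a, t = seq.count("A"), seq.count("T")
    c, g = seq.count("C"), seq.count("G")
    return abs(a - t) / max(1, a + t), abs(c - g) / max(1, c + g)


if __name__ == "__main__":
    profile = random_walk_at_content(50)
    seq = generate_sequence(profile, window=1000)
    observed = window_at_content(seq, window=1000)
    print("first windows, model vs observed A+T:")
    for p, o in list(zip(profile, observed))[:5]:
        print(f"  {p:.3f}  {o:.3f}")
    print("strand asymmetry (AT, CG):", strand_asymmetry(seq))
```

Because neighbouring windows differ by at most one small step, a given local composition persists across many windows, which is the qualitative behaviour described in property (3). A large-jump model would instead hold the probability constant within a domain and change it abruptly at domain boundaries.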