For decades, traditional geoscientific models have not been learnable from data and have evolved only slowly as observations accumulated. Deep learning (DL) models, which can evolve much more rapidly, have tremendously elevated our predictive capability, but they remain difficult to interpret and do not let us ask precise science questions. These two domains are often perceived as distinct paradigms, but we argue that there is no fundamental difference between them other than differentiable programming, which critically supports learning complex functions via neural networks (NNs). Here we propose the concept, scope and goals of differentiable modeling in geosciences, or simply differentiable geosciences (DG), which intermingles NNs with physical process descriptions and varying degrees of structural priors. The DG paradigm peels off the architectural elements of DL while marrying its core learning machinery to geoscientific process descriptions. A differentiable model thus retains the desirable ability of deep networks to adapt to and absorb big data, yet it can also output untrained physical variables and provide a full narrative to stakeholders. With physics as the connective tissue and structural constraint, we can flexibly and precisely place our question mark anywhere in the modeling system: we can learn parameterization schemes, governing equations, constitutive relationships, structural configurations or improved assumptions. DG relaxes the demand on data quantity and permits flexible imposition of structural priors. Evidence shows that differentiable models can provide state-of-the-art modeling accuracy, approaching that of purely data-driven machine learning models; matching this benchmark is crucial for ascertaining that true knowledge, rather than artifacts, is learned. Differentiable models are thus best positioned to benefit from the synergistic effects of big data, provide a full physically based narrative of an event, and learn more robust relationships than purely data-driven models.
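To make the core idea concrete, the following is a minimal, self-contained sketch (not from the paper) of differentiable modeling: a toy linear-reservoir water balance, S' = P - kS with discharge Q = kS, whose parameter k is calibrated against observed discharge by gradient descent, with gradients supplied by a hand-rolled forward-mode autodiff (dual numbers). The model equation, parameter names and synthetic data are illustrative assumptions; in practice one would use a framework such as PyTorch or JAX and could replace the scalar k with a neural network.

```python
class Dual:
    """Forward-mode autodiff scalar: a value plus its derivative w.r.t. one parameter."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    def __sub__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val - o.val, self.dot - o.dot)
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)
    __rmul__ = __mul__

def simulate(k, precip, dt=1.0):
    """Linear-reservoir water balance S' = P - k*S, stepped with explicit Euler.

    Returns the simulated discharge series Q = k*S; gradients w.r.t. k flow
    through the whole time loop via the Dual arithmetic above.
    """
    S, Q = Dual(0.0), []
    for P in precip:
        Q_t = k * S
        S = S + (Dual(P) - Q_t) * dt
        Q.append(Q_t)
    return Q

# Synthetic "observations" generated with a true (hypothetical) parameter k = 0.3.
precip = [5.0, 0.0, 3.0, 0.0, 0.0, 2.0, 0.0, 0.0]
obs = [q.val for q in simulate(Dual(0.3), precip)]

# Calibrate k by gradient descent on the sum of squared errors; the gradient of
# the loss comes directly from the propagated dual numbers (seed dk/dk = 1).
k = 0.8
for _ in range(500):
    sim = simulate(Dual(k, 1.0), precip)
    loss_grad = sum(2.0 * (q.val - o) * q.dot for q, o in zip(sim, obs))
    k -= 0.01 * loss_grad

print(f"calibrated k = {k:.3f}")  # converges toward the value 0.3 used to generate obs
```

The same pattern underlies differentiable geoscientific models at scale: because every operation in the physical simulation is differentiable, any component placed inside it, whether a scalar parameter as here or an embedded NN, can be trained end to end against observations.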