To me this seems like poor design, just as locales for languages are. Since the set of computing devices a program runs on is dynamic, it shouldn't be modeled as global state. Consider five tasks that are run and should be scheduled on N CPUs and M GPUs.
Chapel was designed for the high-performance computing community, where programmers often want full control over mapping their computations to their hardware resources without needing to rely on techniques like virtualization or runtime load balancing, which can obscure key details. That said, higher-level abstractions, such as distributed arrays and iterators, can be (and have been) written in Chapel to insulate many computations from these system-level details. Users of these higher-level features need not worry about the details of the underlying locales. We refer to this as Chapel's support for multiresolution programming.
That said, other communities may obviously prefer different approaches due to differing needs and constraints.
What you primarily want in HPC is control over where your data is stored. That is subtly different from where your computations are performed. E.g., an HPC computation may use N heterogeneous devices and require fine-grained control over how data is communicated between those devices. The examples with "locales" are too blunt to handle such scenarios.
We agree that the placement of data is important for HPC programmers to control. Locales are the means of controlling such placement in Chapel, whether directly (as in this article’s simple examples) or via abstractions like distributed arrays (whose implementations rely on locales).
Once the data is created, computations can be executed with affinity to a specific variable in a data-driven manner using patterns like `on myVar do foo(myVar, anotherVar)`. Alternatively, an abstraction can abstract such details away from a user's concern and control the affinity within its implementation, as the parallel iterator implementing `forall elem in MyDistributedArray` does.
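To make the two patterns above concrete, here is a minimal sketch rather than anything taken from the article. It assumes a recent Chapel release in which `blockDist.createDomain` is available from the `BlockDist` module. The procedure `foo` and the variables `myVar`, `anotherVar`, and `MyDistributedArray` are hypothetical names chosen to mirror the snippets in the paragraph above.

```chapel
use BlockDist;

// Hypothetical helper mirroring the `foo` in the pattern above:
// it updates its first argument in place.
proc foo(ref x: int, y: int) {
  x += y;
}

var anotherVar = 1;                 // stored on the locale where the program starts

// Pattern 1: data-driven affinity with an on-clause.
on Locales[numLocales-1] {
  var myVar = 41;                   // stored on the last locale
  on Locales[0] {
    // We are now executing on locale 0; myVar is remote if numLocales > 1.
    // `on myVar` moves execution to wherever myVar is stored.
    on myVar do foo(myVar, anotherVar);
  }
  writeln("myVar = ", myVar, " lives on locale ", myVar.locale.id);
}

// Pattern 2: an abstraction decides the affinity.
// The block distribution spreads the indices across all locales, and the
// array's parallel iterator runs each iteration where its element is stored.
const D = blockDist.createDomain({1..8});
var MyDistributedArray: [D] int;
forall elem in MyDistributedArray do
  elem = here.id;                   // record which locale handled this element
writeln(MyDistributedArray);
```

When run on a single locale, everything lands on locale 0. With several locales (for example by passing `-nl 4` to a multi-locale build), the second part should show each block of the array tagged with the locale that owns it, without the loop itself naming any locale.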
According to the article, locales control where the code is running, not where the data is stored. Maybe that is implied in some cases, such that if you create data in one locale, that is also where it is stored, but it tells you nothing about how data created in one locale and accessed in another locale is handled (or even whether that's allowed). Since you mention other Chapel features that I don't know about, they may fill in the gaps. My only point of contention is that the locale feature is poorly thought out and not a good way to address HPC needs.
Locales do control where the data is stored. For example:
    var HostArr: [1..10] int;    // allocated in host memory
    on here.gpus[0] {
      // now we are on a GPU sublocale...
      var DevArr: [1..10] int;   // allocated in device memory
      ...
    }
In the near term, we are planning to publish our second GPU blog post, where we will discuss how to move data between device and host.