Pose question
“We must think of data as a flowing river over time, not a static
snapshot. Make copies, share, and do magic” – S. Madhavan
[Figure: end-to-end file transfer path. Labels: Lustre file system (MDS, MDT, OSS, OST), storage array, FS stack, HBA/HCA, data transfer node, TCP, IP, NIC, LAN, switch, router, wide area network, destination]
+ diverse environments
+ diverse workloads
+ contention
File transfer is an end-to-end problem
Globus GridFTP provides a widely-used open source implementation.
Modular, pluggable architecture (different protocols, I/O interfaces).
Many optimizations: e.g., concurrency, parallelism, pipelining.
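A minimal sketch of two of the optimizations named above, using local file copies as a stand-in for network transfers. It is not GridFTP code and uses no Globus API: the function names `copy_range`, `copy_file`, `copy_many`, and `demo` are hypothetical. Here "concurrency" means several files in flight at once and "parallelism" means several streams moving different byte ranges of one file. Pipelining (keeping several transfer commands outstanding to hide per-file latency) is not shown.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def copy_range(src, dst, offset, length, chunk=1 << 20):
    """Copy one byte range; stands in for a single data stream."""
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        fin.seek(offset)
        fout.seek(offset)
        remaining = length
        while remaining > 0:
            data = fin.read(min(chunk, remaining))
            if not data:
                break
            fout.write(data)
            remaining -= len(data)


def copy_file(src, dst, parallelism=4):
    """Parallelism: split one file into ranges moved by separate streams."""
    size = os.path.getsize(src)
    with open(dst, "wb") as fout:
        fout.truncate(size)  # preallocate so each stream can write at its offset
    if size == 0:
        return dst
    step = -(-size // parallelism)  # ceiling division
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        futures = [
            pool.submit(copy_range, src, dst, off, min(step, size - off))
            for off in range(0, size, step)
        ]
        for future in futures:
            future.result()  # re-raise any error from a stream
    return dst


def copy_many(pairs, concurrency=2, parallelism=4):
    """Concurrency: keep several files in flight at the same time."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(lambda p: copy_file(p[0], p[1], parallelism), pairs))


def demo():
    """Copy three temporary files of different sizes and verify the copies."""
    with tempfile.TemporaryDirectory() as tmp:
        pairs = []
        for i, size in enumerate([0, 1000, 3 * (1 << 20) + 17]):
            src = os.path.join(tmp, f"src{i}.bin")
            with open(src, "wb") as f:
                f.write(os.urandom(size))
            pairs.append((src, os.path.join(tmp, f"dst{i}.bin")))
        copy_many(pairs, concurrency=2, parallelism=4)
        return all(
            open(s, "rb").read() == open(d, "rb").read() for s, d in pairs
        )
```

In this sketch the two knobs are independent: `concurrency` helps most with many small files, while `parallelism` targets a single large file. Real transfer tools tune both against the load on the endpoints and the network.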
Concurrency over 24 hours. Kettimuthu et al., 2015
Throughput vs. concurrency and parallelism. Kettimuthu et al., 2014
A load-aware, adaptive algorithm:
(2) Concurrency-constrained scheduling
Advanced Scientific Computing Research
Program manager: Rich Carlson
Simple analytical model:
T = α + β·l
[α: startup cost; β·l: time to move l units of data, so 1/β is the sustained bandwidth]
Experiment + regression to estimate α, β
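As an illustration of the regression step, here is a minimal sketch that estimates α and β by ordinary least squares from (size, time) pairs. It assumes l is the transfer size and T the measured transfer time. The function names and the sample measurements are hypothetical; the sample is generated noise-free from α = 2.0 s and β = 0.008 s/MB, and is not data from any experiment.

```python
def fit_transfer_model(sizes, times):
    """Least-squares fit of T = alpha + beta * l; returns (alpha, beta)."""
    n = len(sizes)
    if n < 2 or n != len(times):
        raise ValueError("need at least two (size, time) pairs of equal length")
    mean_l = sum(sizes) / n
    mean_t = sum(times) / n
    sxx = sum((l - mean_l) ** 2 for l in sizes)
    if sxx == 0:
        raise ValueError("need at least two distinct transfer sizes")
    sxy = sum((l - mean_l) * (t - mean_t) for l, t in zip(sizes, times))
    beta = sxy / sxx                  # seconds per unit of data
    alpha = mean_t - beta * mean_l    # startup cost in seconds
    return alpha, beta


def predict_time(alpha, beta, size):
    """Predicted transfer time for a transfer of the given size."""
    return alpha + beta * size


# Hypothetical, synthetic measurements (sizes in MB, times in seconds).
sizes_mb = [100, 500, 1000, 5000]
times_s = [2.0 + 0.008 * l for l in sizes_mb]

alpha, beta = fit_transfer_model(sizes_mb, times_s)
sustained_bandwidth = 1.0 / beta  # MB/s implied by the fitted slope
```

With real measurements the fit will not be exact, and the residuals show where the two-parameter model misses system or application details.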
First-principles modeling to better capture details of system & application components
Data-driven modeling to learn unknown details of system & application components
Model composition
Model, data comparison
simulated/emulated measurements
point regression estimate
Configuration for host and edge devices
Composition operations
End-to-end profile composition
… but in reality it’s often very challenging
Other services
Time
Automate and outsource: the Discovery cloud
Globus research data management services
www.globus.org
[Figure labels: Simulation; Data Source; Data Destination; go#s3]
25,000 users, 75 PB and 3B files transferred, 8,000 endpoints
[Figure: Globus endpoints. Labels: Globus Toolkit, Globus Connect, Globus APIs, publication and discovery]
[Figure labels: sample; material composition (La 60%, Sr 40%); experimental scattering; simulated structure; simulated scattering]
Detect errors (secs—mins)
Knowledge base
Past experiments; simulations; literature; expert knowledge
Select experiments (mins—hours)
Contribute to knowledge base
Simulations driven by experiments (mins—days)
Knowledge-driven decision making
Evolutionary optimization
Integrate statistics/machine learning to assess many models and calibrate them against 'all' relevant data
New computer facilities enable on-demand computing and high-speed analysis of large quantities of data