Chuck Cranor

Systems Scientist Faculty
Parallel Data Lab
Carnegie Mellon University
5000 Forbes Avenue
CIC Building, Room 2211
Pittsburgh, PA 15213
Email: chuck@ece.cmu.edu (preferred)
Phone: +1 412 268 5426
Fax: +1 412 268 3010
I am a Senior Systems Scientist faculty member at Carnegie Mellon University working in the Parallel Data Lab (PDL). I joined PDL in December 2003. My research interest is in building new systems that are both novel and have the potential to make a positive impact on the world. As a Systems Scientist, my role in PDL is to work with other faculty members to help architect, prototype, and deploy large-scale storage systems projects based on PDL research. I also play a key role in defining the architecture of the PDL lab computing environment, including the data center we have built on the first floor of the CIC building. I am currently working with professors George Amvrosiadis and Greg Ganger on storage in High Performance Computing (HPC) systems. I also worked with Garth Gibson before he left CMU and returned to Canada to head the Vector Institute for Artificial Intelligence.

Prior to joining PDL, I was on the technical staff at AT&T Labs-Research for five years. At AT&T I worked on research projects relating to network monitoring, network content distribution, packet telephony, and "system on a chip" embedded devices. I have a doctorate in Computer Science from Washington University in St. Louis, Missouri. As part of my graduate research I wrote the UVM Virtual Memory system. UVM is in worldwide use as part of the kernel of the NetBSD and OpenBSD open-source operating systems projects.

Research Projects

DeltaFS
DeltaFS is a filesystem being developed as part of PDL's collaboration with Los Alamos National Labs (LANL). Unlike traditional filesystems, DeltaFS resides entirely within applications that link to it and uses the application's memory and CPU resources for I/O processing. DeltaFS does not have metadata servers or require a global filesystem namespace. This arrangement frees DeltaFS from synchronization overheads associated with distributed filesystems. It also allows DeltaFS to transform application filesystem calls into more efficient I/O requests and to perform in-situ analysis (e.g. indexing) on data written by the application through the filesystem API. In-situ processing offers a promising path to improved time-to-insight for many types of scientific inquiry, but scaling in-situ frameworks to hundreds of thousands of cores is difficult to achieve in practice.
PDLFS
PDLFS is a platform for software development. PDLFS has three main components: a common library that contains the base-level software environment we use to build our HPC research projects (e.g. DeltaFS), the build/test/run environment shared between all our projects, and an "umbrella" system that provides support for easily building and linking third party libraries with our software and installing it in one unified place.
Mochi
The Mochi Software Defined Storage (SDS) project for exascale storage services is a joint project between CMU, Argonne National Labs (ANL), LANL, and the HDF Group funded by the US Department of Energy. In Mochi we are exploring how to build specialized, transient data services for data-intensive HPC applications. The services offer flexible, on-demand pairing of applications with storage and networking hardware using semantics that are optimized for the problem domain.
ATLAS
ATLAS is a project where we are collecting and analyzing system-level traces from various types of clusters. Many researchers evaluating cluster management designs today rely primarily on a trace released by Google due to a scarcity of other sufficiently diverse data sources. We have shown that overreliance on the Google trace workload is leading researchers to overfit their results. Our immediate goal with ATLAS is to help create a diverse corpus of real workload traces for researchers. In the long term, we plan to collect and host multi-layer cluster traces that combine data from several layers of subsystems (e.g. job scheduler and filesystem logs) to aid in the design of future vertically optimized systems.

Technical Interests and Activities

Computer Operating Systems
I am interested in systems research in the areas of computer operating systems, storage systems, and networking. In particular, I am interested in the design of low-level operating system software that is used to program and control a computer's resources and the mechanisms that the kernel provides to user-level programs to access these resources. In order to fully understand software at this level, I believe it is important to be familiar not only with kernel software structure, but also with the register-level interface provided by a CPU and its peripherals. As computers become more powerful and complex it gets harder and harder to reason about application behavior. Understanding both a wide range of hardware interfaces and low-level software structures enables one to design low-level interfaces that are well optimized, clean, portable, and safely provide applications with powerful access mechanisms.

Professional Activities
I am a member of ACM, IEEE, and USENIX. I was on the program committees for the FREENIX track of the USENIX general conference in 2002 and 2003. I was also on the PC of the 2000 edition of the Computers, Freedom, and Privacy conference (CFP2000). I've reviewed papers for USENIX, FREENIX, NOSSDAV, IEEE INFOCOM, ACM SIGCOMM, and Supercomputing conferences. I also wrote and contributed a Web-based submission system that was used for a number of years by the Computers, Freedom, and Privacy conference and the Telecommunications Policy Research Conference (TPRC).

Open Source Operating Systems
If you really want to understand how computers work, I believe that having access to low-level operating system sources is critical. To promote such access, I've been developing software for the open source BSD systems since 1994. I have contributed software to FreeBSD, NetBSD, and OpenBSD. To learn more about my work with BSD, see my BSD page. I also hosted and helped create the first Anonymous CVS server on the Internet (the original anoncvs.openbsd.org, which was also known as eap.ccrc.wustl.edu).

Vintage Computing
I like to collect older computers that are capable of running modern versions of BSD Unix. Currently I have machines from the following architectures: i386, Sparc (sun4 and sun4m), m68k, MIPS, arm32, hppa, and VAX. I'm also generally interested in information about the various computer systems I've used over the years. My computing history page lists the major systems I've used and what I'm currently using. The SIMH simulator is a great option for emulating older hardware that you no longer have access to. I've used it to install and run, on a simulated PDP-11, the old RSTS/E operating system that I used back in high school, and that's pretty cool!

Preferred Computing Environment
Operating system: BSD Unix, MacOSX, Linux
Programming language: C (low-level), perl (high-level), C++ (if kept simple/readable)
shell: /bin/tcsh (note: for scripting I use perl)
Source code control: git, cvs
Web-based source management: gogs (gitlab is bloatware, IMHO)
Windowing environment: XOrg+fvwm, MacOSX, I prefer light text on dark background
Editor: vi (for email and short term edits), emacs (for programming and writing papers)
Build tool: cmake
mail/news: mutt / trn
Web: firefox, with all the mozilla ad-blocking filters installed and Flash disabled
Documents: latex via emacs (+auctex macros), xfig for figures, xmgrace for graphs (but xmgrace has a terrible UI)
Presentations: powerpoint

Misc.
We've been using Emulab from the University of Utah to manage some of our lab's computing clusters (we also use OpenStack). I've had a lot of experience installing, hacking, and debugging systems with Emulab. I also run a lot of our experimental code on Cray HPC systems at LANL, so I have quite a bit of experience working with the Cray Programming Environment and the Slurm batch job system. We build all our research projects using cmake, so I've spent a lot of time working with that as well and have contributed a number of Cray-related updates back to the cmake developers for the 3.14 and 3.15 releases of cmake.

Other Interests and Activities

Family
My wife, Lorrie Faith Cranor, also works at CMU and is a well-known researcher in the area of usable privacy and security systems. In her spare time she likes to make quilts, take photos, play flute/saxophone, play soccer, hike, parent our kids, and catch up on her email backlog (the last one is a necessary evil!). Visit her home page for details.

Shane Zachary Cranor is our son. He was born in 2001. Shane likes music (both playing it and recording/producing it), photography, making videos, computers, frisbee, and eating gourmet food. Watch Shane's high school band "The Electric Army" perform their song "Heavy Dreamer" at a street festival here (Shane is the bass player), and check out Shane's soundcloud here.

Maya Quinn Cranor is our older daughter, born in 2003. Maya likes reading fiction, Dr. Who (both the old and new series), building robots with the Girls of Steel team, STEM, writing, Voltron (the animated series), editing her school's yearbook, playing both flute and soccer, hiking, mock trial, Broadway musicals, and baking desserts.

Nina Veronica Cranor is our younger daughter, born in 2006. Nina likes to sing, play her electric guitar and electric bass, play role playing games (D&D and Magic: The Gathering), read scifi books, watch Dr. Who, play soccer, do STEM-related activities, and get dressed up and go out to a nice dinner. Nina is also a big fan of summer art camps of all types.

Construction/Engineering
While I am a computer engineer by trade, I'm generally interested in all sorts of engineering and building projects. Working at a growing university located in a city with an aging infrastructure, we always seem to have some sort of interesting construction project going on (new buildings, space renovations, new computer machine rooms, water main repairs, road upgrades, etc.). I like to keep tabs on all of that stuff!

My wife and I have also done many home renovation projects. When we lived in New Jersey we bought a new house that was still under construction. A few years later we had the basement of that house fully finished. In Pittsburgh, we have a house that is over 100 years old and we have done many upgrades, ranging from fun stuff like a full kitchen and full bathroom renovation to less fun stuff like replacing "knob and tube" electric wiring with modern wiring and having the concrete floor of our basement dug up to replace decaying cast iron sewer pipes.

Music/Guitar
I like listening to and researching music (e.g. on youtube/wikipedia). When I was young and visiting my grandparents, I would get up very early in the morning to accompany one of my grandfathers to the radio station where he was the host of the early morning show. While he was on the air I was allowed to play around with the station's music library and production room (I made a lot of mix tapes while at the radio station). This resulted in my love for the classic rock, pop, and funk/soul music of the 60s and 70s. Later on I embraced punk, new wave, and alternative rock. My grandfather also let me play with his PA, tape recorders, and guitar amplifiers at his house -- so I'm interested in audio equipment too.

When my kids started taking an interest in making music, I took up the electric guitar. I started out learning over the internet (Justin Sandercoe's youtube channel is a great resource) and later began taking lessons at our local music school (Sunburst School of Music in Squirrel Hill). The guitar is an endless source of fun and challenges and a great change of pace after working on computer stuff all day at work. I like to find interesting songs and reverse engineer them by transcribing the parts by ear (I use a program called "Transcribe!" to help with this). I especially love to perform some of the songs I've figured out and arranged with our friends+family band "Unanticipated Fun." We've done songs from a wide variety of artists (anything from The Monkees to Tool).

Building a Tube Amplifier
My undergraduate degree is in electrical engineering with a focus on digital systems, so I do not have a lot of experience with old analog technology. But as I got into playing the guitar I developed an interest in how vacuum tube amplifiers work. So in 2018 I decided to build a clone of the Fender Princeton Reverb Amplifier from scratch using a kit. Read all about my project here.

Dining
My favorite type of food is Mexican/Southwestern food, though there are not a huge number of options here in Pittsburgh. It is also nice to go to one of the fancy Big Burrito restaurants (Casbah, Eleven, Soba, etc.). For fast-casual, I often go to Panera, which is the national name for the St. Louis Bread Company (I ate at the bread co. a lot while in grad school).

Iced Tea
I love fresh brewed iced tea. Restaurants that do not have good tasting fresh brewed iced tea rate low as far as I'm concerned. Please note that Snapple or other bottled teas are not fresh brewed and are not valid substitutes for the real thing!

Publications

DeltaFS: A Scalable No-Ground-Truth Filesystem For Massively-Parallel Computing. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC 2021), November 2021 (with Q. Zheng, G. Ganger, G. Gibson, G. Amvrosiadis, B. Settlemyer, and G. Grider).

Streaming Data Reorganization at Scale with DeltaFS Indexed Massive Directories. ACM Transactions on Storage, Volume 16, No. 4, September 2020 (with Q. Zheng, A. Jain, G. Ganger, G. Gibson, G. Amvrosiadis, B. Settlemyer, and G. Grider).

Mochi: Composing Data Services for High-Performance Computing Environments. Journal of Computer Science and Technology, Volume 35, No. 1, January 2020 (with R. Ross, G. Amvrosiadis, P. Carns, M. Dorier, K. Harms, G. Ganger, G. Gibson, S. Gutierrez, R. Latham, B. Robey, D. Robinson, B. Settlemyer, G. Shipman, S. Snyder, J. Soumagne, and Q. Zheng).

Compact Filter Structures for Fast Data Partitioning. Proceedings of the 2019 IEEE International Conference on Cluster Computing (CLUSTER), September 2019 (with Q. Zheng, A. Jain, G. Ganger, G. Gibson, G. Amvrosiadis, B. Settlemyer, and G. Grider). An earlier version of this paper is in Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-19-104, June 2019.

This is Why ML-driven Cluster Scheduling Remains Widely Impractical. Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-19-103, May 2019 (with M. Kuchnik, J. Park, E. Moore, N. DeBardeleben, and G. Amvrosiadis).

The Atlas Cluster Trace Repository. USENIX ;login:, Volume 43, No. 4, Winter 2018 (with G. Amvrosiadis, M. Kuchnik, J. Park, G. Ganger, E. Moore, and N. DeBardeleben). (USENIX web site)

Scaling Embedded In Situ Indexing with DeltaFS. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC 2018), November 2018 (with Q. Zheng, D. Guo, G. Ganger, G. Amvrosiadis, G. Gibson, G. Grider, and F. Gao).

Software-Defined Storage for Fast Trajectory Queries using a DeltaFS Indexed Massive Directory. Proceedings of the 2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems (PDSW-DISC), November 2017 (with Q. Zheng, G. Amvrosiadis, S. Kadekodi, G. Gibson, B. Settlemyer, G. Grider, and F. Gao).

Enabling NVM For Data-Intensive Scientific Services. Workshop on Interactions of NVM/Flash with Operating Systems and Workloads (INFLOW '16), November 2016 (with R. Ross, J. Jenkins, S. Snyder, S. Seo, P. Carns, S. Atchley, and J. Soumagne).

Structuring PLFS for extensibility. Proceedings of the 8th Parallel Data Storage Workshop (PDSW '13), November 2013 (with M. Polte and G. Gibson).

HPC Computation on Hadoop Storage with PLFS. Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-12-115, November 2012 (with M. Polte and G. Gibson).

Early Experiences on the Journey Towards Self-* Storage. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, September 2006 (with M. Abd-El-Malek, W. Courtright, G. Ganger, J. Hendricks, A. Klosterman, M. Mesnier, M. Prasad, B. Salmon, R. Sambasivan, and S. Sinnamohideen).

Ursa Minor: Versatile Cluster-based Storage. Proceedings of the 4th USENIX Conference on File and Storage Technologies (FAST '05), December 2005 (with M. Abd-El-Malek, W. Courtright, G. Ganger, J. Hendricks, A. Klosterman, M. Mesnier, M. Prasad, B. Salmon, R. Sambasivan, S. Sinnamohideen, J. Strunk, E. Thereska, M. Wachs, and J. Wylie).

Design and Implementation of a Distributed Content Management System, to appear in Proceedings of the 13th International Workshop on Network and Operating Systems Support for Digital Audio and Video, June 2003 (with R. Ethington, A. Sehgal, D. Shur, C. Sreenan and K. van der Merwe).

Gigascope: A Stream Database for Network Applications, SIGMOD (Industrial Track), June 2003 (with T. Johnson, O. Spatscheck, and V. Shkapenyuk).

A Precise and Efficient Evaluation of the Proximity Between Web Clients and Their Local DNS Servers, USENIX Annual Technical Conference, June 2002 (with Z. Mao, F. Douglis, M. Rabinovich, O. Spatscheck and J. Wang).

Gigascope: High Performance Network Monitoring with an SQL Interface, SIGMOD 2002, poster/demo and 1 page abstract, p. 623 (with Y. Gao, T. Johnson, V. Shkapenyuk, and O. Spatscheck).

Characterizing Large DNS Traces Using Graphs, Proceedings of the ACM SIGCOMM Internet Measurement Workshop, November 2001 (with E. Gansner, B. Krishnamurthy, and O. Spatscheck).

PRISM Architecture: Supporting Enhanced Streaming Services in a Content Distribution Network, IEEE Internet Computing, pp. 66-75, July/August 2001 (with M. Green, C. Kalmanek, D. Shur, S. Sibal, C. Sreenan, and K. van der Merwe).

CDN Brokering, 6th International Workshop on Web Caching and Content Distribution, June 2001 also in Computer Communications 25 (2002) 393-402 (with A. Biliris, F. Douglis, M. Rabinovich, S. Sibal, O. Spatscheck and W. Sturm).

NED: a Network-Enabled Digital Video Recorder, Proceedings of the 11th IEEE Workshop on Local and Metropolitan Area Networks (LANMAN), May 2001 (with C. Kalmanek, D. Shur, S. Sibal, C. Sreenan, and K. van der Merwe).

PRISM, an IP-Based Architecture for Broadband Access to TV and Other Streaming Media, Proceedings of IEEE International Workshop on Network and Operating System Support for Digital Audio and Video (NOSSDAV), June 2000 (with A. Basso, R. Gopalakrishnan, M. Green, C.R. Kalmanek, D. Shur, S. Sibal, C.J. Sreenan, J.E. van der Merwe).

Architectural Considerations for CPU and Network Interface Integration, IEEE Micro, pp. 18-26, January 2000 (with R. Gopalakrishnan and P. Onufryk). Initially presented at the Hot Interconnects Symposium 7, August 1999. (earlier Hot Interconnects version: pdf, slides in PDF)

Hardware and Software Architecture of a Packet Telephony Appliance, Proceedings of IEEE International Workshop on Network and Operating System Support for Digital Audio and Video (NOSSDAV), pp. 207-216, June 1999 (with M. Chan, R. Gopalakrishnan, P.Z. Onufryk, L.W. Ruedisueli, C.J. Sreenan, and E.R. Wagner). (slides in PDF)

The UVM Virtual Memory System, USENIX Annual Technical Conference, pp. 117-130, June 1999 (with G. Parulkar).

Opening the Source Repository with Anonymous CVS, FREENIX Track of the USENIX Annual Technical Conference, pp. 129-138, June 1999 (with T. de Raadt).

Zero-Copy Data Movement Mechanisms for UVM, Washington University Department of Computer Science, Technical Report, December 1998.

Design and Implementation of the UVM Virtual Memory System, D.Sc. dissertation, Department of Computer Science, Sever Institute of Technology, Washington University, St. Louis, Missouri, August 1998.

Integrating ATM Networking into BSD, Washington University Department of Computer Science, Technical Report 98-??, December 1998.

Gigabit CORBA --- High-Performance Distributed Object Computing, Gigabit Networking Workshop (GBN'96), 24 March 1996, San Francisco, in conjunction with INFOCOM '96 (with D. Schmidt and G. Parulkar).

Half-Sync/Half-Async: An Architectural Pattern for Efficient and Well-structured Concurrent I/O, in Pattern Languages of Program Design, (Coplien, Vlissides, and Kerth, eds.), Addison-Wesley, Reading, MA, 1996 (with D. Schmidt).

Design of Universal Continuous Media I/O, in Proceedings of the 5th International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV '95), pp 83-86, April 1995 (with G. Parulkar).

Operating Freely, Crossroads, the International ACM Student Magazine, Vol. 1, Issue 3, February 1995.

Universal Continuous Media I/O: Design and Implementation, Washington University Department of Computer Science, Technical Report 94-34, December 1994 (with G. Parulkar).

The 3M Project: Multipoint Multimedia Applications on Multiprocessor Workstations and Servers, in Proceedings of IEEE Workshop on High Performance Communication Systems, Sept. 1993 (with M. Buddhikot, Z. Dittia, G. Parulkar, C. Papadopoulos).

An Implementation Model for Connection-Oriented Internet Protocols, in Journal of Internetworking: Research and Experience, Vol. 4., pp 133-157, Sept. 1993 (with G. Parulkar).

An Implementation Model for Connection-Oriented Internet Protocols, Proceedings of IEEE INFOCOM '93, Vol. 3, pp 1135-1143, April 1993 (with G. Parulkar).

An Implementation Model for Connection-Oriented Internet Protocols, M.S. thesis, Department of Computer Science, Sever Institute of Technology, Washington University, St. Louis, Missouri, May 1992.

Connection-oriented Internet Protocols, Invited talk, The Twenty-Fifth Internet Engineering Task Force (IETF) meeting, Washington DC, November, 1992.