Roadmap and Improvement Proposals¶
The project introduction explains the overview and goals of DataFusion, and our development efforts largely align to that vision.
Planning EPIC
s¶
DataFusion uses GitHub issues to track planned work. We collect related
tickets using tracking issues marked with the EPIC
label, containing
discussion and links to more detailed items:
Epics offer a high level roadmap of what the DataFusion community is thinking about. The epics are not meant to restrict possibilities, but rather help organize the community and make it easier to see where development is headed, align our work, and inspire additional contributions.
We also welcome contributions for items not covered by epics. However, before submitting a large PR, we strongly suggest and request you start a conversation as described in Discussing New Features below.
Quarterly Roadmap¶
The DataFusion roadmap is driven by the priorities of contributors rather than any single organization or coordinating committee. We typically discuss our roadmap using GitHub issues, approximately quarterly, and invite you to join the discussion.
For more information:
Improvement Proposals¶
Discussing New Features¶
If you plan to work on a new feature that doesn’t have an existing ticket, it is a good idea to open one for discussion. Advanced discussion helps avoid wasted effort by determining if the feature is a good fit for DataFusion before too much time is invested. Discussion on a ticket can help gather feedback from the community and is likely easier to discuss than a 1000 line PR.
Maintainers will mark major proposals as PROPOSED EPIC
to make them more
visible, but we are very limited on review bandwidth. If you open a ticket and it
doesn’t get any response, try @
-mentioning recently active community members
in the ticket, or posting to the mailing list or Discord.
Supervising Maintainers¶
We have found that most successful epics have one or more “supervising maintainers”, a committer (see here for current list) who take the lead on reviewing and committing PRs, helps with design, and coordinates and communicates with the community. If you want to ship a large feature, we recommend finding such maintainer upfront; otherwise, your PRs may remain unreviewed for a very long time.
Supervising maintainers have no additional formal authority and there is currently no formal process for appointing, approving or tracking who has that role for a given epic. Instead, we rely on discussion on the ticket or PR. Helping complete an epic is a significant time commitment, so maintainers are more likely to help features they are particularly interested in or align with their own project’s use of DataFusion.
If you are willing to be a supervising maintainer for a feature, please say so explicitly. If you are unsure, we suggest asking directly who is willing to take the role, as it can be hard to tell sometimes whether a committer is simply participating and giving general feedback.
What Contributions are Good Fits?¶
DataFusion is designed to be highly extensible, and many features can be implemented as extensions without changes or additions to the core. Support for new functions, data formats, and similar functionality can be added using those extension APIs, and there are already many existing community supported extensions listed in the extensions list.
Query engines are complex pieces of software to develop and maintain. Given our limited maintenance bandwidth, we try to keep the DataFusion core as simple and focused as possible, while still satisfying the design goal of an easy to start initial experience.
With that in mind, contributions that meet the following criteria are more likely to be accepted:
Bug fixes for existing features
Test coverage for existing features
Documentation improvements / examples
Performance improvements to existing features (with benchmarks)
“Small” functional improvements to existing features (if they don’t change existing behavior)
Additional APIs for extending DataFusion’s capabilities
CI improvements
Contributions that will likely involve more discussion (see Discussing New Features above) prior to acceptance include:
Major new functionality (even if it is part of the “standard SQL”)
New functions, especially if they aren’t part of “standard SQL”
New data sources (e.g. support for Apache ORC)
Design Build vs. Big Up Front Design¶
Typically, the DataFusion community attacks large problems by solving them bit
by bit and refining a solution iteratively on the main
branch as a series of
Pull Requests. This is different from projects which front-load the effort
with a more comprehensive design process.
By “advancing the front” the community always makes tangible progress, and the strategy is especially effective in a project that relies on individual contributors who may not have the time or resources to invest in a large upfront design effort. However, this “bit by bit approach” doesn’t always succeed, and sometimes we get stuck or go down the wrong path and then change directions.
Our process necessarily results in imperfect solutions being the “state of the code” in some cases, and larger visions are not yet fully realized. However, the community is good at driving things to completion in the long run. If you see something that needs improvement or an area that is not yet fully realized, please consider submitting an issue or PR to improve it. We are always looking for more contributions.