GitHub Copilot · Your AI pair programmer

Frequently asked questions

General

What is GitHub Copilot?

GitHub Copilot is an AI pair programmer that helps you write code faster and with less work. It draws context from comments and code to suggest individual lines and whole functions instantly. GitHub Copilot is powered by OpenAI Codex, a generative pretrained language model created by OpenAI. It is available as an extension for Visual Studio Code, Visual Studio, Neovim, and the JetBrains suite of integrated development environments (IDEs).

What data has GitHub Copilot been trained on?

GitHub Copilot is powered by Codex, a generative pretrained AI model created by OpenAI. It has been trained on natural language text and source code from publicly available sources, including code in public repositories on GitHub.

Does GitHub Copilot write perfect code?

In a recent evaluation, we found that users accepted on average 26% of all completions shown by GitHub Copilot. We also found that on average more than 27% of developers’ code files were generated by GitHub Copilot, and in certain languages like Python that goes up to 40%. However, GitHub Copilot does not write perfect code. It is designed to generate the best code possible given the context it has access to, but it doesn’t test the code it suggests so the code may not always work, or even make sense. GitHub Copilot can only hold a very limited context, so it may not make use of helpful functions defined elsewhere in your project or even in the same file. And it may suggest old or deprecated uses of libraries and languages. When converting comments written in non-English to code, there may be performance disparities when compared to English. For suggested code, certain languages like Python, JavaScript, TypeScript, and Go might perform better compared to other programming languages.

Like any other code, code suggested by GitHub Copilot should be carefully tested, reviewed, and vetted. As the developer, you are always in charge.

Will GitHub Copilot help me write code for a new platform?

GitHub Copilot is trained on public code. When a new library, framework, or API is released, there is less public code available for the model to learn from. That reduces GitHub Copilot’s ability to provide suggestions for the new codebase. As more examples enter the public space, we integrate them into the training set and suggestion relevance improves. In the future, we will provide ways to highlight newer APIs and samples to raise their relevance in GitHub Copilot’s suggestions.

How does a customer get the most out of GitHub Copilot?

GitHub Copilot works best when you divide your code into small functions, use meaningful names for functions parameters, and write good docstrings and comments as you go. It also seems to do best when it’s helping you navigate unfamiliar libraries or frameworks.

How can a customer contribute?

By using GitHub Copilot and sharing your feedback in the feedback forum, you help to improve GitHub Copilot. Please also report incidents (e.g., offensive output, code vulnerabilities, apparent personal information in code generation) directly to copilot-safety@github.com so that we can improve our safeguards. GitHub takes safety and security very seriously and we are committed to continually improving.

Human oversight

Can GitHub Copilot introduce insecure code in its suggestions?

Public code may contain insecure coding patterns, bugs, or references to outdated APIs or idioms. When GitHub Copilot synthesizes code suggestions based on this data, it can also synthesize code that contains these undesirable patterns. This is something we care a lot about at GitHub, and in recent years we’ve provided tools such as GitHub Actions, Dependabot, and CodeQL to open source projects to help improve code quality. Of course, you should always use GitHub Copilot together with good testing and code review practices and security tools, as well as your own judgment.

Does GitHub own the code generated by GitHub Copilot?

GitHub Copilot is a tool, like a compiler or a pen. GitHub does not own the suggestions GitHub Copilot provides to you. You are responsible for the code you write with GitHub Copilot’s help. We recommend that you carefully test, review, and vet the code before pushing it to production, as you would with any code you write that incorporates material you did not independently originate.

Does GitHub Copilot copy code from the training set?

GitHub Copilot’s suggestions are all generated through AI. GitHub Copilot generates new code in a probabilistic way, and the probability that they produce the same code as a snippet that occurred in training is low. The models do not contain a database of code, and they do not ‘look up’ snippets. Our latest internal research shows that about 1% of the time, a suggestion may contain some code snippets longer than ~150 characters that matches the training set. Previous research showed that many of these cases happen when GitHub Copilot is unable to glean sufficient context from the code you are writing, or when there is a common, perhaps even universal, solution to the problem.

What can I do to reduce GitHub Copilot’s suggestion of code that matches public code?

We built a filter to help detect and suppress GitHub Copilot suggestions which contain code that matches public code on GitHub.

Copilot for Individual users have the choice to enable that filter during setup on their individual accounts. For Copilot for Business users, the Enterprise administrator controls how the filter is applied. They can control suggestions for all organizations or defer control to individual organization administrators. These organization administrators can turn the filter on or off during setup (assuming their Enterprise administrator has deferred control) for the users in their organization.

With the filter enabled, GitHub Copilot checks code suggestions with its surrounding code for matches or near matches (ignoring whitespace) against public code on GitHub of about 150 characters. If there is a match, the suggestion will not be shown to you. In addition, we have announced that we are building a feature that will provide a reference for suggestions that resemble public code on GitHub so that you can make a more informed decision about whether and how to use that code, as well as explore and learn how that code is used in other projects.

Just like when you write any code that uses material you did not independently originate, you should take precautions to understand how it works and ensure its suitability. These include rigorous testing, IP scanning, and checking for security vulnerabilities. You should make sure your IDE or editor does not automatically compile or run generated code before you review it.

Other than the filter, what other measures can I take to assess code suggested by GitHub Copilot?

You should take the same precautions as you would with any code you write that uses material you did not independently originate, and should take precautions to ensure its suitability. These include rigorous testing, IP scanning, and checking for security vulnerabilities. You should make sure your IDE or editor does not automatically compile or run generated code before you review it.

Fairness and broader impact

Will GitHub Copilot work as well using languages other than English?

Given public sources are predominantly in English, GitHub Copilot will likely work less well in scenarios where natural language prompts provided by the developer are not in English and/or are grammatically incorrect. Therefore, non-English speakers might experience a lower quality of service.

Does GitHub Copilot support accessibility features?

We are conducting internal testing of GitHub Copilot’s ease of use by developers with disabilities and working to ensure that GitHub Copilot is accessible to all developers. Please feel free to share your feedback on GitHub Copilot accessibility in our feedback forum.

Does GitHub Copilot produce offensive outputs?

GitHub Copilot includes filters to block offensive language in the prompts and to avoid synthesizing suggestions in sensitive contexts. We continue to work on improving the filter system to more intelligently detect and remove offensive outputs. If you see offensive outputs, please report them directly to copilot-safety@github.com so that we can improve our safeguards. GitHub takes this challenge very seriously and we are committed to addressing it.

Privacy – Copilot for Business

What data does Copilot for Business collect?

GitHub Copilot relies on file content and additional data to work. It collects data to provide the service, some of which is then retained for further analysis and product improvements.

Copilot for Business collects data as described below:

User Engagement Data

When you use GitHub Copilot it will collect usage information about events generated when interacting with the IDE or editor. These events include user edit actions like completions accepted and dismissed, and error and general usage data to identify metrics like latency and features engagement. This information may include personal data, such as pseudonymous identifiers.

Code Snippets Data

GitHub Copilot transmits snippets of your code from your IDE to GitHub to provide Suggestions to you. Code snippets data is only transmitted in real-time to return Suggestions, and is discarded once a Suggestion is returned. Copilot for Business does not retain any Code Snippets Data.

How can users of Copilot for Business control use of their data?

User engagement data (which includes pseudonymous identifiers and general usage data), is required for the use of GitHub Copilot and will continue to be collected, processed, and shared with Microsoft and OpenAI when you use GitHub Copilot.

Copilot for Business does not retain any Code Snippets Data.

Privacy – Copilot for Individuals

What data does Copilot for Individuals collect?

GitHub Copilot relies on file content and additional data to work. It collects data to provide the service, some of which is then retained for further analysis and product improvements. GitHub Copilot collects the following data for individual users:

User Engagement Data

When you use GitHub Copilot it will collect usage information about events generated when interacting with the IDE or editor. These events include user edit actions like completions accepted and dismissed, and error and general usage data to identify metrics like latency and features engagement. This information may include personal data, such as pseudonymous identifiers.

Code Snippets Data

Depending on your preferred telemetry settings, GitHub Copilot may also collect and retain the following, collectively referred to as “code snippets”: source code that you are editing, related files and other files open in the same IDE or editor, URLs of repositories and files path.

How is the transmitted Code Snippets data protected?

We know that user edit actions, source code snippets, and URLs of repositories and file paths are sensitive data. Consequently, several measures of protection are applied, including:

  • The transmitted data is encrypted in transit and at rest
  • Access is strictly controlled. The data can only be accessed by (1) named GitHub personnel working on the GitHub Copilot team or on the GitHub platform health team, (2) Microsoft personnel working on or with the GitHub Copilot team, and (3) OpenAI personnel who work on GitHub Copilot
  • Role-based access controls and multi-factor authentication are required for personnel accessing code snippet data

How can users of Copilot for Individuals control use of their Code Snippets Data?

GitHub Copilot gives you choices about how it uses the data it collects.
User Engagement Data (which includes pseudonymous identifiers and general usage data), is required for the use of GitHub Copilot and will continue to be collected, processed, and shared with Microsoft and OpenAI as you use GitHub Copilot.
Users of Copilot for Individuals can choose whether Code Snippets Data is retained by GitHub and further processed and shared with Microsoft and OpenAI by adjusting user settings.

Users of Copilot for Individuals can request deletion of Code Snippet Data associated with their GitHub identity. GitHub Copilot is powered by Codex, which is a learning model trained by OpenAI. To request removal of your code from Codex's training data, please follow the procedure outlined in the "Copyright Complaints" section in OpenAI's terms of use. For more information, please contact us by filling out a support ticket.

Does GitHub Copilot ever output personal data?

Because Codex, the model powering GitHub Copilot, was trained on publicly available code, its training set included public personal data that was included in that code. From our internal testing, we found it to be very rare that GitHub Copilot suggestions included personal data verbatim from the training set. In some cases, the model will suggest what appears to be personal data – email addresses, phone numbers, etc. – but those suggestions are actually fictitious information synthesized from patterns in training data and therefore do not relate to any particular individual. For example, when one of our engineers prompted GitHub Copilot with, “My name is Mona and my birthdate is,” GitHub Copilot suggested a random, fictitious date of “December 12,” which is not Mona’s actual birthdate. We have also implemented a filter that blocks emails when shown in standard formats, but it’s still possible to get the model to suggest this sort of content if you try hard enough. We will keep improving the filter system to be more intelligent to detect and remove more personal data from the GitHub Copilot suggestions.

Where can I learn more about GitHub Privacy and data protection?