SOGI stands for Sexual Orientation and Gender Identity.

This is the term that researchers gave to the collection of demographic data around various aspects of an individual’s sexual identity. You may also see the following acronyms: SSOGI, SOGIE, SOGISC, SCSOGI, and other variations. The extra S or SC stands for sex characteristics, to indicate inclusion of intersex individuals. The E stands for expression, from the legal language “gender identity and expression”.

These questions cover all SOGI identities, including straight and cisgender. They’re meant to be asked of everyone (the whole population), whether they are assumed to have a cis-hetero identity or not. When SOGI identity questions are asked of an LGBTQIA+ specific audience, there are likely to be more questions and more answer options.

I’ll try to keep this as short as possible while providing an introductory-level overview of the most common issues that come up. This blog post will be updated over time. You can skip the sentences in italics if you’re just skimming; they are there to provide an extra level of detail.

Disclaimer: I write this as an individual and it has no relation to any organization I may have a past, current, or future affiliation with. It’s my informed opinion as a queer person, as a data professional, and as someone who’s been working on SOGI data since January 2021, but it’s my opinion.


The intended purpose of collecting SOGI data is to identify and address disparities in order to improve equity. By dividing a group into sub-populations for data analysis, we can learn about the differences between those sub-populations. That leads to knowledge of positives and negatives, benefits and detractors. Once known, positives can be shared with other sub-populations, and negatives in a sub-population can be targeted and addressed.
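As a rough illustration of what dividing a group into sub-populations looks like in practice, here is a minimal Python sketch. The records, the group labels "A" and "B", the `had_checkup` field, and the `rate_by_group` helper are all made up for this example; they are not real data or a recommended set of categories.

```python
from collections import defaultdict

# Made-up toy records for illustration only; not real data.
records = [
    {"group": "A", "had_checkup": True},
    {"group": "A", "had_checkup": True},
    {"group": "A", "had_checkup": False},
    {"group": "B", "had_checkup": True},
    {"group": "B", "had_checkup": False},
    {"group": "B", "had_checkup": False},
]


def rate_by_group(rows, group_key, outcome_key):
    """Return {group: share of rows in that group where the outcome is True}."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += 1
        if row[outcome_key]:
            positives[row[group_key]] += 1
    return {group: positives[group] / totals[group] for group in totals}


# Compare the outcome across sub-populations to spot a possible disparity.
rates = rate_by_group(records, "group", "had_checkup")
```

A gap between the groups' rates is the kind of difference that would prompt a closer look at what is working for one sub-population and not for another.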

There’s an interesting divide between SOGI questions in medicine and SOGI questions anywhere else besides medicine. Most efforts to collect this data have come out of the medical arena where ethics, privacy, laws, and regulations provide layers of protection so most individuals can be honest about their identity most of the time. Outside of medical providers those protections do not exist and therefore these contexts must be treated differently.

I’ll also call out that SOGI questions could be done under diversity initiatives or under quality initiatives. In healthcare, you cannot improve quality of care or patient outcomes without knowing and applying practices specific to SOGI identities. Assume it’s the same everywhere else.

SOGI questions are, by design and definition, about putting people into categories (boxes) so that differences between various identities can be studied.

As a general person (in the United States) you’re likely to be asked these questions at the same time you’re asked about your race and ethnicity.

More and more geographic locations in the United States, and more and more industries and organizations, are either receiving mandates to collect SOGI data or realizing its importance to their work and updating their questions proactively.

Basics of an Individual’s SOGI Identity

Fundamentals of Collecting SOGI Data

  • Having this data for an individual is inherently risky to their personal safety. Whether now, or in the future as laws, judicial decisions, and political administrations change, this data can be used to harm. Treat this data with respect, caution, additional safeguards, and the utmost safety.
  • The individual is the one that knows their identity. Do not infer it, ask someone else for it, or in any way choose it for them. There is the potential that “proxy reporting” (asking someone else for a person’s identity) may be viable and appropriate in some circumstances in the future, but it goes against LGBTQIA+ community values (whether that person is fully publicly out or not) and there’s no research on the range of validity that data may or may not have.
  • Always first do no harm, even if that means not collecting data or not being able to disaggregate data among other variables.
  • Groups at higher risk of harm from people with power over them (e.g. minors, prisoners, minors in-care with a DCFS organization, any group an IRB would automatically do extra review on before approving research) must have extra scrutiny and review of the who/what/where/when/why/how of SOGI questions they are asked.
  • Everyone wants to know “the question” to ask. They are looking for the wording of the question(s) and the value(s) to provide to answer it. All the best research and literature says that there is no one question. The best question(s) are ones specific to your circumstances.
  • Anyone in the US can have an X sex/gender marker on legal documents. Anyone in the US who is eligible for a US Passport (e.g. over 16 years old, legal status, can pay the fee, etc.) can have an X gender marker. Individuals in specific states can have an X gender marker on their birth certificate, state ID, driver’s license, etc.
  • You have to be able to have your data interact with allllllll the other systems your data comes from and goes to. Plus, those systems are going to change over time at different times. Classic technology change management issues.
    • Legal Sex (or Gender) on legal documents will be M/F/X. If it’s M/F in a system now, it will eventually change to M/F/X. I see no circumstances where state or United States national legal identity documents will have more than M/F/X in the next 10 years (pre-2033). Emphasis on the word Legal in the phrase Legal Sex. If you don’t need Legal Sex, don’t collect it.
  • The biggest pushback you’ll get when you start asking SOGI questions is from the people in your organization who will ask the question(s). The numbers and experience show that the askers probably won’t get pushback, or at least not the pushback they fear, from the individuals they are asking. However, this reticence among the people responsible for collecting the data means a planned roll-out, training, and support for your personnel are key.
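To make the Legal Sex point above concrete, here is a minimal Python sketch of a record that allows M/F/X and leaves the field empty unless there is a real need for it. The names `LegalSex`, `PersonRecord`, and `parse_legal_sex` are hypothetical and invented for this example; your own systems will have their own schema and their own upstream and downstream formats.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LegalSex(Enum):
    """Marker as printed on a legal document; not a gender identity."""
    M = "M"
    F = "F"
    X = "X"


@dataclass
class PersonRecord:
    person_id: str
    # Stays None unless there is a real need to store Legal Sex.
    legal_sex: Optional[LegalSex] = None


def parse_legal_sex(raw: Optional[str]) -> Optional[LegalSex]:
    """Convert an incoming value to LegalSex, or None if nothing was sent.

    Raises ValueError for anything other than M, F, or X, so unexpected
    values from another system are noticed instead of silently dropped.
    """
    if raw is None or raw.strip() == "":
        return None
    return LegalSex(raw.strip().upper())


record = PersonRecord(person_id="example-1", legal_sex=parse_legal_sex("x"))
```

Keeping the allowed values in one place like this is one way to make the later move from M/F to M/F/X a single change rather than a hunt through every form and report.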

Info Related to Sexual Orientation (SO) Questions

Minors (Ages 0-18) and Sexual Orientation Questions

There’s a consensus that a person shouldn’t be asked their sexual orientation until they are at least 13 years old. That would mean there is no sexual orientation data for anyone under 13. In US schools, because a parent has a right to see a student record at any point, sexual orientation should only be asked in anonymous surveys. For non-school data, also take this into consideration before asking non-anonymized SO questions, especially if the answers could be seen by people with power over the individual.

Info Related to Gender Identity (GI) Questions

Minors (Ages 0-18) and Gender Identity Questions

From what I’ve heard, people seem to think 6 years of age and up is appropriate for starting to ask an individual questions related to their gender identity, though there’s disagreement on that and some say 4 years old and up. Obviously, the questions still have to be age appropriate. Do whatever is affirming for the individuals and groups in your community. You could use someone’s correct pronouns at 4 but not collect SOGI data officially until they’re 6, etc.

Changing someone’s answers to these questions

Changes to someone’s SOGI identification, whether in how they want to be referred to or how they want to be represented in official documents, should be done under self-attestation. That is, don’t require any medical documents or ID documents or letters to make the change in your system. If someone says they are an apple, then they are an apple.

Avoiding Offending Anyone

Any question you ask will offend someone, but here are some “usual” mistakes that people make that are both super offensive and easy to avoid.

  1. Do not directly marginalize or ‘other’ folks by using the actual word “Other” as an option value. “Another” is also pretty close to “other” and I recommend against it.
    1. This example is bad for many reasons, but it is a simplified example of an explicit use of the term “Other”.
      1. Straight
      2. Gay
      3. Lesbian
      4. Bisexual
      5. Other (with or without fill-in the blank)
    2. Here are some alternatives to help you get the idea.
      1. Adding options of: Don’t Know and/or Prefer Not to Answer and/or Not listed here (with or without fill-in the blank)
      2. Some combination of the three options above
      3. My word is missing (with or without fill-in the blank)
  2. Transgender is not a gender.
    1. Do not ask:
      1. Male
      2. Female
      3. Transgender (and/or non-binary) (and/or other)
  3. A transgender woman is a woman. People, organizations, and forms that do the following are my pet peeve. At the moment research and people paid to do this professionally really wouldn’t agree with me because there isn’t a lot of research on it yet, but I feel pretty strongly about it.
    1. I ask you not to do this:
      1. Man/boy
      2. Woman/girl
      3. Transgender Man/boy
      4. Transgender Woman/girl
    2. Alternatives include:
      1. Separating gender identity (man, woman, non-binary, etc.) and gender modality (trans, cis) into separate questions. Gender modality is also sometimes called transgender status.
      2. Not asking trans status. This would not be good, but it is technically viable.
      3. TBD/Others to be found
  4. Non-binary people exist. Whether you think i) non-binary is an umbrella term for non-man and non-woman genders or ii) non-binary is a term all its own in addition to other genders such as agender, genderfluid, etc., you should never have fewer than 3 value options for gender.
    1. Very, very bad:
      1. Man/boy
      2. Woman/girl
    2. Minimum viable:
      1. Man/boy
      2. Woman/girl
      3. Non-binary
  5. Controversial: Use of the word queer as either a sexual orientation term or a SOGI term. Generally, since some in the LGBTQIA+ community are offended by this word (most specifically older adults and our elders), the advice would be not to include the word queer. Queer communities’ reactions to the word are changing over time: more people find it less offensive, or at least understand that it’s not meant to be offensive in certain contexts. On the other hand, some groups of youth are offended if you do not include queer. It can be a very important word for them. Maybe they don’t want to be the only one in a Gay Straight Alliance (GSA) that uses that specific term. Maybe they’re still questioning their identity. Maybe they’re trans but they only feel safe enough for people to assume they’re gay. I’m sure there are many other reasons. There are also geographic differences in feelings towards this word. My personal opinion is to include it unless you’re asking a group that’s specifically only ~50+ years old, but I’ve also loved the queerness of the word queer for the last 20+ years, so take that with a grain of salt.
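If your form options live in code or configuration, a few of the mistakes above can be caught automatically. Here is a minimal Python sketch under that assumption. The option wording, the `NON_ANSWERS` set, and the `find_problems` helper are hypothetical and only illustrate the idea; they are not “the question” to ask, and the right options still depend on your circumstances.

```python
# Hypothetical option lists: gender identity and gender modality are
# asked as two separate questions, and neither uses the word "Other".
GENDER_IDENTITY_OPTIONS = [
    "Man/boy", "Woman/girl", "Non-binary",
    "Not listed here", "Don't know", "Prefer not to answer",
]
GENDER_MODALITY_OPTIONS = [
    "Transgender", "Cisgender", "Don't know", "Prefer not to answer",
]

# Options that let someone decline or opt out; not counted as genders.
NON_ANSWERS = {"not listed here", "don't know", "prefer not to answer"}


def find_problems(options, is_gender_identity=False):
    """Return a list of easy-to-avoid mistakes found in an option list."""
    problems = []
    lowered = [opt.strip().lower() for opt in options]

    # Mistake 1: an option that starts with "Other" or "Another".
    for opt in lowered:
        words = opt.split()
        if words and words[0] in ("other", "another"):
            problems.append('uses "Other"/"Another" as an option')
            break

    if is_gender_identity:
        # Mistakes 2 and 3: transgender status mixed into the gender list.
        if any("transgender" in opt for opt in lowered):
            problems.append("mixes transgender status into the gender list")
        # Mistake 4: fewer than three actual gender options.
        substantive = [opt for opt in lowered if opt not in NON_ANSWERS]
        if len(substantive) < 3:
            problems.append("fewer than three gender options")

    return problems
```

Calling `find_problems(["Male", "Female", "Transgender"], is_gender_identity=True)` flags the mixed list, while the two separate lists at the top come back with no problems. A check like this only catches wording patterns; it can’t tell you whether the questions are right for your audience.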

Items I know I still need to cover:

Sample sizes (n-size).

Cultural sensitivity.

No, seriously, what question(s) do I put on my form

How often to update your questions.

  • If you won’t use a piece of data and you don’t have a mandate to collect it, then it’s not ethical to collect it.
  • Would knowing someone’s pronouns be enough? In the general public sphere this is the only thing you need to know.
  • Only get into anatomy if you have valid medical reasons for knowing.
  • Whether you’re asking verbally face-to-face, on paper, or electronically matters.