/* -*- Mode: C++; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
/* vim: set ts=8 sts=2 et sw=2 tw=80: */
/* This Source Code Form is subject to the terms of the Mozilla Public
 * License, v. 2.0. If a copy of the MPL was not distributed with this
 * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
// PHC is a probabilistic heap checker. A tiny fraction of randomly chosen heap
// allocations are subject to some expensive checking via the use of OS page
// access protection. A failed check triggers a crash, whereupon useful
// information about the failure is put into the crash report. The cost and
// coverage for each user is minimal, but spread over the entire user base the
// coverage becomes significant.
//
// The idea comes from Chromium, where it is called GWP-ASAN. (Firefox uses PHC
// as the name because GWP-ASAN is long, awkward, and doesn't have any
// particular meaning.)
//
// In the current implementation up to 64 allocations per process can become
// PHC allocations. These allocations must be page-sized or smaller. Each PHC
// allocation gets its own page, and when the allocation is freed its page is
// marked inaccessible until the page is reused for another allocation. This
// means that a use-after-free defect (which includes double-frees) will be
// caught if the use occurs before the page is reused for another allocation.
// The crash report will contain stack traces for the allocation site, the free
// site, and the use-after-free site, which is often enough to diagnose the
// defect.
//
// Also, each PHC allocation is followed by a guard page. The PHC allocation is
// positioned so that its end abuts the guard page (or as close as possible,
// given alignment constraints). This means that a bounds violation at the end
// of the allocation (overflow) will be caught. The crash report will contain
// stack traces for the allocation site and the bounds violation use site,
// which is often enough to diagnose the defect.
//
// (A bounds violation at the start of the allocation (underflow) will not be
// caught, unless it is sufficiently large to hit the preceding allocation's
// guard page, which is not that likely. It would be possible to look more
// assiduously for underflow by randomly placing some allocations at the end of
// the page and some at the start of the page, and GWP-ASAN does this. PHC does
// not, however, because overflow is likely to be much more common than
// underflow in practice.)
//
// We use a simple heuristic to categorize a guard page access as overflow or
// underflow: if the address falls in the lower half of the guard page, we
// assume it is overflow, otherwise we assume it is underflow. More
// sophisticated heuristics are possible, but this one is very simple, and it
// is likely that most overflows/underflows in practice are very close to the
// page boundary.
//
// The design space for the randomization strategy is large. The current
// implementation has a large random delay before it starts operating, and a
// small random delay between each PHC allocation attempt. Each freed PHC
// allocation is quarantined for a medium random delay before being reused, in
// order to increase the chance of catching UAFs.
//
// The basic cost of PHC's operation is as follows.
//
// - The physical memory cost is 64 pages plus some metadata (including stack
//   traces) for each page. This amounts to 256 KiB per process on
//   architectures with 4 KiB pages and 1024 KiB on macOS/AArch64 which uses
//   16 KiB pages.
//
// - The virtual memory cost is the physical memory cost plus the guard pages:
//   another 64 pages. This amounts to another 256 KiB per process on
//   architectures with 4 KiB pages and 1024 KiB on macOS/AArch64 which uses
//   16 KiB pages. PHC is currently only enabled on 64-bit platforms so the
//   impact of the virtual memory usage is negligible.
//
// - Every allocation requires a size check and a decrement-and-check of an
//   atomic counter. When the counter reaches zero a PHC allocation can occur,
//   which involves marking a page as accessible and getting a stack trace for
//   the allocation site. Otherwise, mozjemalloc performs the allocation.
//
// - Every deallocation requires a range check on the pointer to see if it
//   involves a PHC allocation. (The choice to only do PHC allocations that
//   are a page or smaller enables this range check, because the 64 pages are
//   contiguous. Allowing larger allocations would make this more complicated,
//   and we definitely don't want something as slow as a hash table lookup on
//   every deallocation.) PHC deallocations involve marking a page as
//   inaccessible and getting a stack trace for the deallocation site.
//
// Note that calls to realloc(), free(), and malloc_usable_size() will
// immediately crash if the given pointer falls within a page allocation's
// page, but does not point to the start of the allocation itself.
//
//   void* p = malloc(64);
//   free(p + 1);  // p+1 doesn't point to the allocation start; crash
//
// Such crashes will not have the PHC fields in the crash report.
//
// PHC-specific tests can be run with the following commands:
// - gtests: `./mach gtest '*PHC*'`
// - xpcshell-tests: `./mach test toolkit/crashreporter/test/unit`
//   - This runs some non-PHC tests as well.
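// The overflow/underflow heuristic described above can be sketched as the
// following standalone helper. This is not PHC's actual code; the names and
// the 4 KiB page size are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the guard-page classification heuristic: an access
// in the lower half of a guard page is treated as overflow from the preceding
// allocation page, otherwise as underflow from the following one.
constexpr uintptr_t kGuardPageSize = 4096;  // assumed 4 KiB pages

bool IsLikelyOverflow(uintptr_t aGuardPageStart, uintptr_t aFaultAddr) {
  uintptr_t offset = aFaultAddr - aGuardPageStart;  // position within the page
  return offset < kGuardPageSize / 2;               // lower half => overflow
}
```

// In practice most out-of-bounds accesses land within a few bytes of the page
// boundary, which is why this simple midpoint split works well enough.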
#include "PHC.h"
#include <stdlib.h>
#include <time.h>
#include <algorithm>
#ifdef XP_WIN
#  include <process.h>
#else
#  include <sys/mman.h>
#  include <sys/types.h>
#  include <pthread.h>
#  include <unistd.h>
#endif
#ifdef ANDROID
// Android doesn't have pthread_atfork defined in pthread.h.
extern "C" MOZ_EXPORT int pthread_atfork(void (*)(void), void (*)(void),
                                         void (*)(void));
#endif
// This class provides infallible operations for the small number of heap
// allocations that PHC does for itself. It would be nice if we could use the
// InfallibleAllocPolicy from mozalloc, but PHC cannot use mozalloc.
class InfallibleAllocPolicy {
 public:
  static void AbortOnFailure(const void* aP) {
    if (!aP) {
MOZ_CRASH("PHC failed to allocate");
}
}
// WARNING WARNING WARNING: this function must only be called when PHC::mMutex
// is *not* locked, otherwise we might get deadlocks.
//
// How? On Windows, MozStackWalk() can lock a mutex, M, from the shared library
// loader. Another thread might call malloc() while holding M locked (when
// loading a shared library) and try to lock PHC::mMutex, causing a deadlock.
// So PHC::mMutex can't be locked during the call to MozStackWalk(). (For
// details, see https://bugzilla.mozilla.org/show_bug.cgi?id=374829#c8. On
// Linux, something similar can happen; see bug 824340. So we just disallow it
// on all platforms.)
//
// In DMD, to avoid this problem we temporarily unlock the equivalent mutex
// for the MozStackWalk() call. But that's grotty, and things are a bit
// different here, so we just require that stack traces be obtained before
// locking PHC::mMutex.
//
// Unfortunately, there is no reliable way at compile-time or run-time to
// ensure this pre-condition. Hence this large comment.
//
void StackTrace::Fill() {
mLength = 0;
  // These ifdefs should be kept in sync with the conditions in
  // phc_implies_frame_pointers in build/moz.configure/memory.configure
#if defined(XP_WIN) && defined(_M_IX86)
  // This avoids MozStackWalk(), which causes unusably slow startup on Win32
  // when it is called during static initialization (see bug 1241684).
  //
  // This code is cribbed from the Gecko Profiler, which also uses
  // FramePointerStackWalk() on Win32: Registers::SyncPopulate() for the
  // frame pointer, and GetStackTop() for the stack end.
CONTEXT context;
  RtlCaptureContext(&context);
  void** fp = reinterpret_cast<void**>(context.Ebp);
  PNT_TIB pTib = reinterpret_cast<PNT_TIB>(NtCurrentTeb());
  void* stackEnd = static_cast<void*>(pTib->StackBase);
  FramePointerStackWalk(StackWalkCallback, kMaxFrames, this, fp, stackEnd);
#elif defined(XP_DARWIN)
  // This avoids MozStackWalk(), which has become unusably slow on Mac due to
  // changes in libunwind.
  //
  // This code is cribbed from the Gecko Profiler, which also uses
  // FramePointerStackWalk() on Mac: Registers::SyncPopulate() for the frame
  // pointer, and GetStackTop() for the stack end.
#  pragma GCC diagnostic push
#  pragma GCC diagnostic ignored "-Wframe-address"
  void** fp = reinterpret_cast<void**>(__builtin_frame_address(1));
#  pragma GCC diagnostic pop
  void* stackEnd = pthread_get_stackaddr_np(pthread_self());
  FramePointerStackWalk(StackWalkCallback, kMaxFrames, this, fp, stackEnd);
#else
  MozStackWalk(StackWalkCallback, nullptr, kMaxFrames, this);
#endif
}
//---------------------------------------------------------------------------
// Global state
//---------------------------------------------------------------------------
// Throughout this entire file time is measured as the number of sub-page
// allocations performed (by PHC and mozjemalloc combined). `Time` is 64-bit
// because we could have more than 2**32 allocations in a long-running
// session. `Delay` is 32-bit because the delays used within PHC are always
// much smaller than 2**32. Delay must be unsigned so that IsPowerOfTwo() can
// work on some Delay values.
using Time = uint64_t;   // A moment in time.
using Delay = uint32_t;  // A time duration.
static constexpr Delay DELAY_MAX = UINT32_MAX / 2;
// PHC only runs if the page size is 4 KiB; anything more is uncommon and
// would use too much memory. So we hardwire this size for all platforms but
// macOS on ARM processors. For the latter we make an exception because the
// minimum page size supported is 16 KiB so there's no way to go below that.
static const size_t kPageSize =
#if defined(XP_DARWIN) && defined(__aarch64__)
    16384
#else
    4096
#endif
    ;
// We align the PHC area to a multiple of the jemalloc and JS GC chunk size
// (both use 1MB aligned chunks) so that their address computations don't lead
// from non-PHC memory into PHC memory causing misleading PHC stacks to be
// attached to a crash report.
static const size_t kPhcAlign = 1024 * 1024;
// There are two kinds of page.
// - Allocation pages, from which allocations are made.
// - Guard pages, which are never touched by PHC.
//
// These page kinds are interleaved; each allocation page has a guard page on
// either side.
#ifdef EARLY_BETA_OR_EARLIER
static const size_t kNumAllocPages = kPageSize == 4096 ? 4096 : 1024;
#else
// This will use between 82KiB and 1.1MiB per process (depending on how many
// objects are currently allocated). We will tune this in the future.
static const size_t kNumAllocPages = kPageSize == 4096 ? 256 : 64;
#endif
static const size_t kNumAllPages = kNumAllocPages * 2 + 1;
// The total size of the allocation pages and guard pages.
static const size_t kAllPagesSize = kNumAllPages * kPageSize;
// jemalloc adds a guard page to the end of our allocation, see the comment in
// AllocAllPages() for more information.
static const size_t kAllPagesJemallocSize = kAllPagesSize - kPageSize;
// The amount to decrement from the shared allocation delay each time a
// thread's local allocation delay reaches zero.
static const Delay kDelayDecrementAmount = 256;
// When PHC is disabled on the current thread wait this many allocations
// before accessing sAllocDelay once more.
static const Delay kDelayBackoffAmount = 64;
// When PHC is disabled globally reset the shared delay by this many
// allocations to keep code running on the fast path.
static const Delay kDelayResetWhenDisabled = 64 * 1024;
// The default state for PHC. Either Enabled or OnlyFree.
#define DEFAULT_STATE mozilla::phc::OnlyFree
// The maximum time.
static const Time kMaxTime = ~(Time(0));
// Truncate aRnd to the range (1 .. aAvgDelay*2). If aRnd is random, this
// results in an average value of aAvgDelay + 0.5, which is close enough to
// aAvgDelay. aAvgDelay must be a power-of-two for speed.
constexpr Delay Rnd64ToDelay(Delay aAvgDelay, uint64_t aRnd) {
  MOZ_ASSERT(IsPowerOfTwo(aAvgDelay), "must be a power of two");

  return (aRnd & (uint64_t(aAvgDelay) * 2 - 1)) + 1;
}

static Delay CheckProbability(int64_t aProb) {
  // Limit delays calculated from prefs to 0x80000000, this is the largest
  // power-of-two that fits in a Delay since it is a uint32_t.
  // The minimum is 2 that way not every allocation goes straight to PHC.
  return RoundUpPow2(std::clamp(aProb, int64_t(2), int64_t(0x80000000)));
}
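// The clamping above can be illustrated with a standalone sketch. The helper
// below stands in for mozilla's RoundUpPow2; the `Sketch` names are
// illustrative, not part of PHC.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Round up to the next power of two (local stand-in for RoundUpPow2).
uint64_t RoundUpPow2Sketch(uint64_t aX) {
  uint64_t p = 1;
  while (p < aX) {
    p <<= 1;
  }
  return p;
}

uint32_t CheckProbabilitySketch(int64_t aProb) {
  // Clamp to [2, 0x80000000]: at least 2 so that not every allocation goes
  // straight to PHC, at most the largest power of two fitting in a uint32_t.
  return static_cast<uint32_t>(
      RoundUpPow2Sketch(std::clamp(aProb, int64_t(2), int64_t(0x80000000))));
}
```

// For example, a pref value of 1000 becomes 1024, and any negative or zero
// value becomes 2.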
// Maps a pointer to a PHC-specific structure:
// - Nothing
// - A guard page (it is unspecified which one)
// - An allocation page (with an index < kNumAllocPages)
//
// The standard way of handling a PtrKind is to check IsNothing(), and if that
// fails, to check IsGuardPage(), and if that fails, to call AllocPage().
class PtrKind {
 private:
  enum class Tag : uint8_t {
Nothing,
GuardPage,
AllocPage,
};
Tag mTag;
uintptr_t mIndex; // Only used if mTag == Tag::AllocPage.
 public:
  // Detect what a pointer points to. This constructor must be fast because it
  // is called for every call to free(), realloc(), malloc_usable_size(), and
  // jemalloc_ptr_info().
  PtrKind(const void* aPtr, const uint8_t* aPagesStart,
          const uint8_t* aPagesLimit) {
    if (!(aPagesStart <= aPtr && aPtr < aPagesLimit)) {
mTag = Tag::Nothing;
} else {
uintptr_t offset = static_cast<const uint8_t*>(aPtr) - aPagesStart;
uintptr_t allPageIndex = offset / kPageSize;
      MOZ_ASSERT(allPageIndex < kNumAllPages);
      if (allPageIndex & 1) {
        // Odd-indexed pages are allocation pages.
uintptr_t allocPageIndex = allPageIndex / 2;
MOZ_ASSERT(allocPageIndex < kNumAllocPages);
mTag = Tag::AllocPage;
mIndex = allocPageIndex;
      } else {
        // Even-numbered pages are guard pages.
mTag = Tag::GuardPage;
}
}
}
  // This should only be called after IsNothing() and IsGuardPage() have been
  // checked and failed.
uintptr_t AllocPageIndex() const {
    MOZ_RELEASE_ASSERT(mTag == Tag::AllocPage);
    return mIndex;
}
};
// On MacOS, the first __thread/thread_local access calls malloc, which leads
// to an infinite loop. So we use pthread-based TLS instead, which somehow
// doesn't have this problem.
#if !defined(XP_DARWIN)
#  define PHC_THREAD_LOCAL(T) MOZ_THREAD_LOCAL(T)
#else
#  define PHC_THREAD_LOCAL(T) \
    detail::ThreadLocal<T, detail::ThreadLocalKeyStorage>
#endif
// The virtual address space reserved by PHC. It is shared, immutable global
// state. Initialized by phc_init() and never changed after that. phc_init()
// runs early enough that no synchronization is needed.
class PHCRegion {
 private:
  // The bounds of the allocated pages.
uint8_t* const mPagesStart;
uint8_t* const mPagesLimit;
  // Allocates the allocation pages and the guard pages, contiguously.
  uint8_t* AllocAllPages() {
    // The memory allocated here is never freed, because it would happen at
    // process termination when it would be of little use.
    // We can rely on jemalloc's behaviour that when it allocates memory
    // aligned with its own chunk size it will over-allocate and guarantee
    // that the memory after the end of our allocation, but before the next
    // chunk, is decommitted and inaccessible. Elsewhere in PHC we assume that
    // we own that page (so that memory errors in it get caught by PHC) but
    // here we use kAllPagesJemallocSize which subtracts jemalloc's guard
    // page.
    void* pages = MozJemalloc::memalign(kPhcAlign, kAllPagesJemallocSize);
    if (!pages) {
MOZ_CRASH();
}
    // Make the pages inaccessible.
#ifdef XP_WIN
    if (!VirtualFree(pages, kAllPagesJemallocSize, MEM_DECOMMIT)) {
MOZ_CRASH("VirtualFree failed");
    }
#else
    if (mmap(pages, kAllPagesJemallocSize, PROT_NONE,
MAP_FIXED | MAP_PRIVATE | MAP_ANON, -1, 0) == MAP_FAILED) {
MOZ_CRASH("mmap failed");
    }
#endif
    return static_cast<uint8_t*>(pages);
}
public:
PHCRegion();
  class PtrKind PtrKind(const void* aPtr) {
    class PtrKind pk(aPtr, mPagesStart, mPagesLimit);
    return pk;
}
  // Get the address of the allocation page referred to via an index. Used
  // when marking the page as accessible/inaccessible.
uint8_t* AllocPagePtr(uintptr_t aIndex) {
    MOZ_ASSERT(aIndex < kNumAllocPages);
    // Multiply by two and add one to account for allocation pages *and* guard
    // pages.
    return mPagesStart + (2 * aIndex + 1) * kPageSize;
}
};
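// The index-to-address arithmetic in AllocPagePtr() can be checked in
// isolation. This is an illustrative sketch (assumed 4 KiB pages; the
// `Sketch` names are not PHC's): with guard and allocation pages interleaved
// as G A G A ... A G, allocation page aIndex sits at the (2*aIndex + 1)-th
// page of the region, so every allocation page has an odd page number.

```cpp
#include <cassert>
#include <cstdint>

constexpr uintptr_t kSketchPageSize = 4096;  // assumed page size

// Byte offset of allocation page aIndex from the start of the region.
uintptr_t AllocPageOffsetSketch(uintptr_t aIndex) {
  return (2 * aIndex + 1) * kSketchPageSize;  // skip the interleaved guards
}
```

// This odd/even split is exactly what PtrKind's `allPageIndex & 1` test
// relies on when classifying a pointer.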
// This type is used as a proof-of-lock token, to make it clear which
// functions require mMutex to be locked.
using PHCLock = const MutexAutoLock&;
// Shared, mutable global state. Many fields are protected by mMutex;
// functions that access those fields should take a PHCLock as proof that
// mMutex is held. Other fields are TLS or Atomic and don't need the lock.
class PHC {
  enum class AllocPageState {
NeverAllocated = 0,
InUse = 1,
Freed = 2,
};
  // Metadata for each allocation page.
  class AllocPageInfo {
   public:
AllocPageInfo()
: mState(AllocPageState::NeverAllocated),
mBaseAddr(nullptr),
mReuseTime(0) {}
// The current allocation page state.
AllocPageState mState;
    // The arena that the allocation is nominally from. This isn't meaningful
    // within PHC, which has no arenas. But it is necessary for reallocation
    // of page allocations as normal allocations, such as in this code:
    //
    //   p = moz_arena_malloc(arenaId, 4096);
    //   realloc(p, 8192);
    //
    // The realloc is more than one page, and thus too large for PHC to
    // handle. Therefore, if PHC handles the first allocation, it must ask
    // mozjemalloc to allocate the 8192 bytes in the correct arena, and to do
    // that, it must call MozJemalloc::moz_arena_malloc with the correct
    // arenaId under the covers. Therefore it must record that arenaId.
    //
    // This field is also needed for jemalloc_ptr_info() to work, because it
    // also returns the arena ID (but only in debug builds).
    //
    // - NeverAllocated: must be 0.
    // - InUse | Freed: can be any valid arena ID value.
Maybe<arena_id_t> mArenaId;
    // The starting address of the allocation. Will not be the same as the
    // page address unless the allocation is a full page.
    // - NeverAllocated: must be 0.
    // - InUse | Freed: must be within the allocation page.
uint8_t* mBaseAddr;
    // Usable size is computed as the number of bytes between the pointer and
    // the end of the allocation page. This might be bigger than the requested
    // size, especially if an outsized alignment is requested.
    size_t UsableSize() const {
      return mState == AllocPageState::NeverAllocated
? 0
: kPageSize - (reinterpret_cast<uintptr_t>(mBaseAddr) &
(kPageSize - 1));
}
// The internal fragmentation for this allocation.
size_t FragmentationBytes() const {
      MOZ_ASSERT(kPageSize >= UsableSize());
      return mState == AllocPageState::InUse ? kPageSize - UsableSize() : 0;
}
    // The time at which the page is available for reuse, as measured against
    // mNow. When the page is in use this value will be kMaxTime.
    // - NeverAllocated: must be 0.
    // - InUse: must be kMaxTime.
    // - Freed: must be > 0 and < kMaxTime.
Time mReuseTime;
    // The next index for a free list of pages.
Maybe<uintptr_t> mNextPage;
};
 public:
  // The RNG seeds here are poor, but non-reentrant since this can be called
  // from malloc(). SetState() will reset the RNG later.
PHC() : mRNG(RandomSeed<1>(), RandomSeed<2>()) {
    mMutex.Init();
    if (!tlsIsDisabled.init()) {
MOZ_CRASH();
    }
    if (!tlsAllocDelay.init()) {
MOZ_CRASH();
    }
    if (!tlsLastDelay.init()) {
MOZ_CRASH();
}
    // This constructor is part of PHC's very early initialisation, see
    // phc_init(), and if PHC is default-on it'll start marking allocations
    // and we must set up the delay. However once XPCOM starts it'll call
    // SetState() which will re-initialise the RNG and allocation delay.
MutexAutoLock lock(mMutex);
  }

  // Is the page free? And if so, has enough time passed that we can use it?
  bool IsPageAllocatable(PHCLock, uintptr_t aIndex, Time aNow) {
    const AllocPageInfo& page = mAllocPages[aIndex];
    return page.mState != AllocPageState::InUse && aNow >= page.mReuseTime;
}
  // Get the address of the allocation page referred to via an index. Used
  // when checking pointers against page boundaries.
  uint8_t* AllocPageBaseAddr(PHCLock, uintptr_t aIndex) {
    return mAllocPages[aIndex].mBaseAddr;
}
// The total fragmentation in PHC
size_t FragmentationBytes() const {
    size_t sum = 0;
    for (const auto& page : mAllocPages) {
sum += page.FragmentationBytes();
    }
    return sum;
}
    // page.mState is not changed.
    if (aArenaId.isSome()) {
      // Crash if the arenas don't match.
MOZ_RELEASE_ASSERT(page.mArenaId == aArenaId);
}
    page.mBaseAddr = aNewBaseAddr;
    // We could just keep the original alloc stack, but the realloc stack is
    // more recent and therefore seems more useful.
    page.mAllocStack = Some(aAllocStack);
    // page.mFreeStack is not changed.
    // page.mReuseTime is not changed.
    // page.mNextPage is not changed.
};
    // page.mArenaId is left unchanged, for jemalloc_ptr_info() calls that
    // occur after freeing (e.g. in the PtrInfo test in TestJemalloc.cpp).
    if (aArenaId.isSome()) {
      // Crash if the arenas don't match.
MOZ_RELEASE_ASSERT(page.mArenaId == aArenaId);
}
    // page.mUsableSize is left unchanged, for reporting on UAF, and for
    // jemalloc_ptr_info() calls that occur after freeing (e.g. in the PtrInfo
    // test in TestJemalloc.cpp).
// page.mAllocStack is left unchanged, for reporting on UAF.
page.mFreeStack = Some(aFreeStack);
    Time now = Now();
#if PHC_LOGGING
    mFreeTime[aIndex] = now;
#endif
page.mReuseTime = now + aReuseDelay;
  static void CrashOnGuardPage(void* aPtr) {
    // An operation on a guard page? This is a bounds violation. Deliberately
    // touch the page in question to cause a crash that triggers the usual PHC
    // machinery.
LOG("CrashOnGuardPage(%p), bounds violation\n", aPtr);
*static_cast<uint8_t*>(aPtr) = 0;
MOZ_CRASH("unreachable");
}
// The pointer must point to the start of the allocation.
MOZ_RELEASE_ASSERT(page.mBaseAddr == aPtr);
if (page.mState == AllocPageState::Freed) {
      LOG("EnsureValidAndInUse(%p), use-after-free\n", aPtr);
      // An operation on a freed page? This is a particular kind of
      // use-after-free. Deliberately touch the page in question, in order to
      // cause a crash that triggers the usual PHC machinery. But unlock
      // mMutex first, because that self-same PHC machinery needs to re-lock
      // it, and the crash causes non-local control flow so mMutex won't be
      // unlocked the normal way in the caller.
mMutex.Unlock();
*static_cast<uint8_t*>(aPtr) = 0;
MOZ_CRASH("unreachable");
}
}
  // This expects PHC::mMutex to be locked but can't check it with a parameter
  // since we try-lock it.
  void FillAddrInfo(uintptr_t aIndex, const void* aBaseAddr, bool isGuardPage,
                    phc::AddrInfo& aOut) {
    const AllocPageInfo& page = mAllocPages[aIndex];
    if (isGuardPage) {
aOut.mKind = phc::AddrInfo::Kind::GuardPage;
    } else {
      switch (page.mState) {
        case AllocPageState::NeverAllocated:
          aOut.mKind = phc::AddrInfo::Kind::NeverAllocatedPage;
          break;
case AllocPageState::InUse:
          aOut.mKind = phc::AddrInfo::Kind::InUsePage;
          break;
case AllocPageState::Freed:
          aOut.mKind = phc::AddrInfo::Kind::FreedPage;
          break;
        case AllocPageState::InUse: {
          // Only return TagLiveAlloc if the pointer is within the bounds of
          // the allocation's usable size.
uint8_t* base = page.mBaseAddr;
          uint8_t* limit = base + page.UsableSize();
          if (base <= aPtr && aPtr < limit) {
*aInfo = {TagLiveAlloc, page.mBaseAddr, page.UsableSize(),
                      page.mArenaId.valueOr(0)};
            return;
          }
          break;
}
        case AllocPageState::Freed: {
          // Only return TagFreedAlloc if the pointer is within the bounds of
          // the former allocation's usable size.
uint8_t* base = page.mBaseAddr;
          uint8_t* limit = base + page.UsableSize();
          if (base <= aPtr && aPtr < limit) {
*aInfo = {TagFreedAlloc, page.mBaseAddr, page.UsableSize(),
                      page.mArenaId.valueOr(0)};
            return;
          }
          break;
}
default:
MOZ_CRASH();
}
    // Pointers into guard pages will end up here, as will pointers into
    // allocation pages that aren't within the allocation's bounds.
*aInfo = {TagUnknown, nullptr, 0, 0};
}
// This is an integer because FdPrintf only supports integer printing.
  size_t PageAllocHitRate(PHCLock) {
    return mPageAllocHits * 100 / (mPageAllocHits + mPageAllocMisses);
  }
#endif
  // Should we make new PHC allocations?
  bool ShouldMakeNewAllocations() const {
    return mPhcState == mozilla::phc::Enabled;
}
  using PHCState = mozilla::phc::PHCState;
  void SetState(PHCState aState) {
    if (mPhcState != PHCState::Enabled && aState == PHCState::Enabled) {
      MutexAutoLock lock(mMutex);
      // Reset the RNG at this point with a better seed.
ResetRNG(lock);
  // Decrements the delay and returns true if it's time to make a new PHC
  // allocation.
  static bool DecrementDelay() {
    const Delay alloc_delay = tlsAllocDelay.get();
if (MOZ_LIKELY(alloc_delay > 0)) {
      tlsAllocDelay.set(alloc_delay - 1);
      return false;
    }
    // The local delay has expired, check the shared delay. This path is also
    // executed on a new thread's first allocation, the result is the same:
    // all the thread's TLS fields will be initialised.
    // This accesses sPHC but we want to ensure it's still a static member
    // function so that sPHC isn't dereferenced until after the hot path above.
MOZ_ASSERT(sPHC);
sPHC->AdvanceNow();
    // Use an atomic fetch-and-subtract. This uses unsigned underflow
    // semantics to avoid doing a full compare-and-swap.
Delay new_delay = (sAllocDelay -= kDelayDecrementAmount);
    Delay old_delay = new_delay + kDelayDecrementAmount;
    if (MOZ_LIKELY(new_delay < DELAY_MAX)) {
      // Normal case, we decremented the shared delay but it's not yet
      // underflowed.
tlsAllocDelay.set(kDelayDecrementAmount);
tlsLastDelay.set(kDelayDecrementAmount);
LOG("Update sAllocDelay <- %zu, tlsAllocDelay <- %zu\n",
          size_t(new_delay), size_t(kDelayDecrementAmount));
      return false;
}
    if (old_delay < new_delay) {
      // The shared delay only just underflowed, so unless we hit exactly zero
      // we should set our local counter and continue.
LOG("Update sAllocDelay <- %zu, tlsAllocDelay <- %zu\n",
          size_t(new_delay), size_t(old_delay));
      if (old_delay == 0) {
        // We don't need to set tlsAllocDelay because it's already zero, we
        // know because the condition at the beginning of this function
        // failed.
        return true;
}
tlsAllocDelay.set(old_delay);
      tlsLastDelay.set(old_delay);
      return false;
}
    // The delay underflowed on another thread or a previous failed allocation
    // by this thread. Return true and attempt the next allocation, if the
    // other thread wins we'll check for that before committing.
LOG("Update sAllocDelay <- %zu, tlsAllocDelay <- %zu\n", size_t(new_delay),
        size_t(alloc_delay));
    return true;
}
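// The unsigned-underflow trick used in DecrementDelay() above can be shown
// in isolation. This is a minimal sketch with illustrative names (not PHC's
// actual globals): subtract from an atomic uint32_t, and treat any result
// >= DELAY_MAX -- a value that a valid delay never takes -- as having
// wrapped below zero, meaning the delay has expired.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr uint32_t kSketchDelayMax = UINT32_MAX / 2;
std::atomic<uint32_t> gSketchSharedDelay{100};

bool DecrementSketch(uint32_t aAmount) {
  // fetch_sub returns the previous value; compute the new value locally.
  uint32_t newDelay = gSketchSharedDelay.fetch_sub(aAmount) - aAmount;
  // If the subtraction wrapped past zero, newDelay lands in the top half of
  // the uint32_t range, which valid delays never occupy.
  return newDelay >= kSketchDelayMax;  // wrapped => time to allocate
}
```

// This avoids a compare-and-swap loop on the hot path: threads can blindly
// fetch-and-subtract, and only the thread that observes the wrap proceeds to
// attempt a PHC allocation.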
  static void ResetLocalAllocDelay(Delay aDelay = 0) {
    // We could take some delay from the shared delay but we'd need a
    // compare-and-swap because this is called on paths that don't make
    // allocations. Or we can set the local delay to zero and let it get
    // initialised on the next allocation.
tlsAllocDelay.set(aDelay);
tlsLastDelay.set(aDelay);
}
  // Set a new allocation delay and return true if the delay was less than
  // zero (but it's unsigned so interpret it as signed) indicating that we won
  // the race to make the next allocation.
  static bool SetNewAllocDelay(Delay aNewAllocDelay) {
    bool cas_retry;
    do {
      // We read the current delay on every iteration, we consider that the
      // PHC allocation is still "up for grabs" if sAllocDelay < 0. This is
      // safe even while other threads continue to fetch-and-subtract
      // sAllocDelay in DecrementDelay(), up to DELAY_MAX (2^31) calls to
      // DecrementDelay().
      Delay read_delay = sAllocDelay;
      if (read_delay < DELAY_MAX) {
        // Another thread already set a valid delay.
LOG("Observe delay %zu this thread lost the race\n",
size_t(read_delay));
        ResetLocalAllocDelay();
        return false;
} else {
LOG("Preparing for CAS, read sAllocDelay %zu\n", size_t(read_delay));
}
      cas_retry = !sAllocDelay.compareExchange(read_delay, aNewAllocDelay);
      if (cas_retry) {
LOG("Lost the CAS, sAllocDelay is now %zu\n", size_t(sAllocDelay));
cpu_pause(); // We raced against another thread and lost.
}
} while (cas_retry);
LOG("Won the CAS, set sAllocDelay = %zu\n", size_t(sAllocDelay));
    ResetLocalAllocDelay();
    return true;
}
    if (!mFreePageListTail) {
      // The list is empty, this page will become the beginning and end.
MOZ_ASSERT(!mFreePageListHead);
mFreePageListHead = Some(aIndex);
} else {
MOZ_ASSERT(mFreePageListTail.value() < kNumAllocPages);
AllocPageInfo& tail_page = mAllocPages[mFreePageListTail.value()];
MOZ_ASSERT(!tail_page.mNextPage);
tail_page.mNextPage = Some(aIndex);
}
page.mNextPage = Nothing();
mFreePageListTail = Some(aIndex);
}
 private:
  template <int N>
  uint64_t RandomSeed() {
    // An older version of this code used RandomUint64() here, but on Mac
    // that function uses arc4random(), which can allocate, which would cause
    // re-entry, which would be bad. So we just use time(), a local variable
    // address and a global variable address. These are mediocre sources of
    // entropy, but good enough for PHC.
static_assert(N == 0 || N == 1 || N == 2, "must be 0, 1 or 2");
    uint64_t seed;
    if (N == 0) {
time_t t = time(nullptr);
seed = t ^ (t << 32);
    } else if (N == 1) {
seed = uintptr_t(&seed) ^ (uintptr_t(&seed) << 32);
} else {
seed = uintptr_t(&sRegion) ^ (uintptr_t(&sRegion) << 32);
    }
    return seed;
}
void AssertAllocPageInUse(PHCLock, const AllocPageInfo& aPage) {
    MOZ_ASSERT(aPage.mState == AllocPageState::InUse);
    // There is nothing to assert about aPage.mArenaId.
MOZ_ASSERT(aPage.mBaseAddr);
MOZ_ASSERT(aPage.UsableSize() > 0);
MOZ_ASSERT(aPage.mAllocStack.isSome());
MOZ_ASSERT(aPage.mFreeStack.isNothing());
MOZ_ASSERT(aPage.mReuseTime == kMaxTime);
MOZ_ASSERT(!aPage.mNextPage);
}
  void AssertAllocPageNotInUse(PHCLock, const AllocPageInfo& aPage) {
    // We can assert a lot about `NeverAllocated` pages, but not much about
    // `Freed` pages.
#ifdef DEBUG
    bool isFresh = aPage.mState == AllocPageState::NeverAllocated;
MOZ_ASSERT(isFresh || aPage.mState == AllocPageState::Freed);
MOZ_ASSERT_IF(isFresh, aPage.mArenaId == Nothing());
MOZ_ASSERT(isFresh == (aPage.mBaseAddr == nullptr));
MOZ_ASSERT(isFresh == (aPage.mAllocStack.isNothing()));
MOZ_ASSERT(isFresh == (aPage.mFreeStack.isNothing()));
    MOZ_ASSERT(aPage.mReuseTime != kMaxTime);
#endif
}
  // To improve locality we try to order this file by how frequently different
  // fields are modified and place all the modified-together fields early and
  // ideally within a single cache line.
 public:
  // The mutex that protects the other members.
alignas(kCacheLineSize) Mutex mMutex MOZ_UNANNOTATED;
 private:
  // The current time. We use ReleaseAcquire semantics since we attempt to
  // update this by larger increments and don't want to lose an entire update.
Atomic<Time, ReleaseAcquire> mNow;
  // This will only ever be updated from one thread. The other threads should
  // eventually get the update.
Atomic<PHCState, Relaxed> mPhcState =
Atomic<PHCState, Relaxed>(DEFAULT_STATE);
  // RNG for deciding which allocations to treat specially. It doesn't need
  // to be high quality.
  //
  // This is a raw pointer for the reason explained in the comment above
  // PHC's constructor. Don't change it to UniquePtr or anything like that.
non_crypto::XorShift128PlusRNG mRNG;
  // A linked list of free pages. Pages are allocated from the head of the
  // list and returned to the tail. The list will naturally order itself by
  // "last freed time" so if the head of the list can't satisfy an allocation
  // due to time then none of the pages can.
Maybe<uintptr_t> mFreePageListHead;
Maybe<uintptr_t> mFreePageListTail;
#if PHC_LOGGING
  // How many allocations that could have been page allocs actually were? As
  // constrained by kNumAllocPages. If the hit ratio isn't close to 100% it's
  // likely that the global constants are poorly chosen.
size_t mPageAllocHits = 0;
  size_t mPageAllocMisses = 0;
#endif
  // The remaining fields are updated much less often, place them on the next
  // cache line.
  // The average delay before doing any page allocations at the start of a
  // process. Note that roughly 1 million allocations occur in the main
  // process while starting the browser. The delay range is
  // 1..mAvgFirstAllocDelay*2.
alignas(kCacheLineSize) Delay mAvgFirstAllocDelay = 64 * 1024;
  // The average delay until the next attempted page allocation, once we get
  // past the first delay. The delay range is 1..mAvgAllocDelay*2.
Delay mAvgAllocDelay = 16 * 1024;
  // The average delay before reusing a freed page. Should be significantly
  // larger than mAvgAllocDelay, otherwise there's not much point in having
  // it. The delay range is (mAvgPageReuseDelay / 2)..(mAvgPageReuseDelay / 2
  // * 3). This is different to the other delay ranges in not having a
  // minimum of 1, because that's such a short delay that there is a high
  // likelihood of bad stacks in any crash report.
Delay mAvgPageReuseDelay = 256 * 1024;
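The two delay-range shapes described in the comments above can be sketched like this. The helper names are illustrative (not PHC's actual functions); the point is only how a uniform draw produces the stated ranges and means:

```cpp
// Sketch of the delay ranges documented above. Any uniform random value
// works as input; quality doesn't matter for this purpose.
#include <cassert>
#include <cstdint>

using Delay = uint64_t;

// Range 1..2*aAvg inclusive, so the expected value is roughly aAvg.
inline Delay RandomDelay(Delay aAvg, uint64_t aRand) {
  return 1 + (aRand % (2 * aAvg));
}

// The reuse delay instead spans aAvg/2 .. aAvg*3/2, keeping the same mean
// but avoiding the very short delays that would produce useless stacks in
// crash reports.
inline Delay RandomReuseDelay(Delay aAvg, uint64_t aRand) {
  return aAvg / 2 + (aRand % aAvg);
}
```

Both ranges are centered on the average; only the minimum differs, matching the rationale in the comment above.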
// When true, PHC does as little as possible.
//
// (a) It does not allocate any new page allocations.
//
// (b) It avoids doing any operations that might call malloc/free/etc., which
//     would cause re-entry into PHC. (In practice, MozStackWalk() is the
//     only such operation.) Note that calls to the functions in MozJemalloc
//     are ok.
//
// For example, replace_malloc() will just fall back to mozjemalloc. However,
// operations involving existing allocations are more complex, because those
// existing allocations may be page allocations. For example, if
// replace_free() is passed a page allocation on a PHC-disabled thread, it
// will free the page allocation in the usual way, but it will get a dummy
// freeStack in order to avoid calling MozStackWalk(), as per (b) above.
//
// This single disabling mechanism has two distinct uses.
//
// - It's used to prevent re-entry into PHC, which can cause correctness
//   problems. For example, consider this sequence.
//
//   1. enter replace_free()
//   2. which calls PageFree()
//   3. which calls MozStackWalk()
//   4. which locks a mutex M, and then calls malloc
//   5. enter replace_malloc()
//   6. which calls MaybePageAlloc()
//   7. which calls MozStackWalk()
//   8. which (re)locks a mutex M --> deadlock
//
//   We avoid this sequence by "disabling" the thread in PageFree() (at step
//   2), which causes MaybePageAlloc() to fail, avoiding the call to
//   MozStackWalk() (at step 7).
//
//   In practice, realloc or free of a PHC allocation is unlikely on a thread
//   that is disabled because of this use: MozStackWalk() will probably only
//   realloc/free allocations that it allocated itself, but those won't be
//   page allocations because PHC is disabled before calling MozStackWalk().
//
//   (Note that MaybePageAlloc() could safely do a page allocation so long as
//   it avoided calling MozStackWalk() by getting a dummy allocStack. But it
//   wouldn't be useful, and it would prevent the second use below.)
// - It's used to prevent PHC allocations in some tests that rely on
//   mozjemalloc's exact allocation behaviour, which PHC does not replicate
//   exactly. (Note that (b) isn't necessary for this use -- MozStackWalk()
//   could be safely called -- but it is necessary for the first use above.)
static PHC_THREAD_LOCAL(bool) tlsIsDisabled;
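The thread-local disabling mechanism above can be sketched with standard C++ `thread_local` in place of `PHC_THREAD_LOCAL`. The names here (`AutoDisableSketch`, `MaybePageAllocSketch`) are illustrative stand-ins:

```cpp
// Sketch: while the RAII guard is alive, "PHC" operations on this thread
// bail out early, which breaks the re-entry cycle described above
// (PageFree -> MozStackWalk -> malloc -> MaybePageAlloc).
#include <cassert>

thread_local bool tlsIsDisabledSketch = false;

class AutoDisableSketch {
 public:
  AutoDisableSketch() {
    assert(!tlsIsDisabledSketch);  // nested disabling is not expected
    tlsIsDisabledSketch = true;
  }
  ~AutoDisableSketch() { tlsIsDisabledSketch = false; }
  AutoDisableSketch(const AutoDisableSketch&) = delete;
};

// Stands in for MaybePageAlloc()'s early bail-out when disabled.
inline bool MaybePageAllocSketch() { return !tlsIsDisabledSketch; }
```

Because the flag is thread-local, disabling one thread never affects allocations made concurrently on other threads.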
// Delay until the next attempt at a page allocation. The delay is made up of
// two parts: the global delay and each thread's local portion of that delay:
//
//   delay = sAllocDelay + sum_all_threads(tlsAllocDelay)
//
// Threads use their local delay to reduce contention on the shared delay.
//
// See the comment in MaybePageAlloc() for an explanation of why it uses
// ReleaseAcquire semantics.
static Atomic<Delay, ReleaseAcquire> sAllocDelay;
static PHC_THREAD_LOCAL(Delay) tlsAllocDelay;
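The split between the shared delay and the per-thread portion can be sketched as a batched countdown. This is a simplified illustration of the contention-reduction idea only, not PHC's actual algorithm; the fixed batch size and all names are assumptions:

```cpp
// Sketch: each thread counts down a local share of the delay and only
// touches the shared atomic when that share runs out, so most decrements
// are contention-free.
#include <atomic>
#include <cassert>
#include <cstdint>

using Delay = uint64_t;

std::atomic<Delay> sAllocDelaySketch{0};
thread_local Delay tlsAllocDelaySketch = 0;

// Returns true when the combined delay reaches zero.
inline bool DecrementDelaySketch() {
  if (tlsAllocDelaySketch > 0) {
    --tlsAllocDelaySketch;  // the common, contention-free case
    return false;
  }
  // Claim a batch from the shared delay (an arbitrary batch of 4 here).
  const Delay batch = 4;
  Delay cur = sAllocDelaySketch.load();
  while (cur > 0 && !sAllocDelaySketch.compare_exchange_weak(
                        cur, cur > batch ? cur - batch : 0)) {
  }
  if (cur == 0) {
    return true;  // whole delay exhausted: time for a page allocation
  }
  // Use one unit for this attempt, keep the rest locally.
  tlsAllocDelaySketch = (cur > batch ? batch : cur) - 1;
  return false;
}
```

With an initial shared delay of N, exactly N attempts fail before one succeeds, the same observable behaviour as a single shared counter but with far fewer atomic operations.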
// The last value we set tlsAllocDelay to before starting to count down.
static PHC_THREAD_LOCAL(Delay) tlsLastDelay;
AllocPageInfo mAllocPages[kNumAllocPages];
#if PHC_LOGGING
Time mFreeTime[kNumAllocPages];
#endif
// Both of these are accessed early on hot code paths. We make them both
// static variables rather than making sRegion a member of sPHC to keep these
// hot code paths as fast as possible. They're both "write once" so they can
// share a cache line.
static PHCRegion* sRegion;
static PHC* sPHC;
};
// These globals are read together and hardly ever written. They should be on
// the same cache line. They should be in a different cache line to data that
// is manipulated often (sMutex and mNow are members of sPHC for that reason)
// so that this cache line can be shared among cores. This makes a measurable
// impact on calls to maybe_init().
alignas(kCacheLineSize) PHCRegion* PHC::sRegion;
PHC* PHC::sPHC;
// This must be defined after the PHC class.
PHCRegion::PHCRegion()
: mPagesStart(AllocAllPages()), mPagesLimit(mPagesStart + kAllPagesSize) {
LOG("AllocAllPages at %p..%p\n", mPagesStart, mPagesLimit);
}
// When PHC wants to crash we first have to unlock so that the crash reporter
// can call into PHC to look up its pointer. That also means that before
// calling PHCCrash please ensure that state is consistent. Because this can
// report an arbitrary string, use of it must be reviewed by Firefox data
// stewards.
static void PHCCrash(PHCLock, const char* aMessage)
MOZ_REQUIRES(PHC::sPHC->mMutex) {
PHC::sPHC->mMutex.Unlock();
MOZ_CRASH_UNSAFE(aMessage);
}
class AutoDisableOnCurrentThread {
 public:
  AutoDisableOnCurrentThread(const AutoDisableOnCurrentThread&) = delete;

  const AutoDisableOnCurrentThread& operator=(
      const AutoDisableOnCurrentThread&) = delete;

  explicit AutoDisableOnCurrentThread() { PHC::DisableOnCurrentThread(); }
  ~AutoDisableOnCurrentThread() { PHC::EnableOnCurrentThread(); }
};
// WARNING: this function runs *very* early -- before all static initializers
// have run. For this reason, non-scalar globals (sRegion, sPHC) are allocated
// dynamically (so we can guarantee their construction in this function)
// rather than statically.
static bool phc_init() {
if (GetKernelPageSize() != kPageSize) {
return false;
}
// sRegion and sPHC are never freed. They live for the life of the process.
PHC::sRegion = InfallibleAllocPolicy::new_<PHCRegion>();
PHC::sPHC = InfallibleAllocPolicy::new_<PHC>();
#ifndef XP_WIN
// Avoid deadlocks when forking by acquiring our state lock prior to forking
// and releasing it after forking. See |LogAlloc|'s |phc_init| for
// in-depth details.
pthread_atfork(PHC::prefork, PHC::postfork_parent, PHC::postfork_child);
#endif
return true;
}
static inline bool maybe_init() {
// This runs on hot paths and we can save some memory accesses by using sPHC
// to test if we've already initialised PHC successfully.
if (MOZ_UNLIKELY(!PHC::sPHC)) {
// The lambda will only be called once and is thread safe.
static bool sInitSuccess = []() { return phc_init(); }();
return sInitSuccess;
}
return true;
}
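The one-time-initialization idiom used in maybe_init() relies on C++11's guarantee that a function-local static is initialized exactly once, even under concurrent callers. A minimal standalone sketch (all names here are illustrative):

```cpp
// Sketch of the function-local-static init pattern: the lambda runs at most
// once, no matter how many threads reach this point, and later callers just
// read the cached result.
#include <cassert>

static int sInitCount = 0;

static bool ExpensiveInit() {
  ++sInitCount;  // would be phc_init() in the real code
  return true;
}

static bool MaybeInitSketch() {
  static bool sInitSuccess = []() { return ExpensiveInit(); }();
  return sInitSuccess;
}
```

The compiler emits the necessary guard, so no explicit mutex or `std::call_once` is required on this hot path.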
// This is the hot path for testing if we should make a PHC allocation; it
// should be inlined into the caller, while the remainder of the tests that
// are in MaybePageAlloc need not be inlined.
static MOZ_ALWAYS_INLINE bool ShouldPageAllocHot(size_t aReqSize) {
if (MOZ_UNLIKELY(!maybe_init())) {
return false;
}
if (MOZ_UNLIKELY(aReqSize > kPageSize)) {
return false;
}
// Decrement the delay. If it's zero, we do a page allocation and reset the
// delay to a random number.
if (MOZ_LIKELY(!PHC::DecrementDelay())) {
return false;
}
return true;
}
// Attempt a page allocation if the time and the size are right. Allocated
// memory is zeroed if aZero is true. On failure, the caller should attempt a
// normal allocation via MozJemalloc. Can be called in a context where
// PHC::mMutex is locked.
static void* MaybePageAlloc(const Maybe<arena_id_t>& aArenaId, size_t aReqSize,
size_t aAlignment, bool aZero) {
MOZ_ASSERT(IsPowerOfTwo(aAlignment));
MOZ_ASSERT(PHC::sPHC);
if (!PHC::sPHC->ShouldMakeNewAllocations()) {
// Reset the allocation delay so that we take the fast path most of the
// time. Rather than take the lock and use the RNG, which are unnecessary
// when PHC is disabled, instead set the delay to a reasonably high number,
// the default average first allocation delay. This is reset when PHC is
// re-enabled anyway.
PHC::ForceSetNewAllocDelay(kDelayResetWhenDisabled);
return nullptr;
}
if (PHC::IsDisabledOnCurrentThread()) {
// We don't reset sAllocDelay since that might affect other threads. We
// assume this is okay because either this thread will be re-enabled after
// fewer than DELAY_MAX allocations or there are other active threads that
// will reset sAllocDelay. We do reset our local delay, which will cause
// this thread to "back off" from updating sAllocDelay on future
// allocations.
PHC::ResetLocalAllocDelay(kDelayBackoffAmount);
return nullptr;
}
// Disable on this thread *before* getting the stack trace.
AutoDisableOnCurrentThread disable;
// Get the stack trace *before* locking the mutex. If we return nullptr then // it was a waste, but it's not so frequent, and doing a stack walk while // the mutex is locked is problematic (see the big comment on // StackTrace::Fill() for details).
StackTrace allocStack;
allocStack.Fill();
// Pages are allocated from a free list populated in order of when they're // freed. If the page at the head of the list is too recently freed to be // reused then no other pages on the list will be either.
Maybe<uintptr_t> mb_index = PHC::sPHC->PopNextFreeIfAllocatable(lock, now);
if (!mb_index) {
PHC::sPHC->IncPageAllocMisses(lock);
LogNoAlloc(aReqSize, aAlignment, newAllocDelay);
return nullptr;
}
uintptr_t index = mb_index.value();
// Put the allocation as close to the end of the page as possible, // allowing for alignment requirements.
uint8_t* ptr = pagePtr + kPageSize - usableSize;
if (aAlignment != 1) {
ptr = reinterpret_cast<uint8_t*>(
(reinterpret_cast<uintptr_t>(ptr) & ~(aAlignment - 1)));
}
#if PHC_LOGGING
Time then = PHC::sPHC->GetFreeTime(i);
lifetime = then != 0 ? now - then : 0;
#endif
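The end-of-page placement computed above can be shown in isolation. This sketch uses illustrative names (`PlaceAtPageEnd`, `kPageSizeSketch`); note that the align-down mask `& ~(align - 1)` requires the alignment to be a power of two, which MaybePageAlloc() asserts:

```cpp
// Sketch: put the allocation as near the end of its page as alignment
// allows, so that its end abuts the guard page and overflows are caught.
#include <cassert>
#include <cstdint>

constexpr uintptr_t kPageSizeSketch = 4096;

inline uintptr_t PlaceAtPageEnd(uintptr_t aPageStart, uintptr_t aUsableSize,
                                uintptr_t aAlignment) {
  uintptr_t ptr = aPageStart + kPageSizeSketch - aUsableSize;
  if (aAlignment != 1) {
    ptr &= ~(aAlignment - 1);  // round down to the alignment boundary
  }
  return ptr;
}
```

Rounding down keeps the whole allocation inside the page; the cost is that an aligned allocation may end slightly before the guard page, so small overflows past the end can go undetected in that case.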
// This function handles both realloc and moz_arena_realloc.
//
// As always, realloc is complicated, and doubly so when there are two
// different kinds of allocations in play. Here are the possible transitions,
// and what we do in practice.
//
// - normal-to-normal: This is straightforward and obviously necessary.
//
// - normal-to-page: This is disallowed because it would require getting the
//   arenaId of the normal allocation, which isn't possible in non-DEBUG
//   builds for security reasons.
//
// - page-to-page: This is done whenever possible, i.e. whenever the new size
//   is less than or equal to 4 KiB. This choice counterbalances the
//   disallowing of normal-to-page allocations, in order to avoid biasing
//   towards or away from page allocations. It always occurs in-place.
//
// - page-to-normal: This is done only when necessary, i.e. only when the new
//   size is greater than 4 KiB. This choice naturally flows from the prior
//   choice on page-to-page transitions.
//
// In summary: realloc doesn't change the allocation kind unless it must.
//
// This function may return:
// - Some(pointer) when PHC handled the reallocation.
// - Some(nullptr) when PHC should have handled a page-to-normal transition
//   but couldn't because of OOM.
// - Nothing() when PHC is disabled or the original allocation was not
//   under PHC.
MOZ_ALWAYS_INLINE static Maybe<void*> MaybePageRealloc(
const Maybe<arena_id_t>& aArenaId, void* aOldPtr, size_t aNewSize) {
if (!aOldPtr) {
// Null pointer. Treat like malloc(aNewSize).
return Some(PageMalloc(aArenaId, aNewSize));
}
if (!maybe_init()) {
return Nothing();
}
PtrKind pk = PHC::sRegion->PtrKind(aOldPtr);
if (pk.IsNothing()) {
// A normal-to-normal transition.
return Nothing();
}
if (pk.IsGuardPage()) {
PHC::CrashOnGuardPage(aOldPtr);
}
// At this point we know we have an allocation page.
uintptr_t index = pk.AllocPageIndex();
// A page-to-something transition.
PHC::sPHC->AdvanceNow(PHC::LocalAllocDelay());
// Note that `disable` has no effect unless it is emplaced below.
Maybe<AutoDisableOnCurrentThread> disable;
// Get the stack trace *before* locking the mutex.
StackTrace stack;
if (PHC::IsDisabledOnCurrentThread()) {
// PHC is disabled on this thread. Leave the stack empty.
} else {
// Disable on this thread *before* getting the stack trace.
disable.emplace();
stack.Fill();
}
MutexAutoLock lock(PHC::sPHC->mMutex);
// Check for realloc() of a freed block.
PHC::sPHC->EnsureValidAndInUse(lock, aOldPtr, index);
if (aNewSize <= kPageSize && PHC::sPHC->ShouldMakeNewAllocations()) {
// A page-to-page transition. Just keep using the page allocation. We do
// this even if the thread is disabled, because it doesn't create a new
// page allocation. Note that ResizePageInUse() checks aArenaId.
//
// Move the bytes with memmove(), because the old allocation and the new
// allocation overlap. Move the usable size rather than the requested size,
// because the user might have used malloc_usable_size() and filled up the
// usable size.
size_t oldUsableSize = PHC::sPHC->PageUsableSize(lock, index);
size_t newUsableSize = MozJemalloc::malloc_good_size(aNewSize);
uint8_t* pagePtr = PHC::sRegion->AllocPagePtr(index);
uint8_t* newPtr = pagePtr + kPageSize - newUsableSize;
memmove(newPtr, aOldPtr, std::min(oldUsableSize, aNewSize));
PHC::sPHC->ResizePageInUse(lock, index, aArenaId, newPtr, stack);
LOG("PageRealloc-Reuse(%p, %zu) -> %p\n", aOldPtr, aNewSize, newPtr);
return Some(newPtr);
}
// A page-to-normal transition (with the new size greater than page-sized).
// (Note that aArenaId is checked below.)
void* newPtr;
if (aArenaId.isSome()) {
newPtr = MozJemalloc::moz_arena_malloc(*aArenaId, aNewSize);
} else {
Maybe<arena_id_t> oldArenaId = PHC::sPHC->PageArena(lock, index);
newPtr = (oldArenaId.isSome()
? MozJemalloc::moz_arena_malloc(*oldArenaId, aNewSize)
: MozJemalloc::malloc(aNewSize));
}
if (!newPtr) {
return Some(nullptr);
}
Delay reuseDelay = ReuseDelay(lock);
// Copy the usable size rather than the requested size, because the user // might have used malloc_usable_size() and filled up the usable size. Note // that FreePage() checks aArenaId (via SetPageFreed()).
size_t oldUsableSize = PHC::sPHC->PageUsableSize(lock, index);
memcpy(newPtr, aOldPtr, std::min(oldUsableSize, aNewSize));
FreePage(lock, index, aArenaId, stack, reuseDelay);
LOG("PageRealloc-Free(%p[%zu], %zu) -> %p, %zu delay, reuse at ~%zu\n",
aOldPtr, index, aNewSize, newPtr, size_t(reuseDelay),
size_t(PHC::Now()) + reuseDelay);
return Some(newPtr);
}
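The transition rules documented above MaybePageRealloc() can be condensed into a tiny decision function. This is a sketch for illustration only; the enums and names are hypothetical, not PHC's API:

```cpp
// Sketch of the realloc transition table: the allocation kind never changes
// unless it must (normal-to-page is disallowed; page-to-normal happens only
// when the new size no longer fits in a page).
#include <cassert>
#include <cstddef>

constexpr size_t kPageSizeSketch = 4096;

enum class Kind { Normal, Page };
enum class Action { NormalRealloc, PageToPage, PageToNormal };

inline Action ReallocAction(Kind aOldKind, size_t aNewSize) {
  if (aOldKind == Kind::Normal) {
    return Action::NormalRealloc;  // normal-to-page is disallowed
  }
  // Page allocations stay page allocations whenever the new size fits.
  return aNewSize <= kPageSizeSketch ? Action::PageToPage
                                     : Action::PageToNormal;
}
```

Keeping page allocations page-sized whenever possible counterbalances the ban on normal-to-page transitions, so realloc introduces no net bias toward or away from PHC coverage.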
// This handles both free and moz_arena_free.
static void DoPageFree(const Maybe<arena_id_t>& aArenaId, void* aPtr) {
PtrKind pk = PHC::sRegion->PtrKind(aPtr);
if (pk.IsGuardPage()) {
PHC::CrashOnGuardPage(aPtr);
}
// At this point we know we have an allocation page.
PHC::sPHC->AdvanceNow(PHC::LocalAllocDelay());
uintptr_t index = pk.AllocPageIndex();
// Note that `disable` has no effect unless it is emplaced below.